Commit Graph

13162 Commits

Author SHA1 Message Date
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Conrad Meyer
5bd0149714 x86 pmap: Make memory mapped via pmap_qenter() non-executable
The idea is, the pmap_qenter() API is now defined to not produce executable
mappings.  If you need executable mappings, use another API.

Add pg_nx flag in pmap_qenter on x86 to make kernel pages non-executable.

Other architectures that support execute-specific permissons on page table
entries should subsequently be updated to match.

Submitted by:	Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by:	markj
Discussed with:	alc, jhb, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14062
2018-02-14 23:35:47 +00:00
Hans Petter Selasky
33ec1ccbae Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by:	Mellanox Technologies
2018-02-13 17:04:34 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Warner Losh
982e7bdafc We don't support gcc < 4.2.1, so varargs.h now is just #error
always. Unifdef for versions prior to 4.2.1 and remove now-unused
header files.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
2018-02-12 14:48:14 +00:00
Mark Johnston
ab7c09f121 Use vm_page_unwire_noq() instead of directly modifying page wire counts.
No functional change intended.

Reviewed by:	alc, kib (previous revision)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14266
2018-02-08 19:28:51 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Konstantin Belousov
5370a081a8 Move signal trampolines out of locore.s into separate source file.
Similar to other arches, the move makes the subject of locore.s only
the kernel startup.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-06 00:02:30 +00:00
Ed Maste
8a3b44cfc2 Additional linuxolator whitespace cleanup, missed in r328890 2018-02-05 18:39:06 +00:00
Ed Maste
132f90c660 Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator.  Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 17:29:12 +00:00
Konstantin Belousov
319117fd57 IBRS support, AKA Spectre hardware mitigation.
It is coded according to the Intel document 336996-001, reading of the
patches posted on lkml, and some additional consultations with Intel.

For existing processors, you need a microcode update which adds IBRS
CPU features, and to manually enable it by setting the tunable/sysctl
hw.ibrs_disable to 0.  Current status can be checked in sysctl
hw.ibrs_active.  The mitigation might be inactive if the CPU feature
is not patched in, or if CPU reports that IBRS use is not required, by
IA32_ARCH_CAP_IBRS_ALL bit.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14029
2018-01-31 14:36:27 +00:00
Bryan Drewery
595109196a Don't use an .OBJDIR for 'make sysent'.
Reported by:	emaste, jhb
Sponsored by:	Dell EMC
2018-01-29 19:14:15 +00:00
Warner Losh
d6b6639713 Add ISA PNP tables to ISA drivers. Fix a few incidental comments.
ACPI ISA PBP tables not tagged, there's bigger issues with them.
2018-01-29 00:22:30 +00:00
Konstantin Belousov
c8f9c1f3d9 Use PCID to optimize PTI.
Use PCID to avoid complete TLB shootdown when switching between user
and kernel mode with PTI enabled.

I use the model close to what I read about KAISER, user-mode PCID has
1:1 correspondence to the kernel-mode PCID, by setting bit 11 in PCID.
Full kernel-mode TLB shootdown is performed on context switches, since
KVA TLB invalidation only works in the current pmap. User-mode part of
TLB is flushed on the pmap activations as well.

Similarly, IPI TLB shootdowns must handle both kernel and user address
spaces for each address.  Note that machines which implement PCID but
do not have INVPCID instructions, cause the usual complications in the
IPI handlers, due to the need to switch to the target PCID temporary.
This is racy, but because for PCID/no-INVPCID we disable the
interrupts in pmap_activate_sw(), IPI handler cannot see inconsistent
state of CPU PCID vs PCPU pmap/kcr3/ucr3 pointers.

On the other hand, on kernel/user switches, CR3_PCID_SAVE bit is set
and we do not clear TLB.

I can imagine alternative use of PCID, where there is only one PCID
allocated for the kernel pmap. Then, there is no need to shootdown
kernel TLB entries on context switch. But copyout(3) would need to
either use method similar to proc_rwmem() to access the userspace
data, or (in reverse) provide a temporal mapping for the kernel buffer
into user mode PCID and use trampoline for copy.

Reviewed by:	markj (previous version)
Tested by:	pho
Discussed with:	alc (some aspects)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D13985
2018-01-27 11:49:37 +00:00
Ed Maste
7eb2159f6a Use BSD-2-Clause-FreeBSD license on linux_support.s
These files previously had a 3-clause license and 'THE REGENTS' text.
Switch to standard 2-clause text with kib's approval, and add the SPDX
tag.

Approved by:	kib
2018-01-23 20:35:43 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Roger Pau Monné
50a53194f6 xen: fix IDT setup after PTI
On amd64 the IDT handler was not set correctly when using PTI.

While there also fix the selectors to SEL_KPL.

Obtained from:	kib
MFC with:	r328083
2018-01-20 14:59:37 +00:00
Nathan Whitehorn
ad6b97e7ca Define PHYS_TO_DMAP() and DMAP_TO_PHYS() as panics on the architectures
(i386 and arm) that never implement them. This allows the removal of
#ifdef PHYS_TO_DMAP on code otherwise protected by a runtime check on
PMAP_HAS_DMAP. It also fixes the build on ARM and i386 after I forgot an
#ifdef in r328168.

Reported by:	Milan Obuch
Pointy hat to:	me
2018-01-19 22:17:13 +00:00
Nathan Whitehorn
9a8196ce19 Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine if the architecture has a direct map;
if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or
1, the compiler can remove the conditional logic.

As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had
similar things but spelled differently. 32-bit MIPS has a partial direct-map
that maps poorly to this concept and is unchanged.

Reviewed by:		kib
Suggestions from:	marius, alc, kib
Runtime tested on:	amd64, powerpc64, powerpc, mips64
2018-01-19 17:46:31 +00:00
John Baldwin
b1288166e0 Use long for the last argument to VOP_PATHCONF rather than a register_t.
pathconf(2) and fpathconf(2) both return a long.  The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly.  Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by:	bde
Reviewed by:	kib
Sponsored by:	Chelsio Communications
2018-01-17 22:36:58 +00:00
Konstantin Belousov
bd50262f70 PTI for amd64.
The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability.  PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.

The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:

    kernel text (but not modules text)
    PCPU
    GDT/IDT/user LDT/task structures
    IST stacks for NMI and doublefault handlers.

Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords.  Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.

IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.

On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed.  The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame.  Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().

Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps.  They are registered using
pmap_pti_add_kva().  Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use.  In principle, they can be installed into kernel page table per
pmap with some work.  Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.

PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation.  I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.

The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case.  Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled.  PCID can be forced on only for developer's benefit.

MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal.  The fix is pending.

Reviewed by:	markj (partially)
Tested by:	pho (previous version)
Discussed with:	jeff, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-01-17 11:44:21 +00:00
Pedro F. Giffuni
74641f0bc6 x86: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:08:22 +00:00
Ed Maste
b6bf2e7c35 Enable VIMAGE in i386 GENERIC (revert r327840)
We've switched back to ld.bfd on i386 for now.

PR:		225077
Sponsored by:	The FreeBSD Foundation
2018-01-14 16:04:51 +00:00
Jeff Roberson
ab3185d15e Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants.  UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise.  It handles
iteration for round-robin directly.  The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.  Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by:	Attilio (a slightly older version)
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-01-12 23:25:05 +00:00
Ed Maste
d162bbc1d3 Temporarily disable VIMAGE on i386
An lld-linked i386 kernel hangs on boot, after em(4) starts.  This seems
similar to the issue that prompted us to disable VIMAGE on arm64 in
r326179.  Disable VIMAGE on i386 for now while we investigate.

PR:		225077
Sponsored by:	The FreeBSD Foundation
2018-01-11 19:08:43 +00:00
Konstantin Belousov
c751f90c0c Fix grammar.
Submitted by:	alc
MFC after:	3 days
2018-01-11 16:50:03 +00:00
Konstantin Belousov
6da5c56ae5 Remove redundand CLD instructions.
We already clear %RFLAGS.DF on the kernel entry due to the compiler's
ABI requirements.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-01-11 13:22:13 +00:00
Konstantin Belousov
3ee6e65875 Update comment explaining the check, to reality.
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-01-11 12:07:24 +00:00
Conrad Meyer
e6fcf7898d x86: Document purpose of _safe variants of {rd,wr}msr()
Sponsored by:	Dell EMC Isilon
2018-01-10 22:41:00 +00:00
Warner Losh
a333cdf369 inittodr(0) actually sets the time, so there's no need to call
tc_setclock(). It's redundant. Tweak UPDATING based on code review of
past releases.

Relnotes: yes (for the removal of pmtimer)
2018-01-10 17:25:08 +00:00
Warner Losh
41add9e267 Move prof_machdep.c to it's more traditional place under i386/i386. 2018-01-10 16:51:55 +00:00
Warner Losh
695d254365 Retire pmtimer driver. Move time fixing into apm driver. Move
Iwasaki-san's copyright over. Remove FIXME code that couldn't possibly
work. Call tc_settime() with our estimate of the delta we've been
alseep (the one we print) to adjust the time. Not sure what to do
about callouts, so keep the small #ifdef in place there.

Differential Revision: https://reviews.freebsd.org/D13823
2018-01-10 14:59:19 +00:00
Warner Losh
9c5a114886 Remove ccbque.h from i386/isa.
inline ccbque.h into scsi_low.h. The file isn't MD, so shouldn't live
in i386/isa. It's only used by scsi_low, so move it there so no new
clients accidentally grow. scsi_low may not even still work, and the
locking here is still SPL based. CAM should do the right thing, but
I've received no reports of these cards still working. At least it
compiles still and there's one fewer files in sys/i386/isa. While I'm
here, ansify and de-splize. CCB_MWANTED appears to be a clear-only
flag, but I've not changed that.

Differential Review: https://reviews.freebsd.org/D13672
2018-01-09 16:11:33 +00:00
Konstantin Belousov
84874cc151 Avoid re-check of usermode condition.
It does not change anything in the behavior of trap_pfault(), while
eliminating obfuscation of jumping to the code which checks for the
condition reversed of the goto cause.  Also avoid force initialize the
rv variable, since it is now only accessed after storing vm_fault()
return value.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13725
2018-01-01 20:47:03 +00:00
Konstantin Belousov
abbfe9e5d1 Move i386/isa/elink.[hc] to dev/ep.
The ep(4) driver is the only consumer of the two functions from
elink.c.  I removed the standalone module as well, and most likely,
the module metadata is not needed anywhere, but this is for later
cleanup.

Discussed with:	imp, jhb
Sponsored by:	The FreeBSD Foundation
2017-12-30 11:42:49 +00:00
Konstantin Belousov
0cc997d237 Move i386/isa/npx.c to i386i386/npx.c.
The i386 FPU (AKA npx) code does not depend on ISA devices at all,
after the support for IRQ13 FPU exceptions was removed.  Put the file
into the expected place in the kernel source tree.

Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
2017-12-30 11:33:04 +00:00
Warner Losh
d14e21a831 Remove two stray references to wl driver, removed some time ago. 2017-12-30 07:59:32 +00:00
John Baldwin
9a1bf495b1 Remove a few more dangling references to ie(4). 2017-12-28 21:35:53 +00:00
Pedro F. Giffuni
aa0e6abb41 sys/i386/isa/elink*: sync with (older) NetBSD.
Make it easier to identify the point where we started diverging from
NetBSD. Newer versions of these files have been updated to a new license
so this information may become useful some day.

Obtained from:	NetBSD
2017-12-28 14:14:35 +00:00
Eitan Adler
caa7e52f3f kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
Konstantin Belousov
30d4f9e888 Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic.  Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses.  The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.

The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations.  It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.

Suggested by:	jhb
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 09:59:20 +00:00
Bruce Evans
d5d5600725 Also forgotten in the previous that removed the permanent double mapping
of low physical memory:

Update the comment about leaving the permanent mapping in place.  This
also improves the wording of the comment.  PTD 0 is still left alone
because it is fairly important that it was unmapped earlier, and the
comment now describes the unmapping of the other low PTDs that the code
actually does.

Reviewed by:	kib
2017-12-18 14:29:48 +00:00
Bruce Evans
2ba6fe0009 Remove the permanent double mapping of low physical memory and replace
it by a transient double mapping for the one instruction in ACPI wakeup
where it is needed (and for many surrounding instructions in ACPI resume).
Invalidate the TLB as soon as convenient after undoing the transient
mapping.  ACPI resume already has the strict ordering needed for this.

This fixes the non-trapping of null pointers and other garbage pointers
below NBPDR (except transiently).  NBPDR is quite large (4MB, or 2MB for
PAE).

This fixes spurious traps at the first instruction in VM86 bioscalls.
The traps are for transiently missing read permission in the first
VM86 page (physical page 0) which was just written to at KERNBASE in
the kernel.  The mechanism is unknown (it is not simply PG_G).

locore uses a similar but larger transient double mapping and needs
it for 2 instructions instead of 1.  Unmap the first PDE in it after
the 2 instructions to detect most garbage pointers while bootstrapping.
pmap_bootstrap() finishes the unmapping.

Remove the avoidance of the double mapping for a recently fixed special
case.  ACPI resume could use this avoidance (made non-special) to avoid
any problems with the transient double mapping, but no such problems
are known.

Update comments in locore.  Many were for old versions of FreeBSD which
tried to map low memory r/o except for special cases, or might have
allowed access to low memory via physical offsets.  Now all kernel
maps are r/w, and removal of of the double map disallows use of physical
offsets again.
2017-12-18 13:53:22 +00:00
Bruce Evans
4a5eb9ac99 Fix the undersupported option KERNLOAD, part 2: fix crashes in locore
when KERNLOAD is smaller than NBPDR (not the default) and PG_G is
enabled (the default if the CPU supports it).  This case has relatively
minor problems with coherency of the permanent double mapping, but the
fix in r167869 to improve coherency creates page tables with 3 different
errors so never worked.

The permanent double mapping is fundamentally broken and will be removed
soon.  It fundamentally breaks trapping for null pointers and requires
complications to avoid cache coherency bugs.  It is currently used for
only a single instruction in ACPI resume,

Many fixes VM86 and/or ACPI and/or the double map were attempted near
r1200000.  r167869 attempted to fix cache coherency bugs in an unusual
case, but the bugs were unreachable because older errors in page tables
caused a crash first.

This commit just makes r167869 work as intended.  Part 1 of these fixes
fixed the other errors, but also stopped mapping the PDE for KERNBASE
as a large page, so double mapping of this PDE only causes the same
problems as when KERNLOAD is the default.  Except for the problem of
trapping null pointers, r167869 could be used to fix these problems,
but it is inactive in usual cases.  The only known other problem is
that incoherent permissions for page 0 cause spurious traps in VM86
BIOS calls.

Reviewed by:	kib
2017-12-18 11:57:05 +00:00
Bruce Evans
3f21dc2985 Fix the undersupported option KERNLOAD, part 1: fix crashes in locore
when KERNLOAD is not a multiple of NBPDR (not the default) and PSE is
enabled (the default if the CPU supports it).  Addresses in PDEs must
be a multiple of NBPDR in the PSE case, but were not so in the crashing
case.

KERNLOAD defaults to NBPDR.  NBPDR is 4 MB for !PAE and 2 MB for PAE.
The default can be changed by editing i386/include/vmparam.h or using
makeoptions.  It can be changed to less than NBPDR to save real and
virtual memory at a small cost in time, or to more than NBPDR to waste
real and virtual memory.  It must be larger than 1 MB and a multiple of
PAGE_SIZE.  When it is less than NBPDR, it is necessarily not a multiple
of NBPDR.  This case has much larger bugs which will be fixed in part 2.

The fix is to only use PSE for physical addresses above <KERNLOAD
rounded _up_ to an NBPDR boundary>.  When the rounding is non-null,
this leaves part of the kernel not using large pages.  Rounding down
would avoid this pessimization, but would break setting of PAT bits
on i/o pages if it goes below 1MB.  Since rounding down always goes
below 1MB when KERNLOAD < NBPDR and the KERNLOAD > NBPDR case is not
useful, never round down.

Fix related style bugs (e.g., wrong literal values for NBPDR in comments).

Reviewed by:	kib
2017-12-18 09:32:56 +00:00
Bruce Evans
c93ea91c54 Minor cleanups found while fixing a bug involving double mapping of low
memory:

Load the kernel eflags less magically, as in locore.  The magic increased
when I removed eflags from the pcb in r305899.

Remove a jump to low memory that became garbage when the i386 version was
mostly replaced by the amd64 version in r235622.

The amd64 version is very similar.  It still loads the flags magically,
but is not missing comments about using the special page table.

Reviewed by:	kib
2017-12-15 03:05:14 +00:00
Mark Johnston
5bab623438 Pass the trap frame to fasttrap hooks.
The DTrace fasttrap entry points expect a struct reg containing the
register values of the calling thread. Perform the conversion in
fasttrap rather than in the trap handler: this reduces the number of
ifdefs and avoids wasting stack space for traps that don't involve
DTrace.

MFC after:	2 weeks
2017-12-11 19:21:39 +00:00
Conrad Meyer
3f6867ef63 i386: Bump KSTACK_PAGES default to match amd64
Logically, extend r286288 to cover all threads, by default.

The world has largely moved on from i386.  Most FreeBSD users and developers
test on amd64 hardware.  For better or worse, we have written a non-trivial
amount of kernel code that relies on stacks larger than 8 kB, and it "just
works" on amd64, so there has been little incentive to shrink it.

amd64 had its KSTACK_PAGES bumped to 4 back in Peter's initial AMD64 commit,
r114349, in 2003.  Since that time, i386 has limped along on a stack half
the size.  We've even observed the stack overflows years ago, but neglected
to fix the issue; see the 20121223 and 20150728 entries in UPDATING.

If anyone is concerned with this change, I suggest they configure their
AMD64 kernels with KSTACK_PAGES 2 and fix the fallout there first.  Eugene
has identified a list of high stack usage functions in the first PR below.

PR:		219476, 224218
Reported by:	eugen@, Shreesh Holla <hshreesh AT yahoo.com>
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
2017-12-11 04:32:37 +00:00
Bruce Evans
fb3cc1c37d Move instantiation of msgbufp from 9 MD files to subr_prf.c.
This variable should be pure MI except possibly for reading it in MD
dump routines.  Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998.  There were many imperfections in
r36441.  This commit fixes only a small one, to simplify fixing the
others 1 arch at a time.  (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)
2017-12-07 07:55:38 +00:00
Pedro F. Giffuni
64de3fdd58 SPDX: use the Beerware identifier. 2017-11-30 20:33:45 +00:00
Scott Long
c15269ccb8 It's time to retire AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT from
the standard kernels.  They are still available as custom compile
options.
2017-11-29 23:41:49 +00:00
Brooks Davis
5cd667e65f Disable vim syntax highlighting.
Vim's default pick doesn't understand that ';' is a comment character
and the result looks horrible.

Reviewed by:	emaste
2017-11-28 18:23:17 +00:00
Fedor Uporov
cd76ee1ee3 Remap ENOATTR to ENODATA in the linuxulator.
In the linux ENOADATA is frequently #defined as ENOATTR.
The change is required for an xattrs support implementation.

MFC after: 1 week
Discussed with: netchild
Approved by: pfg

Differential Revision: https://reviews.freebsd.org/D13221
2017-11-27 17:03:11 +00:00
Pedro F. Giffuni
83ef78be95 sys/i386: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:08:52 +00:00
Ed Schouten
ee13ffbe03 Use TO_PTR() to convert integers to pointers.
For FreeBSD/arm64's cloudabi32 support, I'm going to need a TO_PTR() in
this place. Also use it for all of the other source files, so that the
difference remains as minimal as possible.

MFC after:	2 weeks
2017-11-26 14:45:56 +00:00
Hans Petter Selasky
8a53e1340f Merge ^/head r326132 through r326161. 2017-11-24 12:13:27 +00:00
Hans Petter Selasky
322f006ecc Implement atomic_fetchadd_64() for i386. This function is needed by the
atomic64 header file in the LinuxKPI for i386.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-24 12:10:42 +00:00
Ed Schouten
814629dd64 Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00
Hans Petter Selasky
82725ba9bf Merge ^/head r325999 through r326131. 2017-11-23 14:28:14 +00:00
Konstantin Belousov
383f241dce Remove lint support from system headers and MD x86 headers.
Reviewed by:	dim, jhb
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13156
2017-11-23 11:40:16 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Hans Petter Selasky
937d37fc6c Merge ^/head r325842 through r325998. 2017-11-19 12:36:03 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Konstantin Belousov
4e421792ec Remove i386 XBOX support.
It is for console presented at 2001 and featuring Pentium III
processor.  Even if any of them are still alive and run FreeBSD, we do
not have any sign of life from their users.  While removing another
dozens of #ifdefs from the i386 sources reduces the aversion from
looking at the code and improves the platform vitality.

Reviewed by:	cem, pfg, rink (XBOX support author)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13016
2017-11-16 14:27:02 +00:00
Hans Petter Selasky
8dee9a7a44 Remove no longer supported mthca driver.
Sponsored by:	Mellanox Technologies
2017-11-13 10:59:38 +00:00
Konstantin Belousov
b1499f0e77 Remove useless DEBUG printfs in i386 sendsig() implementations.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-08 13:05:14 +00:00
Konstantin Belousov
5b9a3721e6 x86: Do not emit unused TD_TID symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-04 10:51:52 +00:00
Konstantin Belousov
ec1d28b8df Eliminate unused load.
Based on github pull request:	#117
Submitted by:	Wuyang-Chung@github
MFC after:	1 week
2017-11-04 10:50:47 +00:00
Konstantin Belousov
aa788cc387 Consistently ensure that we do not load MXCSR with reserved bits set.
Some callers of fpusetregs()/npxsetregs(), most importantly
set_fpcontext(), clear reserved bits.  But some did not.  Do the
clearing in fpusetregs() and remove now redundand operation from
set_fpcontext().

Reported by:	Maxime Villard <max@m00nbsd.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-01 10:32:44 +00:00
Tijl Coosemans
f236378b54 Set the return address for stack entry points to zero.
Stack unwinders treat zero as a stop condition.  The value on the stack can
be non-zero because thread stacks may be arbitrary memory provided via
pthread_attr_setstack(3) or may be recycled from previous threads.

Reference:
https://lists.freebsd.org/pipermail/freebsd-current/2017-August/066855.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-October/067254.html

Discussed with:	kib
MFC after:	1 week
2017-10-31 11:51:34 +00:00
Eitan Adler
a2aef24aa3 Update several more URLs
- Primarily http -> https
- Primarily FreeBSD project URLs
2017-10-29 08:17:03 +00:00
Mark Johnston
5fca1d90c1 Fix the VM_NRESERVLEVEL == 0 build.
Add VM_NRESERVLEVEL guards in the pmaps that implement transparent
superpage promotion using reservations.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12764
2017-10-23 15:34:05 +00:00
Bjoern A. Zeeb
8e94025b41 With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD.  Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.

Disable building LINT-VIMAGE with VIMAGE being default.

This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.

Requested by:		many
Reviewed by:		kristof, emaste, hiren
X-MFC after:		never
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12639
2017-10-20 21:40:59 +00:00
Mark Johnston
46fcd1af63 Move kernel dump offset tracking into MI code.
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.

This is needed to implement support for kernel dump compression in the
MI kernel dump code.

Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11722
2017-10-18 15:38:05 +00:00
Konstantin Belousov
00d37313f9 Change i386_get_ldt() to return 'EOF' when the requested range of
descriptors does not fit into currently allocated LDT, or trim the
return if the range fits partially.  Before, the function returned
EINVAL.

Fix two bugs in r324366: use capped num counter for malloc size, and
do not leak allocated buffer on EINVAL (by handling EINVAL case as
normal, see above).

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 16:19:26 +00:00
Konstantin Belousov
13a5d9112c Improvements to set_user_ldt().
Remove mtx_owned() checks from set_user_ldt().  Split the function
into _locked() version which requires the dt_lock spinlock owned, and
make set_user_ldt() a wrapper.  Add a comment in swtch.s noting that
the call to the new set_user_ldt() cannot recurse on dt_lock.

Remove #ifdef SMP block, the addend is always zero on UP.

Fix type of set_user_ldt_rv(), making it match the type used for
smb_rendezvous() callback, and remove the cast.  Use curproc.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 16:07:27 +00:00
Konstantin Belousov
787f835bf0 Reset the fs and gs bases on exec(2).
The values from the old address space do not make sense for the new
program.  In particular, gsbase might be the TLS base for the old
program but the new program has no TLS now.

amd64 already handles this correctly.

Reported and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 15:39:43 +00:00
Konstantin Belousov
4f7c931a11 More style.
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 15:24:18 +00:00
Konstantin Belousov
a4730cdb82 Improve i386_get_ldt().
Provide consistent snapshot of the requested descriptors by preventing
other threads from modifying LDT while we fetch the data, lock dt_lock
around the read.  Copy the data into intermediate buffer, which is
copied out after the lock is dropped.

Comparing with the amd64 version, the read is done byte by byte, since
there is no atomic 64bit read (cmpxchg8b method is too heavy comparing
with the avoided issues).

Improve overflow checking for the descriptors range calculations and
remove unneeded casts.  Use unsigned types for sizes.

Allow zero num argument to i386_get_ldt() and i386_set_ldt().  This
case is handled naturally by the code flow.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-06 14:29:53 +00:00
Konstantin Belousov
57f99aede6 Remove unneeded cast.
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-06 10:17:50 +00:00
Konstantin Belousov
98c2674e93 Style.
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-06 10:16:57 +00:00
Konstantin Belousov
6cdf28b7ad Use ANSI C declarations.
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 19:11:25 +00:00
Konstantin Belousov
1f32d2a7ba Correct format specifiers in the debug code. Style.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 18:58:28 +00:00
Konstantin Belousov
711d5183b1 Style.
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 18:42:13 +00:00
Konstantin Belousov
349216589d A different fix for the issue from r323722.
Split the handlers for pop of invalid selectors from the trap frame
into usermode and kernel variants.  Usermode handler is kept as is, it
restores the already loaded parts of the trap frame and jumps to set
up a signal delivery to the user process.

New kernel part of the handler emulates IRET treatment of the segments
which would violate access right.  It loads NUL selector in the
segment register which load causes the fault, and then continues the
return to interrupted kernel code.  Since invalid selectors in the
segment registers in the kernel mode can only exist while kernel still
enters or exits from userspace, we only zero invalid userspace
selectors.  If userspace tries to use the segment register, it gets a
signal, as if the processor segment descriptor cache was reloaded.

Reported by:	Maxime Villard <max@m00nbsd.net>
Suggested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-28 09:01:28 +00:00
Konstantin Belousov
053e8ce538 Restore a part of r323722.
Do not return from interrupt using the POP_FRAME;iret instruction
sequence, always jump to doreti.

The user segments selectors saved on the stack might become invalid
because userspace manipulated LDT in a parallel thread.  trap() is
aware of such issue, but it is only prepared to handle it at iret and
segment registers load operations in doreti path.

Also remove POP_FRAME macro because it is no longer used.

Reviewed by:	bde, jhb (as part of r323722)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-28 08:46:15 +00:00
Konstantin Belousov
d3c968bf84 Revert r323722. A better fix will be committed shortly, as well as
some still useful bits of the reverted revision.

The problem with the committed fix is that there are still issues with
returning from NMI, when NMI interrupted kernel in a moment where the
kernel segments selectors were still not loaded into registers.  If
this happens, the NMI return would loose the userspace selectors
because r323722 does not reload segment registers on return to kernel
mode.

Fixing the problem is complicated.  Since an alternative approach to
handle the original bug exists, it makes sence to stop adding more
complexity.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-28 08:38:24 +00:00
Josh Paetzel
c77037f16f Fix indentation for r323068
PR:	220170
Reported by:	lidl
MFC after:	3 days
Pointyhat to:	jpaetzel
2017-09-19 20:40:05 +00:00
Konstantin Belousov
3cabd93e26 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
Konstantin Belousov
5efe338f3d Fix handling of the segment registers on i386.
Suppose that userspace is executing with the non-standard segment
descriptors.  Then, until exception or interrupt handler executed
SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs.
If an interrupt occurs in this window, the interrupt handler is
executed unsafely, relying on usability of the usermode registers.  If
the interrupt results in the context switch on return, the
contamination of the kernel state spreads to the thread we switched
to.  As result, kernel data accesses might fault or, if only the base
is changed, completely messed up.

More, if the user segment was allocated in LDT, another thread might
mark the descriptor as invalid before doreti code tried to reload
them.  In this case kernel panics.

The issue exists for all exception entry points which use trap gate,
and thus do not automatically disable interrupts on entry, and for
lcall_handler.

Fix is two-fold: first, we need to disable interrupts for all kernel
entries, changing the IDT descriptor types from trap gate to interrupt
gate.  Interrupts are re-enabled not earlier than the kernel segments
are loaded into the segment registers.  Second, we only load the
segment registers from the trap frame when returning to usermode.  For
the later, all interrupt return paths must happen through the doreti
common code.

There is no way to disable interrupts on call gate, which is the
supposed mode of servicing for lcall $7,$0 syscalls.  Change the LDT
descriptor 0 into a code segment type and point it to the userspace
trampoline which redirects the syscall to int $0x80.

All the measures make the segment register handling similar to that of
amd64.  We do not apply amd64 optimizations of not reloading segment
registers on return from the syscall.

Reported by:	Maxime Villard <max@m00nbsd.net>
Tested by:	pho (the non-lcall part)
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12402
2017-09-18 20:22:42 +00:00
Josh Paetzel
9d0ec2a920 Revert r323087
This needs more thinking out and consensus, and the commit message
was wrong AND there was a typo in the commit.

pointyhat:	jpaetzel
2017-09-01 17:03:48 +00:00
Josh Paetzel
0be04b100c Take options IPSEC out of GENERIC
PR:	220170
Submitted by:	delphij
Reviewed by:	ae, glebius
MFC after:	2 weeks
Differential Revision:	D11806
2017-09-01 15:54:53 +00:00
Josh Paetzel
3b65550eec Allow kldload tcpmd5
PR:	220170
MFC after:	2 weeks
2017-08-31 20:16:28 +00:00
Alexander Motin
ed9652da5f Add NTB driver for PLX/Avago/Broadcom PCIe switches.
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too.  It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells.  There are also 4 DMA engines
in those chips, but they are not yet supported.

While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-08-30 21:16:32 +00:00
Conrad Meyer
2744a0b69b Drop CACHE_LINE_SIZE to 64 bytes on x86
The actual cache line size has always been 64 bytes.

The 128 number arose as an optimization for Core 2 era Intel processors.  By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently.  Newer CPUs prefetch more intelligently.

The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington").  If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.

Reported by:	mjg
Reviewed by:	np
Discussed with:	jhb
Sponsored by:	Dell EMC Isilon
2017-08-28 22:28:41 +00:00
Konstantin Belousov
5e52cd4324 MFamd64 r322720, r322723:
Simplify i386 trap().

- Use more relevant name 'signo' instead of 'i' for the local variable
  which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
  of the trap() function.  Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-26 18:12:25 +00:00
Konstantin Belousov
aefdf821cd Remove unused code.
The machdep.uprintf_signal sysctl replaced it in more convenient way,
not requiring recompilation to use and providing more information on
fault.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-26 18:09:27 +00:00
Konstantin Belousov
325b406c49 MFamd64 r322718:
Use ANSI C declaration for trap_pfault().  Style.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-26 18:06:29 +00:00
Konstantin Belousov
8ce5520255 MFamd64 r322719:
Trim excessive 'extern'.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-26 18:04:29 +00:00
Konstantin Belousov
cef5e2bd91 Use the known valid segment when accessing memory in #UD handler.
Make sure that %eflags.D flag is cleared for hook.
Improve comments.

When #UD dtrace code checks for a registered hook before checking that
the exception was raised from kernel mode, we might run with the user
%ds, trapping on access.  Exception entry from userspace automatically
load valid %ss, which we can use there instead.

Noted and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-08-19 21:00:02 +00:00