Commit Graph

13474 Commits

Author SHA1 Message Date
Konstantin Belousov
48ec6d3bc9 Do not call hw_mds_recalculate() from initializecpu().
If MDS mitigation is enabled by the tunable but MDS microcode is not
early-loaded, software mitigation is selected.  This causes
initializecpu() to try to allocate memory which makes boot process
very unhappy.

Create SYSINIT that runs sufficiently late to succeed.

Reported by:	naddy
PR:	237968
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-21 22:56:21 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Brooks Davis
376f78be92 FCP-101: Bump __FreeBSD_version for driver removal.
Remove gone_by_fcp101_dev macro.

Remove orphaned comment.
2019-05-17 15:24:54 +00:00
Brooks Davis
7a582e5374 FCP-101: Remove xe(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:44 +00:00
Brooks Davis
02fae06a11 FCP-101: Remove wb(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:34 +00:00
Brooks Davis
e8504bf9e7 FCP-101: Remove vx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:26 +00:00
Brooks Davis
be345ff023 FCP-101: Remove txp(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:17 +00:00
Brooks Davis
b1b1c2fe38 FCP-101: Remove tx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:08 +00:00
Brooks Davis
7c897ca91f FCP-101: Remove tl(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:00 +00:00
Brooks Davis
90089841de FCP-101: Remove sn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:52 +00:00
Brooks Davis
3b70dd81f5 FCP-101: Remove sf(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:43 +00:00
Brooks Davis
607790d10f FCP-101: Remove pcn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:34 +00:00
Brooks Davis
dd262716a1 FCP-101: Remove fe(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:26 +00:00
Brooks Davis
3ee01a1385 FCP-101: Remove ex(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:18 +00:00
Brooks Davis
e153ee663a FCP-101: Remove ep(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:10 +00:00
Brooks Davis
05aa6e583b FCP-101: Remove ed(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:02 +00:00
Brooks Davis
08ac01a92c FCP-101: Remove de(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:22:54 +00:00
Brooks Davis
e1edf1240b FCP-101: Remove cs(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:22:45 +00:00
Konstantin Belousov
7c5a46a1bc Remove resolver_qual from DEFINE_IFUNC/DEFINE_UIFUNC macros.
In all practical situations, the resolver visibility is static.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	so (emaste)
Differential revision:	https://reviews.freebsd.org/D20281
2019-05-16 22:20:54 +00:00
Ryan Libby
d375016d8d x86: spell vpxor %zmm0 as vpxord
Fix gcc/gas amd64 & i386 build after r347566.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20264
2019-05-15 18:13:43 +00:00
Konstantin Belousov
7355a02bdd Mitigations for Microarchitectural Data Sampling.
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure.  An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security:	CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security:	FreeBSD-SA-19:07.mds
Reviewed by:	jhb
Tested by:	emaste, lwhsu
Approved by:	so (gtetlow)
2019-05-14 17:02:20 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
Andrew Gallatin
542970fa2d Remove IPSEC from GENERIC due to performance issues
Having IPSEC compiled into the kernel imposes a non-trivial
performance penalty on multi-threaded workloads due to IPSEC
refcounting. In my benchmarks of multi-threaded UDP
transmit (connected sockets), I've seen a roughly 20% performance
penalty when the IPSEC option is included in the kernel (16.8Mpps
vs 13.8Mpps with 32 senders on a 14 core / 28 HTT Xeon
2697v3)). This is largely due to key_addref() incrementing and
decrementing an atomic reference count on the default
policy. This cause all CPUs to stall on the same cacheline, as it
bounces between different CPUs.

Given that relatively few users use ipsec, and that it can be
loaded as a module, it seems reasonable to ask those users to
load the ipsec module so as to avoid imposing this penalty on the
GENERIC kernel. Its my hope that this will make FreeBSD look
better in "out of the box" benchmark comparisons with other
operating systems.

Many thanks to ae for fixing auto-loading of ipsec.ko when
ifconfig tries to configure ipsec, and to cy for volunteering
to ensure the the racoon ports will load the ipsec.ko module

Reviewed by:	cem, cy, delphij, gnn, jhb, jpaetzel
Differential Revision:	https://reviews.freebsd.org/D20163
2019-05-09 22:38:15 +00:00
Kyle Evans
251a32b5b2 tun/tap: merge and rename to tuntap
tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).

This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp

[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).

ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.

(MFC commentary)

This may get MFC'd if many bugs in tun(4)/tap(4) are discovered after this,
and how critical they are. Changes after this are likely easily MFC'd
without taking this merge, but the merge will be easier.

I have no plans to do this MFC as of now.

Reviewed by:	bcr (manpages), tuexen (testing, syzkaller/packetdrill)
Input also from:	melifaro
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20044
2019-05-08 02:32:11 +00:00
Ed Maste
0e26cd440f make sysent after r347228
Regenerate to add @generated tag in generated files.
2019-05-07 18:10:21 +00:00
Conrad Meyer
665919aaaf x86: Implement MWAIT support for stopping a CPU
IPI_STOP is used after panic or when ddb is entered manually.  MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning.  Something similar is already used at idle.

It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP.  (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)

It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off).  This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.

Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.

Submitted by:   Anton Rang <rang AT acm.org> (earlier version)
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 20:34:26 +00:00
Conrad Meyer
83dc49beaf x86: Define pc_monitorbuf as a logical structure
Rather than just accessing it via pointer cast.

No functional change intended.

Discussed with:	kib (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 17:35:13 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Conrad Meyer
d6745408c7 Add a COMPAT_FREEBSD12 kernel option.
Use it wherever COMPAT_FREEBSD11 is currently specified, like r309749.

Reviewed by:	imp, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20120
2019-05-02 18:10:23 +00:00
Dmitry Chagin
c034ecf316 Since r339624 HEAD does not need for backslashes in syscalls.master,
however to make a merge r345471 to the stable add backslashes
to the syscalls.master.

MFC after:	3 days
2019-04-23 18:10:46 +00:00
Konstantin Belousov
fdfe249b63 Fix initial x87 state after r345562.
After the referenced commit, we did not set x87 and sse valid bits in
the xstate_bv bitmask for initial fpu state (stored in memory), when
using XSAVE.

The state is loaded into FPU register file to initialize the process
FPU state, and since both bits were clear, the default x87 and SSE
states were loaded.  By chance, FreeBSD ABI SSE2 state is same as FPU
initial state, so the bug is not visible for 64bit processes.  But on
i386, the precision control should be set to double (53bit mantissa),
instead of the default double extended (64bit mantissa). For 32bit
processes on amd64, kernel reloads control word with the right mask,
which only left native i386 and amd64 native but using x87 as
affected.

Fix it by setting minimal required xstate_bv mask.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-16 19:46:02 +00:00
Warner Losh
f7ab01581a Move mpr/mps drivers from per-arch NOTES files into the MI notes
file. They are in more arches they they aren't. Add appropriate
nodevice directives in powerpc and arm.
2019-04-13 06:30:45 +00:00
Konstantin Belousov
2a508645b4 pci_cfgreg.c: Use io port config access for early boot time.
Some early PCIe chipsets are explicitly listed in the white-list to
enable use of the MMIO config space accesses, perhaps because ACPI
tables were not reliable source of the base MCFG address at that time.
For that chipsets, MCFG base was read from the known chipset MCFGbase
config register.

During very early stage of boot, when access to the PCI config space
is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
registers because the method used with pre-boot pmap overflows initial
kernel page tables.

Move fallback to read MCFGbase to the attachment method of the
x86/legacy device, which removes code duplication, and results in the
use of io accesses until MCFG is parsed or legacy attach called.

For amd64, pre-initialize cfgmech with CFGMECH_1, right now we
dynamically assign CFGMECH_1 to it anyway, and remove checks for
CFGMECH_NONE.

There is a mention in the Intel documentation for corresponding
chipsets that OS must use either io port or MMIO access method, but we
already break this rule by reading MCFGbase register, so one more
access seems to be innocent.

Reported by:	longwitz@incore.de
PR:	236838
Reviewed by:	avg (other version), jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19833
2019-04-09 18:07:17 +00:00
Warner Losh
d99880cf46 Add mpr, mps, mpt to NOTES file
Add these to all the architectures that these are in the GENERIC
kernel.
2019-04-05 02:54:02 +00:00
Jung-uk Kim
278f0de60d Merge ACPICA 20190329. 2019-03-29 20:21:28 +00:00
Conrad Meyer
8207def158 x86: Use XSAVEOPT for fpusave(), when available
Remove redundant npxsave_core definition while here.

Suggested by:	Anton Rang
Reviewed by:	kib, Anton Rang <rang AT acm.org>
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19665
2019-03-26 22:45:41 +00:00
Dmitry Chagin
1f66cb5154 Regen from r345471.
MFC after:	1 month
2019-03-24 14:51:17 +00:00
Dmitry Chagin
f730d606d5 Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
Dmitry Chagin
d9be8b39a5 Regen for r345469 (shmat()).
MFC after:	1 month
2019-03-24 14:46:07 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Mark Johnston
64087fd7f3 Disallow preemptive creation of wired superpage mappings.
There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped.  If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages.  However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails.  The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by:	kib
Reported by:	syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19670
2019-03-21 19:52:50 +00:00
Konstantin Belousov
c51b496e0b i386: improve detection of the fast page fault assist.
In particular, check that we are assisting the page fault from kernel mode.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-03-17 18:31:48 +00:00
Konstantin Belousov
fd8d844f76 amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting.  PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.

Requested by:	jhb
Reviewed by:	jhb, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:44:33 +00:00
Konstantin Belousov
6f1fe3305a amd64: Add md process flags and first P_MD_PTI flag.
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.

On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process.  Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.

MFC note: md_flags change struct proc KBI.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:31:01 +00:00
Konstantin Belousov
39d70f6b80 Provide deterministic (and somewhat useful) value for RDPID result,
and for %ecx after RDTSCP.

Initialize TSC_AUX MSR with CPUID.  It allows for userspace to cheaply
identify CPU it was executed on some time ago, which is sometimes useful.

Note: The values returned might be changed in future.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-15 16:43:28 +00:00
Andrey V. Elsukov
40025d42fd Fix typo.
MFC after:	1 week
2019-03-07 10:01:32 +00:00
John Baldwin
2e43efd0bb Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
Edward Tomasz Napierala
1699546def Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00
Konstantin Belousov
bced332adf Invalidate cache for the PDPTE page when using PAE paging but PAT is
not supported.

According to SDM rev. 69 vol. 3, for PDPTE registers loads:
- when PAT is not supported, access to the PDPTE page is performed as
  UC, see 4.9.1;
- when PAT is supported, the access is WB, see 4.9.2.

So potentially CPU might load stale memory as PDPTEs if both PAT and
self-snoop are not implemented.  To be safe, add total local cache
flush to pmap_cold() before initial load of cr3, and flush PDPTE page
in pmap_pinit(), if PAT is not implemented.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19365
2019-02-28 19:19:02 +00:00
Konstantin Belousov
d5f2c1e4fc i386 PAE: avoid atomic for pte_store() where possible.
Instead carefully write upper word, and only than the lower word with
PG_V, for previously invalid ptes.  It provides some measurable system
time saving on buildworld.

Reviewed by:	markj
Tested by:	pho
Measured by:	bde (early version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D19226
2019-02-26 09:45:44 +00:00
Konstantin Belousov
e7a9df16e6 Add kernel support for Intel userspace protection keys feature on
Skylake Xeons.

See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the
RDPKRU and WRPKRU instructions.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:51:13 +00:00
Warner Losh
b1ece24388 Remove drm from LINT kernels
drm was accidentally left in the LINT kernels.

Pointy hat to: imp
2019-02-19 21:20:50 +00:00
Konstantin Belousov
5ddeaf67c6 Provide convenience C wrappers for RDPKRU and WRPKRU instructions.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-19 19:17:20 +00:00
Konstantin Belousov
642bb66b63 Provide userspace versions of do_cpuid() and cpuid_count() on i386.
Some older compilers, when generating PIC code, cannot handle inline
asm that clobbers %ebx (because %ebx is used as the GOT offset
register).  Userspace versions avoid clobbering %ebx by saving it to
stack before executing the CPUID instruction.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-14 13:53:11 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Konstantin Belousov
eb785fab3b Port sysctl kern.elf32.read_exec from amd64 to i386.
Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments.  This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:17:34 +00:00
Konstantin Belousov
f76b5ab6cc Fix resume on i386 PAE.
It was broken before PAE/no-PAE merge, but since now PAE is the
default, resume is apparently becomes for all machines.

The corrected issues:
- the trampoline page is not mapped executable, so machine faults when
  paging is on;
- MSR.EFER and %cr4 both should be loaded before paging is enabled,
  otherwise paging structures are invalid (cr4.PAE and EFER.NX).
- MSR.EFER and %cr4 should be only loaded if present.  I attempt to handle
  this by not touching the registers if the value is zero.

There are some more bits still not quite correct, e.g. unconditional
access to %cr4 in resumectx.

Reported and debugging help by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:09:34 +00:00
Ed Maste
e26563b8c7 Retire SPX_HACK option unused after r342244 2019-02-06 17:21:25 +00:00
Konstantin Belousov
2648ed9275 Make it possible to override PAE mode on boot.
Initialize the static kenv in pmap_cold() and fetch user opinion on
vm.pmap.pae_mode tunable if hardware is capable.  Note that the static
environment is reinitilized in init386() later when paging is enabled.

Reviewed by:	bde
Discussed with:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2019-02-05 20:09:31 +00:00
Konstantin Belousov
c7301c6b2c Remove pointless initial value for i386 vm.pmap.pat_works sysctl definition.
The OID is served by external data.

Submitted by:	bde
MFC after:	3 days
2019-02-05 20:02:16 +00:00
Dimitry Andric
0f166953f7 Use NLDT to get number of LDTs on i386
Compiling a GENERIC kernel for i386 with clang 8.0 results in the
following warning:

/usr/src/sys/i386/i386/sys_machdep.c:542:40: error: 'sizeof ((ldt))' will return the size of the pointer, not the array itself [-Werror,-Wsizeof-pointer-div]
        nldt = pldt != NULL ? pldt->ldt_len : nitems(ldt);
                                              ^~~~~~~~~~~
/usr/src/sys/sys/param.h:299:32: note: expanded from macro 'nitems'
#define nitems(x)       (sizeof((x)) / sizeof((x)[0]))
                         ~~~~~~~~~~~ ^

Indeed, 'ldt' is declared as 'union descriptor *', so nitems() is not
the right way to determine the number of LDTs.  Instead, the NLDT define
from sys/x86/include/segments.h should be used.

Reviewed by:	kib
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D19074
2019-02-04 18:07:03 +00:00
Konstantin Belousov
cbb65b7ec5 i386: Do not ever store to other-CPU counter64 slot.
On CPUs supporting cmpxchg8b, fetch is performed by cmpxchg8b on
corresponding CPU slot, which unconditionally write to the slot.  If
for that slot, the owner CPU increments it, then both CPUs might run
the cmpxchg8b instruction concurrently and this might race and
override the incremental write.  So the counter update would be lost.

Fix it by implementing fetch as IPI and accumulation of result.  It is
acceptable for rare counter64 fetch operation to be more expensive.

Diagnosed and tested by:	Andreas Longwitz <longwitz@incore.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2019-02-03 21:28:58 +00:00
Konstantin Belousov
a6786c1799 Disable boot-time memory test on i386 be default.
With the current 24G memory limit for GENERIC, the boot time test
causes quite visible delay, amplified by the default
debug.late_console = 0.

The comment text is copied from the same setting explanation for
amd64.

Suggested by:	bde
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2019-02-01 21:09:36 +00:00
Konstantin Belousov
c75f49f7d8 Make iflib a loadable module.
iflib is already a module, but it is unconditionally compiled into the
kernel.  There are drivers which do not need iflib(4), and there are
situations where somebody might not want iflib in kernel because of
using the corresponding driver as module.

Reviewed by:	marius
Discussed with:	erj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19041
2019-01-31 19:05:56 +00:00
Konstantin Belousov
e259e5f4c0 Remove duplicate declarations.
Submitted by:	bde
MFC after:	2 months
2019-01-30 16:29:15 +00:00
Konstantin Belousov
9a52756044 i386: Merge PAE and non-PAE pmaps into same kernel.
Effectively all i386 kernels now have two pmaps compiled in: one
managing PAE pagetables, and another non-PAE. The implementation is
selected at cold time depending on the CPU features. The vm_paddr_t is
always 64bit now. As result, nx bit can be used on all capable CPUs.

Option PAE only affects the bus_addr_t: it is still 32bit for non-PAE
configs, for drivers compatibility. Kernel layout, esp. max kernel
address, low memory PDEs and max user address (same as trampoline
start) are now same for PAE and for non-PAE regardless of the type of
page tables used.

Non-PAE kernel (when using PAE pagetables) can handle physical memory
up to 24G now, larger memory requires re-tuning the KVA consumers and
instead the code caps the maximum at 24G. Unfortunately, a lot of
drivers do not use busdma(9) properly so by default even 4G barrier is
not easy. There are two tunables added: hw.above4g_allow and
hw.above24g_allow, the first one is kept enabled for now to evaluate
the status on HEAD, second is only for dev use.

i386 now creates three freelists if there is any memory above 4G, to
allow proper bounce pages allocation. Also, VM_KMEM_SIZE_SCALE changed
from 3 to 1.

The PAE_TABLES kernel config option is retired.

In collaboarion with: pho
Discussed with:	emaste
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18894
2019-01-30 02:07:13 +00:00
Andriy Voskoboinyk
86d535ab47 Garbage collect AH_SUPPORT_AR5416 config option.
It does nothing since r318857.
2019-01-25 13:48:40 +00:00
Andriy Voskoboinyk
4945f79a4c Remove IEEE80211_AMPDU_AGE config option.
It is noop since r297774.
2019-01-20 15:17:56 +00:00
Fedor Uporov
6651cf410c Fix errno values returned from DUMMY_XATTR linuxulator calls
Reported by: weiss@uni-mainz.de
Reviewed by: markj
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D18812
2019-01-11 07:58:25 +00:00
Konstantin Belousov
f2c79297eb Fix i386 LINT build after r342769.
It seems that libkern/mcount.c is the only consumer of vm/pmap.h that
does not include machine/atomic.h.  Make it work by bringing
machine/atomic.h when pmap.h is used for kernel non-asm .c file.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-01-04 19:10:46 +00:00
Konstantin Belousov
0598e55ea3 i386: Use atomic 64bit load to read PDE value from PAE pagetables in
pmap_kextract().

pmap_kextract() can race with promotion/demotion on the kernel page
table, in which case current non-atomic 64bit read would see torn
value, breaking pmap_kextract().  pmap_kextract() would correctly
handle either promoted or demoted PDE, but not a mix where one word
is from a different state.

It requires PAE and > 4G memory to reproduce.  We observed this in
real loads, both for intensive use of malloc(9)/free(9) where
vtoslab() returned invalid pointer to the slab, and with the use of
busdma_bounce, where incorrect page was bounced.

In collaboration with:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18714
2019-01-04 17:33:07 +00:00
Konstantin Belousov
f0d85a5dc5 x86: Report per-cpu IPI TLB shootdown generation in ddb 'show pcpu' output.
It is useful for inspecting tlb shootdown hangs.  The smp_tlb_generation value
is available using regular ddb data inspection commands.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-01-04 17:25:47 +00:00
Konstantin Belousov
d3f4030708 Fix typo in r342710.
Noted by:	lidl
MFC after:	3 days
2019-01-03 19:35:07 +00:00
Mark Johnston
9bfc7fa41d Avoid setting PG_U unconditionally in pmap_enter_quick_locked().
This KPI may in principle be used to create kernel mappings, in which
case we certainly should not be setting PG_U.  In any case, PG_U must be
set on all layers in the page tables to grant user mode access, and we
were only setting it on leaf entries.  Thus, this change should have no
functional impact.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-01-02 15:36:35 +00:00
Konstantin Belousov
3c72855616 More references to pmap_cold().
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-12-31 18:11:04 +00:00
Konstantin Belousov
e68d443853 Update comments: paging is initialized in pmap_cold().
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-12-31 18:05:48 +00:00
Konstantin Belousov
05d5652a7d i386: Fix allocation of the KVA frame for pmap_quick_enter_page().
Due to the typo, it shared the frame with the CMAP1 transient mapping.

In collaboration with:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation (kib)
2018-12-29 15:49:03 +00:00
Mateusz Guzik
70a975ae6b Remove iBCS2, part3: the implementation
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
2018-12-19 22:02:49 +00:00
Mateusz Guzik
628888f0e0 Remove iBCS2, part2: general kernel
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
2018-12-19 21:57:58 +00:00
Dimitry Andric
67350cb56a Merge ^/head r340918 through r341763. 2018-12-09 11:39:45 +00:00
Konstantin Belousov
759e5d25da Fix PAE boot.
With the introduction of M_EXEC support for kmem_malloc(), some kernel
mappings start having NX bit set in the paging structures early, for
PAE kernels on machines with NX support, i.e. practically on all
machines.  In particular, AP trampoline and initialization needs to
access pages which translations has NX bit set, before initializecpu()
is called.

Check for CPUID NX feature and enable EFER.NXE before we enable paging
in mp boot trampoline.  This allows the CPU to use the kernel page
table instead of generating page fault due to reserved bit set.

PR:	233819
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-08 22:12:57 +00:00
Mark Johnston
352aaa5122 Plug memory disclosures via ptrace(2).
On some architectures, the structures returned by PT_GET*REGS were not
fully populated and could contain uninitialized stack memory.  The same
issue existed with the register files in procfs.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18421
2018-12-03 20:54:17 +00:00
Konstantin Belousov
36e1b9702e Correct the tunable name in the message.
Submitted by:	 Andre Albsmeier <mail@fbsd.e4m.org>
PR:	231577
MFC after:	1 week
2018-12-01 16:43:18 +00:00
Eric van Gyzen
607a0eb2f1 Remove superfluous bzero in getcontext/swapcontext/sendsig
We zero the whole structure; we don't need to zero the __spare__ field again.

Remove trailing whitespace.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-26 20:56:05 +00:00
Dimitry Andric
6149ed01a1 Merge ^/head r340368 through r340426. 2018-11-14 06:46:44 +00:00
Niclas Zeising
af14df7703 Add evdev support to amd64 and i386 kernels
Include evdev support and drivers in the amd64 and i386 GENERIC and MINIMAL
kernels.  Evdev is used by X and wayland to handle input devices, and this
change, together with upcomming changes in ports will make us handle input
devices better in graphical UIs.

Reviewed by:	wulf, bapt, imp
Approved by:	imp
Differential Revision:	https://reviews.freebsd.org/D17912
2018-11-12 21:01:28 +00:00
Dimitry Andric
c06e7b66a1 Merge ^/head r340126 through r340212. 2018-11-07 18:52:28 +00:00
John Baldwin
7f7f6f85a1 Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC.  For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond.  If the TSC does not exist, read from I/O port 0x84 to
delay instead.

PR:		228768
Reported by:	Roger Hammerstein <cheeky.m@live.com>
Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17851
2018-11-05 22:54:03 +00:00
John Baldwin
4cbbb74888 Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI.  Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms.  However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.

Reviewed by:	kib
MFC after:	3 days
2018-11-05 21:34:17 +00:00
Dimitry Andric
2a22df74e9 Merge ^/head r339813 through r340125. 2018-11-04 15:49:06 +00:00
John Baldwin
b317cfd4c0 Don't enter DDB for fatal traps before panic by default.
Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'.  Disable the new knob by default.  Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob.  However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.

Reviewed by:	kib, avg
MFC after:	2 months
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17768
2018-11-01 21:34:17 +00:00
Kyle Evans
90ba2725c1 i386/MINIMAL: VERBOSE_SYSINIT=0 for consistency
MFC after:	never
2018-10-31 22:55:43 +00:00
Kyle Evans
be352d20d5 Compile in VERBOSE_SYSINIT support by default, remain silent by default
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.

MFC after:	never
2018-10-31 22:38:19 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Dimitry Andric
c6879c6c14 Merge ^/head r339015 through r339669. 2018-10-23 21:09:37 +00:00
Warner Losh
6a18678249 Remove the ncr(4) drive.
This driver has been obsolete since the FreeBSD 4.x. It should have
been removed then since the sym(4) driver had subsumed it. The driver
was commented out of GENERIC in 2000.

RelNotes: Yes
2018-10-22 02:36:18 +00:00
Warner Losh
49a93324fe Remove stg(4) driver
stg(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame. It was also only enabled on i386.

Relnote: Yes
2018-10-22 02:35:50 +00:00
Warner Losh
08204c2cc3 Remove nsp(4) driver
nsp(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame. It was also only enabled on i386.

Relnote: Yes
2018-10-22 02:35:38 +00:00
Warner Losh
2dfd358865 Remove ncv(4) driver
ncv(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame..

Relnote: Yes
2018-10-22 02:35:26 +00:00
Warner Losh
e9b5375b04 Retire dpt(4)
Marked as gone in 12 and not relevant since the early 90s. No
sightings in nycbug's dmesg database.

Relnotes: yes
2018-10-22 02:35:12 +00:00
Warner Losh
c1cdf6a42f Remove mse(4) from tree
Remove mse and all support for bus and inport devices from the tree.
Data from nycbug's dmesg database shows the last sighting of this
driver was in 4.10 on only one machine.

Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D17628
2018-10-22 02:34:10 +00:00
Warner Losh
48ac1a9566 Remove the gone_in(12) devices.
We're planning on removing adv, adw, aha, aic, bt, ncv, nsp, and stg
soon. They have been tagged for removal in 12. At least get them out
of GENERIC.

MFC after: 3 days
Relnotes: yes
2018-10-22 02:28:18 +00:00
Mark Johnston
36209a40d1 Add an assertion to pmap_enter().
When modifying an existing managed mapping, we should find a PV entry
for the old mapping.  Verify this.

Before r335784 this would have been implicitly tested by the fact that
we always freed the PV entry for the old mapping.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17626
2018-10-20 20:53:35 +00:00
Conrad Meyer
6e423878f0 Add a MINIMAL config for i386, based on amd64
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D17560
2018-10-20 19:16:43 +00:00
Dimitry Andric
f3e1dfebeb Fix placement of __bss_start in i386 kernel linker script
With lld 7.0.0, a rather nasty problem in our kernel linker script came
to light.  We use quite a lot of so-called "orphan" sections, e.g.
sections which are not explicitly named in the linker script.  Mainly,
these are the linker sets (such as set_sysinit_set).

Note that the placement of these orphan sections is not very well
defined.  Usually, any read-only orphan sections get placed after the
last read-only section from the linker script, and similarly for the
read/write variants.

In our linker scripts, there are also symbol assignments like _etext,
_edata, and __bss_start, which are used in various places to refer to
the start or end addresses of sections.

However, some of these symbol assignments are interspersed with output
section descriptions.  While the linker will guarantee that a symbol
assignment after some section will stay after that section, there is no
guarantee that an orphan section cannot be inserted just before it.

Take for example the following script:

SECTIONS
{
  .data : { *(.data) }
  __bss_start = .;
  .bss : { *(.bss) }
}

If an orphan section (like set_sysinit_set) is now inserted just after
the __bss_start assignment, __bss_start will actually point to the start
of that orphan section, *not* to the start of the .bss section.

Unfortunately, something like this happened with our i386 kernel linker
script, and since sys/i386/i386/locore.s tries to zero .bss, it ended up
zeroing all the linker sets too, leading to a crash very soon after the
<--BOOT--> message.

To fix this, move the __bss_start symbol assignment *into* the .bss
section description, so there is no way a linker can then insert orphan
sections at that point.  Also add a corresponding __bss_end symbol.

In addition, change sys/i386/i386/locore.s, so it clears from
__bss_start to __bss_end, instead of assuming that _edata is just
before .bss (which may not be true), and that _end is just after _bss
(which also may not be true).

This allows an i386 kernel linked with lld 7.0.0 to boot successfully.
2018-10-11 20:44:25 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Michael Tuexen
6b45121a6d Address the warning regarding duplicate option 'GEOM_PART_GPT' when
configuring kernels for i386, amd64, and arm64.
The 'GEOM_PART_GPT' option was added to the DEFAULTS configuration
in r337967.

Approved by:		re (kib@)
Reviewed by:		ler@
Differential Revision:	https://reviews.freebsd.org/D17458
Sponsored by:		Netflix, Inc.
2018-10-07 15:54:13 +00:00
Mark Johnston
cb4961abc1 Apply r339046 to i386.
Belatedly add a comment to the amd64 pmap explaining why we initialize
the kernel pmap's resident page count.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17377
2018-10-01 18:48:33 +00:00
Konstantin Belousov
5f11ee2075 Fix UP build.
Reported by:	tijl
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
2018-09-29 16:17:35 +00:00
John Baldwin
83382d027f Don't clear DR6 for debug exceptions from userland.
This reverts part of r333368.  The attempt to clear DR6 was occuring
too soon as trapsignal() does not pause to let the debugger notice the
SIGTRAP and query DR6.  The signal exchange does not occur until much
later during ast().  As a result, GDB was no longer recognizing
hardware breakpoints and watchpoints on x86.

In addition, any userland programs that want to inspect DR6 in a
SIGTRAP handler don't have a way to do this if we clear DR6 in the
exception handler.

Instead of relying on the kernel to clear DR6, debuggers will have to
explicitly clear it after a trace trap (which they needed to do on
older kernels anyway).

Reviewed by:	kib
Approved by:	re (delphij)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17319
2018-09-27 17:33:59 +00:00
Konstantin Belousov
d12c446550 Convert x86 cache invalidation functions to ifuncs.
This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 19:35:02 +00:00
Konstantin Belousov
ad8edd79cc Convert i386 NPX hardware context save methods to ifuncs.
Since ifunc-capable linker is now required on i386, bring this code in
line with the amd64 counterpart.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 16:37:43 +00:00
Konstantin Belousov
f0165b1ca6 Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by:	royger
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Approved by:	re (marius)
Differential revision:	https://reviews.freebsd.org/D16881
2018-08-29 12:24:19 +00:00
Konstantin Belousov
23c97bcba1 Remove dead code in i386 cpu_throw().
Curpmap must be already valid when cpu_throw() is called, even for early
AP startup.

Suggested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16893
2018-08-25 15:31:23 +00:00
Konstantin Belousov
60b7423434 Unify amd64 and i386 vmspace0 pmap activation.
Add pmap_activate_boot() for i386, move the invocation on APs from MD
init_secondary() to x86 init_secondary_tail().

Suggested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16893
2018-08-25 15:21:28 +00:00
Warner Losh
592ffb2175 Revert drm2 removal.
Revert r338177, r338176, r338175, r338174, r338172

After long consultations with re@, core members and mmacy, revert
these changes. Followup changes will be made to mark them as
deprecated and prent a message about where to find the up-to-date
driver.  Followup commits will be made to make this clear in the
installer. Followup commits to reduce POLA in ways we're still
exploring.

It's anticipated that after the freeze, this will be removed in
13-current (with the residual of the drm2 code copied to
sys/arm/dev/drm2 for the TEGRA port's use w/o the intel or
radeon drivers).

Due to the impending freeze, there was no formal core vote for
this. I've been talking to different core members all day, as well as
Matt Macey and Glen Barber. Nobody is completely happy, all are
grudgingly going along with this. Work is in progress to mitigate
the negative effects as much as possible.

Requested by: re@ (gjb, rgrimes)
2018-08-24 00:02:00 +00:00
Mark Johnston
36716fe2e6 Prepare the kernel linker to handle PC-relative ifunc relocations.
The boot-time ifunc resolver assumes that it only needs to apply
IRELATIVE relocations to PLT entries.  With an upcoming optimization,
this assumption no longer holds, so add the support required to handle
PC-relative relocations targeting GNU_IFUNC symbols.
- Provide a custom symbol lookup routine that can be used in early boot.
  The default lookup routine uses kobj, which is not functional at that
  point.
- Apply all existing relocations during boot rather than filtering
  IRELATIVE relocations.
- Ensure that we continue to apply ifunc relocations in a second pass
  when loading a kernel module.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16749
2018-08-22 20:44:30 +00:00
Matt Macy
d157fbd5b4 Remove legacy drm and drm2 from tree
As discussed on the MLs drm2 conflicts with the ports' version and there
is no upstream for most if not all of drm. Both have been merged in to
a single port.

Users on powerpc, 32-bit hardware, or with GPUs predating Radeon
and i915 will need to install the graphics/drm-legacy-kmod. All
other users should be able to use one of the LinuxKPI-based ports:
graphics/drm-stable-kmod, graphics/drm-next-kmod, graphics/drm-devel-kmod.

MFC: never
Approved by: core@
2018-08-22 01:50:12 +00:00
Alan Cox
83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
John Baldwin
a800b45c18 Merge amd64 and i386 <machine/intr_machdep.h> headers.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16803
2018-08-20 12:31:39 +00:00
John Baldwin
a568818913 Remove some vestiges of IPI_LAZYPMAP on i386.
The support for lazy pmap invalidations on i386 was removed in r281707.
This removes the constant for the IPI and stops accounting for it when
sizing the interrupt count arrays.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16801
2018-08-19 16:14:59 +00:00
John Baldwin
8cd385fda0 Make 'device crypto' lines more consistent.
- In configurations with a pseudo devices section, move 'device crypto'
  into that section.
- Use a consistent comment.  Note that other things common in kernel
  configs such as GELI also require 'device crypto', not just IPSEC.

Reviewed by:	rgrimes, cem, imp
Differential Revision:	https://reviews.freebsd.org/D16775
2018-08-18 20:32:08 +00:00
Warner Losh
62ee5bbd73 GPT is standard in x86 and arm64 land. Add it to DEFAULTS with the
others.

Differential Revision: https://reviews.freebsd.org/D16740
2018-08-17 14:47:21 +00:00
Mark Johnston
97edfc1b45 Implement kernel support for early loading of Intel microcode updates.
Updates in the format described in section 9.11 of the Intel SDM can
now be applied as one of the first steps in booting the kernel.  Updates
that are loaded this way are automatically re-applied upon exit from
ACPI sleep states, in contrast with the existing cpucontrol(8)-based
method.  For the time being only Intel updates are supported.

Microcode update files are passed to the kernel via loader(8).  The
file type must be "cpu_microcode" in order for the file to be recognized
as a candidate microcode update.  Updates for multiple CPU types may be
concatenated together into a single file, in which case the kernel
will select and apply a matching update.  Memory used to store the
update file will be freed back to the system once the update is applied,
so this approach will not consume more memory than required.

Reviewed by:	kib
MFC after:	6 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16370
2018-08-13 17:13:09 +00:00
Devin Teske
ab9ed8a1bd Fix misspellings of transmitter/transmitted
Reviewed by:	emaste, bcr
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D16025
2018-08-10 20:37:32 +00:00
Hans Petter Selasky
25a1e0f636 Implement missing atomic_fcmpset_XXX() support for i386.
This also fixes i386 build after r337527.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-08-09 11:30:13 +00:00
Konstantin Belousov
e45b89d23d Add pmap_is_valid_memattr(9).
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15583
2018-08-01 18:45:51 +00:00
Warner Losh
67d33338c0 Rename VM_FREELIST_ISADMA to VM_FREELIST_LOWMEM.
There's no differene between VM_FREELIST_ISADMA and VM_FREELIST_LOWMEM
except for the default boundary (16MB on x86 and 256MB on MIPS, but
they are otherwise the same). We don't need both for any system we
support (there were some really old ARC systems that did have ISA/EISA
bus, but we never ran on them and they are too old to ever grow
support for).

Differential Review: https://reviews.freebsd.org/D16290
2018-07-27 18:34:20 +00:00
Mark Johnston
6c85795a25 Fix handling of KVA in kmem_bootstrap_free().
Do not use vm_map_remove() to release KVA back to the system.  Because
kernel map entries do not have an associated VM object, with r336030
the vm_map_remove() call will not update the kernel page tables.  Avoid
relying on the vm_map layer and instead update the pmap and release KVA
to the kernel arena directly in kmem_bootstrap_free().

Because the pmap updates will generally result in superpage demotions,
modify pmap_init() to insert PTPs shadowed by superpage mappings into
the kernel pmap's radix tree.

While here, port r329171 to i386.

Reported by:	alc
Reviewed by:	alc, kib
X-MFC with:	r336505
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16426
2018-07-27 15:46:34 +00:00
Konstantin Belousov
41bed185c1 Extend ranges of the critical sections to ensure that context switch
code never sees FPU pcb flags not consistent with the hardware state.

This is uncovered by the eager FPU switch mode.

Analyzed, reviewed and tested by:	gleb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-07-24 19:22:52 +00:00
Alan Cox
db016164e0 Annotate a parameter as unused.
X-MFC with:	r336288
2018-07-20 16:31:25 +00:00
Mark Johnston
483f692ea6 Have preload_delete_name() free pages backing preloaded data.
On i386 and amd64, add a vm_phys segment for physical memory used to
store the kernel binary and other preloaded data.  This makes it
possible to free such memory back to the system once it is no longer
needed, e.g., when a preloaded kernel module is unloaded.  Previously,
it would have remained unused.

Reviewed by:	kib, royger
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16330
2018-07-19 20:00:28 +00:00
Mark Johnston
697be9a3bd Restore the check for the page size extension after r332489.
Without this, the support for transparent superpage promotion on i386
was left disabled.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D16279
2018-07-15 22:18:31 +00:00
Alan Cox
f2ae19e501 Correct some typos.
Reviewed by:	kib
2018-07-14 19:35:41 +00:00
Alan Cox
8c0873714c Add support for pmap_enter(..., psind=1) to the i386 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 2 or 4 MB
page mapping.  (Essentially, this feature allows the machine-independent
layer to create superpage mappings preemptively, and not wait for automatic
promotion to occur.)

Export pmap_ps_enabled() to the machine-independent layer.

Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or
reclaim a PV entry when one is not available.

Refactor pmap_enter_pde() into two functions, one by the same name, that is
a general-purpose function for creating PDE PG_PS mappings, and another,
pmap_enter_4mpage(), that is used to prefault 2 or 4 MB read- and/or
execute-only mappings for execve(2), mmap(2), and shmat(2).

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D16246
2018-07-14 17:20:27 +00:00
Alan Cox
76999163ad Eliminate unnecessary differences between i386's pmap_enter() and amd64's.
For example, fully construct the new PTE before entering the critical
section.  This change is a stepping stone to psind == 1 support on i386.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D16188
2018-07-10 18:00:55 +00:00
Alan Cox
717d5c0b47 Invalidate the mapping before updating its physical address.
Doing so ensures that all threads sharing the pmap have a consistent
view of the mapping.  This fixes the problem described in the commit
log messages for r329254 without the overhead of an extra fault in the
common case.  Once other pmap_enter() implementations are similarly
modified, the workaround added in r329254 can be removed, reducing the
overhead of CoW faults.

See also r335784 for amd64.  The i386 implementation of pmap_enter()
already reused the PV entry from the old mapping.

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D16133
2018-07-08 16:51:54 +00:00
Konstantin Belousov
53dec71d39 Expand x86 struct pcpus to UMA_PCPU_ALLOC_SIZE AKA PAGE_SIZE.
This restores counters(9) operation.
Revert r336024. Improve assert of pcpu size on x86.

Reviewed by:	mmacy
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D16163
2018-07-06 19:50:44 +00:00
Konstantin Belousov
fb0a281196 Revert to recommit with the proper message. 2018-07-06 19:50:25 +00:00
Konstantin Belousov
1614716655 Save a call to pmap_remove() if entry cannot have any pages mapped.
Due to the way rtld creates mappings for the shared objects, each dso
causes unmap of at least three guard map entries.  For instance, in
the buildworld load, this change reduces the amount of pmap_remove()
calls by 1/5.

Profiled by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16148
2018-07-06 19:48:47 +00:00
Hans Petter Selasky
a7a7f5b472 Make sure kernel modules built by default are portable between UP and
SMP systems by extending defined(SMP) to include defined(KLD_MODULE).

This is a regression issue after r335873 .

Discussed with:		mmacy@
Sponsored by:		Mellanox Technologies
2018-07-06 10:13:42 +00:00
Matt Macy
ab3059a8e7 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
Konstantin Belousov
945a6b310b Extend r335969 to superpages.
It is possible that a fictitious unmanaged userspace mapping of
superpage is created on x86, e.g. by pmap_object_init_pt(), with the
physical address outside the vm_page_array[] coverage.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 17:28:06 +00:00
Konstantin Belousov
a0ef97f6fa Revert r335999 to re-commit with the correct error message. 2018-07-05 17:26:13 +00:00
Konstantin Belousov
b26d7d4ceb Use vm_page_unhold_pages() instead of manually rolling unoptimized
version of it.

Noted by:	alc
Sponsored by:	The FreeBSD Foundation
2018-07-05 16:40:20 +00:00
Konstantin Belousov
c59dfa63bf In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:38:54 +00:00
Konstantin Belousov
81dac87135 In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:27:34 +00:00
Konstantin Belousov
84a15fe70d In x86 pmap_extract_and_hold()s, handle the case of PHYS_TO_VM_PAGE()
returning NULL.

vm_fault_quick_hold_pages() can be legitimately called on userspace
mappings backed by fictitious pages created by unmanaged device and sg
pagers.

Note that other architectures pmap_extract_and_hold() might need
similar fix, but I postponed the examination.

Reported by:	bde
Discussed with:	alc
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-04 21:21:59 +00:00
Matt Macy
f4b3640475 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00