Commit Graph

13375 Commits

Author SHA1 Message Date
Konstantin Belousov
a9c7546a1d Fix a corner case in demotion of kernel mappings.
It is possible for the kernel mapping to be created with superpage by
directly installing pde using pmap_enter_2mpage() without filling the
corresponding page table page.  This can happen e.g. if the range is
already backed by reservation and vm_fault_soft_fast() conditions are
satisfied, which was observed on the pipe_map.

In this case, demotion must fill the page obtained from the pmap
radix, same as if the page is newly allocated.  Use PG_PROMOTED bit as
an indicator that the page is valid, instead of the wire count of the
page table page.

Since the PG_PROMOTED bit is set on pde when we leave TLB entries for
4k pages around, which in particular means that the ptes were filled,
it provides more correct indicator.  Note that pmap_protect_pde()
clears PG_PROMOTED, which handles the case when protection was changed
on the superpage without adjusting ptes.

Reported by:	pho
In collaboration with:	alc
Tested by:	alc, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20380
2019-05-24 17:19:06 +00:00
Konstantin Belousov
48ec6d3bc9 Do not call hw_mds_recalculate() from initializecpu().
If MDS mitigation is enabled by the tunable but MDS microcode is not
early-loaded, software mitigation is selected.  This causes
initializecpu() to try to allocate memory which makes boot process
very unhappy.

Create SYSINIT that runs sufficiently late to succeed.

Reported by:	naddy
PR:	237968
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-21 22:56:21 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Brooks Davis
376f78be92 FCP-101: Bump __FreeBSD_version for driver removal.
Remove gone_by_fcp101_dev macro.

Remove orphaned comment.
2019-05-17 15:24:54 +00:00
Brooks Davis
7a582e5374 FCP-101: Remove xe(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:44 +00:00
Brooks Davis
02fae06a11 FCP-101: Remove wb(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:34 +00:00
Brooks Davis
e8504bf9e7 FCP-101: Remove vx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:26 +00:00
Brooks Davis
be345ff023 FCP-101: Remove txp(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:17 +00:00
Brooks Davis
b1b1c2fe38 FCP-101: Remove tx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:08 +00:00
Brooks Davis
7c897ca91f FCP-101: Remove tl(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:00 +00:00
Brooks Davis
90089841de FCP-101: Remove sn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:52 +00:00
Brooks Davis
3b70dd81f5 FCP-101: Remove sf(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:43 +00:00
Brooks Davis
607790d10f FCP-101: Remove pcn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:34 +00:00
Brooks Davis
dd262716a1 FCP-101: Remove fe(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:26 +00:00
Brooks Davis
3ee01a1385 FCP-101: Remove ex(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:18 +00:00
Brooks Davis
e153ee663a FCP-101: Remove ep(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:10 +00:00
Brooks Davis
05aa6e583b FCP-101: Remove ed(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:02 +00:00
Brooks Davis
08ac01a92c FCP-101: Remove de(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:22:54 +00:00
Brooks Davis
e1edf1240b FCP-101: Remove cs(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:22:45 +00:00
Konstantin Belousov
7c5a46a1bc Remove resolver_qual from DEFINE_IFUNC/DEFINE_UIFUNC macros.
In all practical situations, the resolver visibility is static.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	so (emaste)
Differential revision:	https://reviews.freebsd.org/D20281
2019-05-16 22:20:54 +00:00
Ryan Libby
d375016d8d x86: spell vpxor %zmm0 as vpxord
Fix gcc/gas amd64 & i386 build after r347566.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20264
2019-05-15 18:13:43 +00:00
Konstantin Belousov
7355a02bdd Mitigations for Microarchitectural Data Sampling.
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure.  An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security:	CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security:	FreeBSD-SA-19:07.mds
Reviewed by:	jhb
Tested by:	emaste, lwhsu
Approved by:	so (gtetlow)
2019-05-14 17:02:20 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
Andrew Gallatin
542970fa2d Remove IPSEC from GENERIC due to performance issues
Having IPSEC compiled into the kernel imposes a non-trivial
performance penalty on multi-threaded workloads due to IPSEC
refcounting. In my benchmarks of multi-threaded UDP
transmit (connected sockets), I've seen a roughly 20% performance
penalty when the IPSEC option is included in the kernel (16.8Mpps
vs 13.8Mpps with 32 senders on a 14 core / 28 HTT Xeon
2697v3)). This is largely due to key_addref() incrementing and
decrementing an atomic reference count on the default
policy. This cause all CPUs to stall on the same cacheline, as it
bounces between different CPUs.

Given that relatively few users use ipsec, and that it can be
loaded as a module, it seems reasonable to ask those users to
load the ipsec module so as to avoid imposing this penalty on the
GENERIC kernel. Its my hope that this will make FreeBSD look
better in "out of the box" benchmark comparisons with other
operating systems.

Many thanks to ae for fixing auto-loading of ipsec.ko when
ifconfig tries to configure ipsec, and to cy for volunteering
to ensure the the racoon ports will load the ipsec.ko module

Reviewed by:	cem, cy, delphij, gnn, jhb, jpaetzel
Differential Revision:	https://reviews.freebsd.org/D20163
2019-05-09 22:38:15 +00:00
Kyle Evans
251a32b5b2 tun/tap: merge and rename to tuntap
tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).

This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp

[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).

ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.

(MFC commentary)

This may get MFC'd if many bugs in tun(4)/tap(4) are discovered after this,
and how critical they are. Changes after this are likely easily MFC'd
without taking this merge, but the merge will be easier.

I have no plans to do this MFC as of now.

Reviewed by:	bcr (manpages), tuexen (testing, syzkaller/packetdrill)
Input also from:	melifaro
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20044
2019-05-08 02:32:11 +00:00
Ed Maste
0e26cd440f make sysent after r347228
Regenerate to add @generated tag in generated files.
2019-05-07 18:10:21 +00:00
Conrad Meyer
665919aaaf x86: Implement MWAIT support for stopping a CPU
IPI_STOP is used after panic or when ddb is entered manually.  MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning.  Something similar is already used at idle.

It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP.  (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)

It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off).  This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.

Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.

Submitted by:   Anton Rang <rang AT acm.org> (earlier version)
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 20:34:26 +00:00
Conrad Meyer
83dc49beaf x86: Define pc_monitorbuf as a logical structure
Rather than just accessing it via pointer cast.

No functional change intended.

Discussed with:	kib (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 17:35:13 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Conrad Meyer
d6745408c7 Add a COMPAT_FREEBSD12 kernel option.
Use it wherever COMPAT_FREEBSD11 is currently specified, like r309749.

Reviewed by:	imp, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20120
2019-05-02 18:10:23 +00:00
Dmitry Chagin
c034ecf316 Since r339624 HEAD does not need for backslashes in syscalls.master,
however to make a merge r345471 to the stable add backslashes
to the syscalls.master.

MFC after:	3 days
2019-04-23 18:10:46 +00:00
Konstantin Belousov
fdfe249b63 Fix initial x87 state after r345562.
After the referenced commit, we did not set x87 and sse valid bits in
the xstate_bv bitmask for initial fpu state (stored in memory), when
using XSAVE.

The state is loaded into FPU register file to initialize the process
FPU state, and since both bits were clear, the default x87 and SSE
states were loaded.  By chance, FreeBSD ABI SSE2 state is same as FPU
initial state, so the bug is not visible for 64bit processes.  But on
i386, the precision control should be set to double (53bit mantissa),
instead of the default double extended (64bit mantissa). For 32bit
processes on amd64, kernel reloads control word with the right mask,
which only left native i386 and amd64 native but using x87 as
affected.

Fix it by setting minimal required xstate_bv mask.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-16 19:46:02 +00:00
Warner Losh
f7ab01581a Move mpr/mps drivers from per-arch NOTES files into the MI notes
file. They are in more arches they they aren't. Add appropriate
nodevice directives in powerpc and arm.
2019-04-13 06:30:45 +00:00
Konstantin Belousov
2a508645b4 pci_cfgreg.c: Use io port config access for early boot time.
Some early PCIe chipsets are explicitly listed in the white-list to
enable use of the MMIO config space accesses, perhaps because ACPI
tables were not reliable source of the base MCFG address at that time.
For that chipsets, MCFG base was read from the known chipset MCFGbase
config register.

During very early stage of boot, when access to the PCI config space
is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
registers because the method used with pre-boot pmap overflows initial
kernel page tables.

Move fallback to read MCFGbase to the attachment method of the
x86/legacy device, which removes code duplication, and results in the
use of io accesses until MCFG is parsed or legacy attach called.

For amd64, pre-initialize cfgmech with CFGMECH_1, right now we
dynamically assign CFGMECH_1 to it anyway, and remove checks for
CFGMECH_NONE.

There is a mention in the Intel documentation for corresponding
chipsets that OS must use either io port or MMIO access method, but we
already break this rule by reading MCFGbase register, so one more
access seems to be innocent.

Reported by:	longwitz@incore.de
PR:	236838
Reviewed by:	avg (other version), jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19833
2019-04-09 18:07:17 +00:00
Warner Losh
d99880cf46 Add mpr, mps, mpt to NOTES file
Add these to all the architectures that these are in the GENERIC
kernel.
2019-04-05 02:54:02 +00:00
Jung-uk Kim
278f0de60d Merge ACPICA 20190329. 2019-03-29 20:21:28 +00:00
Conrad Meyer
8207def158 x86: Use XSAVEOPT for fpusave(), when available
Remove redundant npxsave_core definition while here.

Suggested by:	Anton Rang
Reviewed by:	kib, Anton Rang <rang AT acm.org>
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19665
2019-03-26 22:45:41 +00:00
Dmitry Chagin
1f66cb5154 Regen from r345471.
MFC after:	1 month
2019-03-24 14:51:17 +00:00
Dmitry Chagin
f730d606d5 Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
Dmitry Chagin
d9be8b39a5 Regen for r345469 (shmat()).
MFC after:	1 month
2019-03-24 14:46:07 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Mark Johnston
64087fd7f3 Disallow preemptive creation of wired superpage mappings.
There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped.  If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages.  However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails.  The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by:	kib
Reported by:	syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19670
2019-03-21 19:52:50 +00:00
Konstantin Belousov
c51b496e0b i386: improve detection of the fast page fault assist.
In particular, check that we are assisting the page fault from kernel mode.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-03-17 18:31:48 +00:00
Konstantin Belousov
fd8d844f76 amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting.  PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.

Requested by:	jhb
Reviewed by:	jhb, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:44:33 +00:00
Konstantin Belousov
6f1fe3305a amd64: Add md process flags and first P_MD_PTI flag.
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.

On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process.  Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.

MFC note: md_flags change struct proc KBI.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:31:01 +00:00
Konstantin Belousov
39d70f6b80 Provide deterministic (and somewhat useful) value for RDPID result,
and for %ecx after RDTSCP.

Initialize TSC_AUX MSR with CPUID.  It allows for userspace to cheaply
identify CPU it was executed on some time ago, which is sometimes useful.

Note: The values returned might be changed in future.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-15 16:43:28 +00:00
Andrey V. Elsukov
40025d42fd Fix typo.
MFC after:	1 week
2019-03-07 10:01:32 +00:00
John Baldwin
2e43efd0bb Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
Edward Tomasz Napierala
1699546def Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00