Commit Graph

13607 Commits

Author SHA1 Message Date
Conrad Meyer
4ae224c663 Revert r240317 to prevent leaking pmap entries
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.

An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers.  However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations.  Other affected arch are less popular: i386, MIPS, and
PowerPC.  Arm64, arm32, and riscv are not affected.

Reported by:	Don Morris <dgmorris AT earthlink.net>
Submitted by:	Don Morris (amd64 part)
Reviewed by:	kib, markj, Don (!amd64 parts)
MFC after:	I don't intend to, but you might want to
Sponsored by:	Dell Isilon
Differential Revision:	https://reviews.freebsd.org/D25689
2020-07-16 23:29:26 +00:00
Mark Johnston
e64080e79c Switch from SCTP to SCTP_SUPPORT in GENERIC configs.
This removes SCTP from in-tree kernel configuration files.  Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25611
2020-07-16 15:09:04 +00:00
Konstantin Belousov
c123ab20ec Improve description of the vector argument for i386 smp_targeted_tlb_shootdown().
Submitted by:	alc
MFC after:	20 days
2020-07-15 16:12:00 +00:00
Konstantin Belousov
dc43978aa5 amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu.  Now each CPU can initiate
shootdown IPI independently from other CPUs.  Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu.  After that IPI is sent to all targets
which scan for zeroed scoreboard generation words.  Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set.  Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
   this is fine because we in fact do not send IPIs from interrupt
   handlers. More for !x2APIC mode where ICR access for write requires
   two registers write, we disable interrupts around it. If considered
   incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
   cache line, to reduce ping-pong.

Reviewed by:	markj (previous version)
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25510
2020-07-14 20:37:50 +00:00
Conrad Meyer
64612d4e44 geom(4): Kill GEOM_PART_EBR_COMPAT option
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.

Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234."  However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.

Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0.  This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered.  That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.

(This change does not add support for the 'bootcode' verb to EBR.)

PR:		232463
Reported by:	Manish Jain <bourne.identity AT hotmail.com>
Discussed with:	ae (no objection)
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D24939
2020-07-01 02:16:36 +00:00
Kyle Evans
5403f186a7 linuxolator: implement memfd_create syscall
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.

The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.

Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:

- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
  (?)

Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.

PR:		240874
Reviewed by:	kib, trasz
Differential Revision:	https://reviews.freebsd.org/D21845
2020-06-29 03:09:14 +00:00
Edward Tomasz Napierala
a39cdcd7e7 Regen.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-27 14:43:29 +00:00
Edward Tomasz Napierala
308e194cbf Add proper types for linux message queue syscalls; mostly taken
from 32-bit Linuxulator.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25386
2020-06-27 14:42:08 +00:00
Edward Tomasz Napierala
36507f85dc Add syscall definitions for linux xattr syscalls.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25387
2020-06-27 14:39:44 +00:00
Edward Tomasz Napierala
5ac2674278 Adapt linuxulator syscalls.master files to the new layout.
No functional changes.

Reviewed by:	brooks
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25381
2020-06-21 10:09:34 +00:00
Brandon Bergren
40b664f64b [PowerPC] More relocation fixes
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.

This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.

Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.

This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)

 * Remove the rest of the stoffs hack.
 * Remove my half-baked displace_symbol_table() function.
 * Extend ddb initialization to cope with having a relocation offset on the
   kernel symbol table.
 * Fix my kernel-as-initrd hack to work with booke64 by using a temporary
   mapping to access the data.
 * Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
 * Change the behavior or X_db_symbol_values to apply the relocation base
   when updating valp, to match link_elf_symbol_values() behavior.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D25223
2020-06-21 03:39:26 +00:00
Edward Tomasz Napierala
bafd96b8dd Regen after r362440.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-20 18:31:02 +00:00
Edward Tomasz Napierala
52c81be11a Add linux_madvise(2) instead of having Linux apps call the native
FreeBSD madvise(2) directly.  While some of the flag values match,
most don't.

PR:		kern/230160
Reported by:	markj
Reviewed by:	markj
Discussed with:	brooks, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25272
2020-06-20 18:29:22 +00:00
Eric van Gyzen
6fba90f201 FPU init: allocate initial state from UMA to ensure alignment
The Intel Instruction Set Reference says this about the XSAVE instruction:

    Use of a destination operand not aligned to 64-byte boundary
    (in either 64-bit or 32-bit modes) results in a general-protection
    (#GP) exception.

This alignment happens naturally when all malloc buckets are powers
of two.  However, this change is necessary on some systems when
certain non-power-of-two (and non-multiple of 64) malloc buckets
are defined.

Reviewed by:	cem; kib; earlier version by jhb
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25098
2020-06-12 21:17:56 +00:00
Eric van Gyzen
701acc2fd8 FPU: make xsave_area_desc static
...because it can be.

Reviewed by:	cem kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25098
2020-06-12 21:12:26 +00:00
Eric van Gyzen
674cbe7908 FPU init: Do potentially blocking operations before disabling interrupts
In particular, uma_zcreate creates sysctl oids, which locks an sx lock,
which uses IPIs under contention.  IPIs tend not to work very well
when interrupts are disabled.  Who knew, right?

Reviewed by:	cem kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25098
2020-06-12 21:10:45 +00:00
Konstantin Belousov
3b23ffe271 amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.
Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle.  In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.

We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them.  Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.

Reviewed by:	alc, cem, markj, Anton Rang <rang at acm.org>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D25188
2020-06-10 22:07:57 +00:00
Mark Johnston
81302f1d77 Fix boot on systems where NUMA domain 0 is unpopulated.
- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
  ensure that segments registered during hammer_time() are placed in the
  right domain.  Otherwise, since the SRAT is not parsed at that point,
  we just add them to domain 0, which may be incorrect and results in a
  domain with only several MB worth of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
  If domain 0 is unpopulated, the allocation will simply fail, resulting
  in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
  affinity table, and change vm_phys_early_alloc() to handle wildcard
  domains.  This is necessary on amd64, where the page array is dense
  and pmap_page_array_startup() may allocate page table pages for
  non-existent page frames.

Reported and tested by:	Rafael Kitover <rkitover@gmail.com>
Reviewed by:	cem (earlier version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25001
2020-05-28 19:41:00 +00:00
Conrad Meyer
852c303b61 copystr(9): Move to deprecate (attempt #2)
This reapplies logical r360944 and r360946 (reverting r360955), with fixed
copystr() stand-in replacement macro.  Eventually the goal is to convert
consumers and kill the macro, but for a first step it helps if the macro is
correct.

Prior commit message:

Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy (with correction from brooks@ -- thanks).

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.

Reviewed by:		jhb (earlier version)
Discussed with:		brooks (thanks!)
Differential Revision:	https://reviews.freebsd.org/D24672
2020-05-25 16:40:48 +00:00
Mark Johnston
e115748932 Fix the build after r361033 when ACPI is disabled.
Reported by:	Herbert J. Skuhra <herbert@gojira.at>
2020-05-22 01:18:55 +00:00
Konstantin Belousov
ea6020830c amd64: Add a knob to flush RSB on context switches if machine has SMEP.
The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.

Reported by:	Anthony Steinhauser <asteinhauser@google.com>
Reviewed by:	markj, Anthony Steinhauser
Discussed with:	philip
admbugs:	961
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-05-20 22:00:31 +00:00
Mark Johnston
b19149bc56 Fix the i386 build after r361033.
Reported by:	Jenkins
2020-05-14 17:56:44 +00:00
Mark Johnston
e76aab6ae2 Call acpi_pxm_set_proximity_info() slightly earlier on x86.
This function is responsible for setting pc_domain in each pcpu
structure.  Call it from the main function that starts APs, rather than
a separate SYSINIT.  This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized.  In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24757
2020-05-14 16:07:27 +00:00
Conrad Meyer
051fc58cb3 Revert r360944 and r360946 until reported issues can be resolved
Reported by:	cy
2020-05-12 04:34:26 +00:00
Conrad Meyer
580744621f copystr(9): Move to deprecate [2/2]
Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy.

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D24672
2020-05-11 22:57:21 +00:00
Mark Johnston
1d6638472b Remove an obsolete TODO comment from several minidump implementations.
The comment referenced a non-existent function, and these minidump
implementations already buffer discontiguous physical data pages by
mapping them into a single VA range that gets passed to the dump device,
so there is no real advantage in batching calls to blk_write().

The RISC-V and MIPS minidump implementations still write a page at a
time and so would benefit from some form of batching.

MFC after:	2 weeks
Sponsored by:	Juniper Networks, Klara Inc.
2020-04-24 18:47:42 +00:00
Brooks Davis
b24e6ac8b7 Convert canary, execpathp, and pagesizes to pointers.
Use AUXARGS_ENTRY_PTR to export these pointers.  This is a followup to
r359987 and r359988.

Reviewed by:	jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24446
2020-04-16 21:53:17 +00:00
Scott Long
0d9abbcf51 Fix ps_strings type change for i386 2020-04-16 05:27:13 +00:00
Brooks Davis
562894f0dc Centralize compatability translation macros.
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Input from:	cem, jhb
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24275
2020-04-14 20:30:48 +00:00
John Baldwin
59838c1a19 Retire procfs-based process debugging.
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging.  ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code.  This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR:		244939 (exp-run)
Reviewed by:	kib, mjg (earlier version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D23837
2020-04-01 19:22:09 +00:00
Conrad Meyer
ca0ec73c11 Expand generic subword atomic primitives
The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.

The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree.  MD specifics can be
refined individually afterwards.

The generic implementations may not be ideal for your architecture; feel
free to implement better versions.  If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name.  To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.

Include _atomic_subword.h in arm and arm64 machine/atomic.h.

For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long.  There are other questionably disabled primitives,
but I didn't run into them, so I left them alone.  KCSAN is only built for
amd64 in tinderbox for now.

Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.

Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().

On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.

Reviewed by:	kevans, rlibby (previous versions both)
Differential Revision:	https://reviews.freebsd.org/D22963
2020-03-25 23:12:43 +00:00
Ed Maste
2733d8c96c retire cx,ctau drivers
The devices supported by these drivers are obsolete ISA cards, and the
sync serial protocols they supported are essentially obsolete too.

Sponsored by:	The FreeBSD Foundation
2020-03-20 16:50:19 +00:00
Brandon Bergren
3069380898 [PowerPC][Book-E] Fix missing load base in elf_cpu_parse_dynamic().
When I implemented MD DYNAMIC parsing, I was originally passing a
linker_file_t so that the MD code could relocate pointers.

However, it turns out this isn't even filled in until later, so it was
always 0.

Just pass the load base (ef->address) directly, as that's really the only
thing we were interested in in the first place.

This fixes a crash on RB800 where it was trying to write to an unmapped
address when updating the GOT.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D24105
2020-03-18 02:58:18 +00:00
Warner Losh
daba5ace03 Finish removal of bktr
Remove the old ioctl .h files
Remove copying/linking ioctl .h files in instasllworld
Remove bktr from lint
Add now-removed files with ObsoleteFiles
2020-03-01 20:37:42 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
a324b7f71d Fix IBRS for machines with IBRS_ALL capability.
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores.  Right now code only performed MSR write
onr the CPU where sysctl was run.

Properly report hw.ibrs_active for IBRS_ALL.  Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.

Reported and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-25 17:26:10 +00:00
Mateusz Guzik
4ef55e371a i386: remove no longer needed atomic_load_ptr casts 2020-02-14 23:17:37 +00:00
Mark Johnston
c3d326fd44 Define MAXCPU consistently between the kernel and KLDs.
This reverts r177661.  The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway.  Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.

PR:		243711
Reviewed by:	jhb, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23512
2020-02-05 19:08:21 +00:00
Ed Maste
690a8a6acd regen linuxulator sysent after r357577 2020-02-05 16:54:16 +00:00
Ed Maste
fc7510aef7 linuxulator: implement sendfile
Submitted by:	Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by:	Yang Wang <2333@outlook.jp>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19917
2020-02-05 16:53:02 +00:00
Ryan Libby
10c8fb47d9 uma: convert mbuf_jumbo_alloc to UMA_ZONE_CONTIG & tag others
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc).  Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23238
2020-02-04 22:40:23 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
Konstantin Belousov
2ee49fac82 Add support for Hygon Dhyana Family 18h processor.
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.

The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).

To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.

Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.

The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.

References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:

https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef

[2] MSR and CPUID definition:

https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23163
2020-01-21 13:22:35 +00:00
Kyle Evans
05d7dd739c sysent targets: further cleanup and deduplication
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.

Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.

Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.

Reviewed by:	brooks, imp
Nice!:		emaste
Differential Revision:	https://reviews.freebsd.org/D23197
2020-01-18 20:37:45 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Mark Johnston
f3e982e764 Define a unified pmap structure for i386.
The overloading of struct pmap for PAE and non-PAE pmaps results in
three distinct layouts for the structure, which is embedded in
struct vmspace.  This causes a large number of duplicate structure
definitions in the i386 kernel's CTF type graph.

Since most pmap fields are the same in the two pmaps, simply provide
side-by-side variants of the fields that are distinct, using fixed-size
types.

PR:		242689
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22896
2020-01-07 15:59:31 +00:00
Mark Johnston
e8bbca1b44 Consistently use pmap_t instead of struct pmap *.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-01-07 15:59:02 +00:00
Alan Cox
1c3a241032 When a copy-on-write fault occurs, pmap_enter() is called on to replace the
mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry.  This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23027
2020-01-04 19:50:25 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Edward Tomasz Napierala
cc50333011 Add basic getcpu(2) support to linuxulator. The purpose of this
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on.  The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.

This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22972
2019-12-31 22:01:08 +00:00
Edward Tomasz Napierala
da7627d797 Regen after r356229.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 16:01:37 +00:00
Edward Tomasz Napierala
a8bfc7a85c Fix definitions for Linux getcpu(2).
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 15:57:29 +00:00
Pawel Biernacki
54666dffa8 linux(4): implement copy_file_range(2)
copy_file_range(2) is implemented natively since r350315, make it available
for Linux binaries too.

Reviewed by:	kib (mentor), trasz (previous version)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D22959
2019-12-30 18:11:06 +00:00
Edward Tomasz Napierala
ee0fe82ee2 Implement Linux syslog(2) syscall; just enough to make Linux dmesg(8)
utility work.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22465
2019-12-29 15:53:55 +00:00
Alan Cox
e1ab406007 Correctly implement PMAP_ENTER_NOREPLACE in pmap_enter_{l2,pde}() on kernel
mappings.

Reduce code duplication by defining a function, pmap_abort_ptp(), for
handling a common error case.

Simplify error handling in pmap_enter_quick_locked().

Reviewed by:	kib
Tested by:	pho
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22890
2019-12-29 05:36:01 +00:00
Brandon Bergren
7821a820d0 [PowerPC] Implement Secure-PLT jump table processing for ppc32.
Due to clang and LLD's tendency to use a PLT for builtins, and as they
don't have full support for EABI, we sometimes have to deal with a PLT in
.ko files in a clang-built kernel.

As such, augment the in-kernel linker to support jump table processing.

As there is no particular reason to support lazy binding in kernel modules,
only implement Secure-PLT immediate binding.

As part of these changes, add elf_cpu_parse_dynamic() to the MD API of the
in-kernel linker (except on platforms that use raw object files.)

The new function will allow MD code to act on MD tags in _DYNAMIC.

Use this new function in the PowerPC MD code to ensure BSS-PLT modules using
PLT will be rejected during insertion, and to poison the runtime resolver to
ensure we get a clear panic reason if a call is made to the resolver.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D22608
2019-12-24 15:56:24 +00:00
Jeff Roberson
a94ba188c3 Repeat the spinlock_enter/exit pattern from amd64 on other architectures to
fix an assert violation introduced in r355784.  Without this spinlock_exit()
may see owepreempt and switch before reducing the spinlock count.  amd64
had been optimized to do a single critical enter/exit regardless of the
number of spinlocks which avoided the problem and this optimization had
not been applied elsewhere.

Reported by:	emaste
Suggested by:	rlibby
Discussed with:	jhb, rlibby
Tested by:	manu (arm64)
2019-12-16 20:15:04 +00:00
Edward Tomasz Napierala
b5f20658ee Add compat.linux.emul_path, so it can be set to something other
than "/compat/linux".  Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.

Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22574
2019-12-16 20:07:04 +00:00
Edward Tomasz Napierala
cf69fe66d4 Add sync_file_range(2) implementation to linux(4); it's a thin wrapper
over the usual fsync(2).

This silences some warnings when running "apt-get upgrade".

Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:37:17 +00:00
Edward Tomasz Napierala
0cde2b3239 Regen after r355752.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:32:37 +00:00
Edward Tomasz Napierala
0610f417a4 Fix definitions for linuxulator's sync_file_range(2).
Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:30:43 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
Brooks Davis
af796bfa71 sysent: Reduce duplication and improve readability.
Use the power of variable to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
Warner Losh
f86e60008b Regularize my copyright notice
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
  All Rights Reserved on same line as other copyright holders (but not
  me). Other such holders are also listed last where it's clear.
2019-12-04 16:56:11 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Scott Long
33ce28d137 Remove the trm(4) driver
Differential Revision:	https://reviews.freebsd.org/D22575
2019-11-28 02:32:17 +00:00
Kyle Evans
f22a592111 Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D21894
2019-11-18 23:28:23 +00:00
John Baldwin
03b0d68c72 Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
Mark Johnston
85e06c728c Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory.  However, it
increases fragmentation and is disabled in release kernels.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-18 20:03:28 +00:00
John Baldwin
5caa67fa84 Use a sv_copyout_auxargs hook in the Linux ELF ABIs.
Reviewed by:	emaste
Tested on:	amd64 (linux64 only), i386
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22356
2019-11-15 23:01:43 +00:00
John Baldwin
e353233118 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
Josh Paetzel
052e12a508 Add the pvscsi driver to the tree.
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi.  The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.

Error handling in this driver has not been extensively tested
yet.

Submitted by:	vbhakta@vmware.com
Relnotes:	yes
Sponsored by:	VMware, Panzura
Differential Revision:	D18613
2019-11-14 23:31:20 +00:00
Brooks Davis
96c914ee97 Tidy syscall declerations.
Pointer arguments should be of the form "<type> *..." and not "<type>* ...".

No functional change.

Reviewed by:	kevans
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22373
2019-11-14 17:11:52 +00:00
Konstantin Belousov
006269f469 i386: stop guessing the address of the trap frame in ddb backtrace.
Save the address of the trap frame in %ebp on kernel entry.  This
automatically provides it in struct i386_frame.f_frame to unwinder.

While there, more accurately handle the terminating frames,

Reviewed by:	avg, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22321
2019-11-12 15:56:27 +00:00
Andriy Gapon
78f1851613 teach db_nextframe/x86 about [X]xen_intr_upcall interrupt handler
Discussed with:	kib, royger
MFC after:	3 weeks
Sponsored by:	Panzura
2019-11-12 11:00:01 +00:00
Andriy Gapon
7aff07d914 db_nextframe/i386: reduce the number of special frame types
This change removes TRAP_INTERRUPT and TRAP_TIMERINT frame types.

Their names are a bit confusing: trap + interrupt, what is that?
The TRAP_TIMERINT name is too specific -- can it only be used for timer
"trap-interrupts"?  What is so special about them?

My understanding of the code is that INTERRUPT, TRAP_INTERRUPT and
TRAP_TIMERINT differ only in how an offset from callee's frame pointer to a
trap frame on the stack is calculated.  And that depends on a number of
arguments that a special handler passes to a callee (a function with a
normal C calling convention).

So, this change makes that logic explicit and collapses all interrupt frame
types into the INTERRUPT type.

Reviewed by:	markj
Discussed with:	kib, jhb
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D22303
2019-11-11 19:06:04 +00:00
Eric van Gyzen
854e90da4e vmm: pass M_WAITOK to uma_zalloc when allocating FPU save area
Submitted by:	patrick.sullivan3@dell.com
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22276
2019-11-08 16:30:55 +00:00
Edward Tomasz Napierala
044ab55e41 Make linux(4) create /dev/shm. Linux applications often expect
a tmpfs to be mounted there, and because they like to verify it's
actually a mountpoint, a symlink won't do.

Reviewed by:	dchagin (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20333
2019-11-06 20:53:33 +00:00
Yuri Pankov
a161fba992 linux: futex_mtx should follow futex_list
Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.

PR:		240989
Reported by:	Alex S <iwtcex@gmail.com>
2019-10-18 12:25:33 +00:00
Conrad Meyer
dda17b3672 Implement NetGDB(4)
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.

There are three pieces in the complete NetGDB system.

First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.

Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic.  Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).

Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.

The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU).  There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.

Submitted by:	John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with:	emaste, markj
Relnotes:	sure
Differential Revision:	https://reviews.freebsd.org/D21568
2019-10-17 21:33:01 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Mark Johnston
01cef4caa7 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
Andriy Gapon
edca4938f7 itwd(4): driver for watchdog function in ITE Super I/O chips
The chips are commonly named with "IT" prefix.

MFC after:	19 days
2019-10-16 14:57:38 +00:00
Jeff Roberson
638f867814 (6/6) Convert pmap to expect busy in write related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21596
2019-10-15 03:51:46 +00:00
Jeff Roberson
205be21d99 (3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by:    kib
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21592
2019-10-15 03:41:36 +00:00
Andriy Gapon
db8bee42ce i386: hide more of atomic 64-bit definitions under _KERNEL
At the moment i386 does not provide 64-bit atomic operations in
userland.  Exposing some atomic_*_64 defines can cause unnecessary
confusion.

Discussed with:	kib
MFC after:	2 weeks
2019-10-08 10:50:16 +00:00
Ed Maste
9923b64177 Remove host binary object drivers from GENERIC
Four drivers (hpt27xx, hptmv, hptnr, hptrr, hpt27xx) include precompiled
binary objects; have users load them as modules if they are needed.

Additional work (i.e., integrating devmatch) required before MFC.

Reviewed by:	markj
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21865
2019-10-03 12:51:57 +00:00
Konstantin Belousov
df08823d07 Improve MD page fault handlers.
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap().  MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated.  According to POSIX description for
mmap(2):
   The system shall always zero-fill any partial page at the end of an
   object. Further, the system shall never write out any modified
   portions of the last page of an object which are beyond its
   end. References within the address range starting at pa and
   continuing for len bytes to whole pages following the end of an
   object shall result in delivery of a SIGBUS signal.

   An implementation may generate SIGBUS signals when a reference
   would cause an error in the mapped object, such as out-of-space
   condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered.  Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault().  Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos.  Removed
unneeded truncations of the fault addresses reported by hardware.

PR:	211924
Reviewed by:	alc
Discussed with:	jilles, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21566
2019-09-27 18:43:36 +00:00
Kyle Evans
d19f028e33 sysent: regenerate after r352693 2019-09-25 17:30:28 +00:00
Mark Johnston
b119329d81 Complete the removal of the "wire_count" field from struct vm_page.
Convert all remaining references to that field to "ref_count" and update
comments accordingly.  No functional change intended.

Reviewed by:	alc, kib
Sponsored by:	Intel, Netflix
Differential Revision:	https://reviews.freebsd.org/D21768
2019-09-25 16:11:35 +00:00
Konstantin Belousov
66eb1d6347 i386: reduce differences in source between PAE and non-PAE pmaps ...
by defining pg_nx as zero for non-PAE and correspondingly simplifying
some expressions.

Suggested and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21757
2019-09-22 19:59:10 +00:00
Konstantin Belousov
b223a69238 i386: implement sysctl vm.pmap.kernel_maps.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21739
2019-09-22 19:23:00 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Ed Maste
a8ce8f0ae5 Update comments and ordering in linux*_dummy.c
- sort alphabetically
- getcpu arrived in Linux 2.6.19
- fanotify_* arrived in 2.6.36
2019-09-11 17:56:48 +00:00
Ed Maste
eff9d7b799 linuxulator: memfd_create first appeared in Linux 3.17
Reference: http://man7.org/linux/man-pages/man2/memfd_create.2.html
2019-09-11 17:05:49 +00:00
Ed Maste
1515fe49f2 linuxulator: seccomp syscall first appeared in Linux 3.17
Reference: http://man7.org/linux/man-pages/man2/seccomp.2.html
2019-09-11 17:04:13 +00:00
Ed Maste
2eb6ef203a linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
Ed Maste
65ab1fdd21 regen linuxulator sysent after r352208 2019-09-11 12:58:53 +00:00
Ed Maste
427b1baec0 make linux_renameat2 args consistent with linux_renameat
Use 'dfd' consistently for a directory fd.
2019-09-11 12:58:06 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Konstantin Belousov
04359b5e1e Remove useless redefinition of NSFBUFS in i386/vm_machdep.c.
Sponsored by:	The FreeBSD Foundation
2019-08-29 07:34:14 +00:00
Konstantin Belousov
a2a0f90654 Centralize __pcpu definitions.
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources.  The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu.  There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.

To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source.  Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.

Reported and tested by:	lwhsu, bcran
Reviewed by:	imp, markj (previous version)
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21418
2019-08-29 07:25:27 +00:00
Konstantin Belousov
3a91d1062a i386: Implement atomic_load_64(9) and atomic_store_64(9).
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 15:58:44 +00:00
Jeff Roberson
2194393787 Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by:	jhb, markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21250
2019-08-16 00:45:14 +00:00
Warner Losh
0d89c934cb Start to split out the really x86 specific NOTES from the global notes file.
Start with COMPAT_43, since it's really only relevant to x86.

Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D21203
2019-08-12 22:58:13 +00:00
John Baldwin
a04725cd5c Detect invalid PCI devices more correctly in PCI interrupt router drivers.
- Check for an invalid device (vendor is invalid) before reading the
  header type register when examining function 0 of a possible device.
- When iterating over functions of a device, reject any device whose
  16-bit vendor is invalid rather than requiring the full 32-bit
  vendor+device to be all 1's.  In practice the latter check is
  probably fine, but checking the vendor is what the PCI spec
  recommends.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21147
2019-08-06 23:15:04 +00:00
John Baldwin
c45cbc7a1f Don't reset memory attributes when mapping physical addresses for ACPI.
Previously, AcpiOsMemory was using pmap_mapbios which would always map
the requested address Write-Back (WB).  For several AMD Ryzen laptops,
the BIOS uses AcpiOsMemory to directly access the PCI MCFG region in
order to access PCI config registers.  This has the side effect of
remapping the MCFG region in the direct map as WB instead of UC
hanging the laptops during boot.

On the one laptop I examined in detail, the _PIC global method used to
switch from 8259A PICs to I/O APICs uses a pair of PCI config space
registers at offset 0x84 in the device at 0:0:0 to as a pair of
address/data registers to access an indirect register in the chipset
and clear a single bit to switch modes.

To fix, alter the semantics of pmap_mapbios() such that it does not
modify the attributes of any existing mappings and instead uses the
existing attributes.  If a new mapping is created, this new mapping
uses WB (the default memory attribute).

Special thanks to the gentleman whose name I don't have who brought
two affected laptops to the hacker lounge at BSDCan.  Direct access to
the affected systems permitted finding the root cause within an hour
or so.

PR:		231760, 236899
Reviewed by:	kib, alc
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20327
2019-08-03 01:36:05 +00:00
Alan Cox
43ded0a321 In pmap_advise(), when we encounter a superpage mapping, we first demote the
mapping and then destroy one of the 4 KB page mappings so that there is a
potential trigger for repromotion.  Currently, we destroy the first 4 KB
page mapping that falls within the (current) superpage mapping or the
virtual address range [sva, eva).  However, I have found empirically that
destroying the last 4 KB mapping produces slightly better results,
specifically, more promotions and fewer failed promotion attempts.
Accordingly, this revision changes pmap_advise() to destroy the last 4 KB
page mapping.  It also replaces some nearby uses of boolean_t with bool.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21115
2019-07-31 05:38:39 +00:00
Ed Maste
305b9efefc linuxulator: rename linux_locore.s to .asm
It is assembled using "${CC} -x assembler-with-cpp", which by convention
(bsd.suffixes.mk) uses the .asm extension.

This is a portion of the review referenced below (D18344).  That review
also renamed linux_support.s to .S, but that is a functional change
(using the compiler's integrated assembler instead of as) and will be
revisited separately.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18344
2019-07-30 17:18:31 +00:00
Xin LI
d4565741c6 Remove gzip'ed a.out support.
The current implementation of gzipped a.out support was based
on a very old version of InfoZIP which ships with an ancient
modified version of zlib, and was removed from the GENERIC
kernel in 1999 when we moved to an ELF world.

PR:		205822
Reviewed by:	imp, kib, emaste, Yoshihiro Ota <ota at j.email.ne.jp>
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D21099
2019-07-30 05:13:16 +00:00
Alan Cox
5d18382b72 Simplify the handling of superpages in pmap_clear_modify(). Specifically,
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.

Deindent the nearby code.

Reviewed by:	kib, markj
Tested by:	pho (amd64, i386)
X-MFC after:	r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision:	https://reviews.freebsd.org/D21027
2019-07-25 22:02:55 +00:00
Alan Cox
43184d8e5c Revert r349973. Upon further reflection, I realized that the comment
deleted by r349973 is still valid on i386.  Restore it.

Discussed with:	   markj
2019-07-16 03:09:03 +00:00
John Baldwin
c18ca74916 Don't pass error from syscallenter() to syscallret().
syscallret() doesn't use error anymore.  Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D2090
2019-07-15 21:25:16 +00:00
Alan Cox
f138406359 Remove a stale comment.
Reported by:	markj
MFC after:	1 week
2019-07-13 15:53:28 +00:00
Konstantin Belousov
30b3018d48 Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Alexander Motin
6683132d54 Add driver for NTB in AMD SoC.
This patch is the driver for NTB hardware in AMD SoCs (ported from Linux)
and enables the NTB infrastructure like Doorbells, Scratchpads and Memory
window in AMD SoC. This driver has been validated using ntb_transport and
if_ntb driver already available in FreeBSD.

Submitted by:	Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D18774
2019-07-02 05:25:18 +00:00
Andriy Gapon
e3722b788e add superio driver
The goal of this driver is consolidate information about SuperIO chips
and to provide for peaceful coexistence of drivers that need to access
SuperIO configuration registers.

While SuperIO chips can host various functions most of them are
discoverable and accessible without any knowledge of the SuperIO.
Examples are: keyboard and mouse controllers, UARTs, floppy disk
controllers.  SuperIO-s also provide non-standard functions such as
GPIO, watchdog timers and hardware monitoring.  Such functions do
require drivers with a knowledge of a specific SuperIO.

At this time the driver supports a number of ITE and Nuvoton (fka
Winbond) SuperIO chips.
There is a single driver for all devices.  So, I have not done the usual
split between the hardware driver and the bus functionality.  Although,
superio does act as a bus for devices that represent known non-standard
functions of a SuperIO chip.  The bus provides enumeration of child
devices based on the hardcoded knowledge of such functions.  The
knowledge as extracted from datasheets and other drivers.
As there is a single driver, I have not defined a kobj interface for it.
So, its interface is currently made of simple functions.
I think that we can the flexibility (and complications) when we actually
need it.

I am planning to convert nctgpio and wbwd to superio bus very soon.
Also, I am working on itwd driver (watchdog in ITE SuperIO-s).
Additionally, there is ithwm driver based on the reverted sensors
import, but I am not sure how to integrate it given that we still lack
any sensors interface.

Discussed with:	imp, jhb
MFC after:	7 weeks
Differential Revision: https://reviews.freebsd.org/D8175
2019-07-01 17:05:41 +00:00
Navdeep Parhar
57f317e60a Display the approximate space needed when a minidump fails due to lack
of space.

Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20801
2019-06-30 03:14:04 +00:00
Alan Cox
c134ef742f When we protect PTEs (as opposed to PDEs), we only call vm_page_dirty()
when, in fact, we are write protecting the page and the PTE has PG_M set.
However, pmap_protect_pde() was always calling vm_page_dirty() when the PDE
has PG_M set.  So, adding PG_NX to a writeable PDE could result in
unnecessary (but harmless) calls to vm_page_dirty().

Simplify the loop calling vm_page_dirty() in pmap_protect_pde().

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20793
2019-06-28 22:40:34 +00:00
Conrad Meyer
c363b16c63 sys: Remove DEV_RANDOM device option
Remove 'device random' from kernel configurations that reference it (most).
Replace perhaps mistaken 'nodevice random' in two MIPS configs with 'options
RANDOM_LOADABLE' instead.  Document removal in UPDATING; update NOTES and
random.4.

Reviewed by:	delphij, markm (previous version)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19918
2019-06-21 00:16:30 +00:00
Alan Cox
fd2dae0a30 Implement an alternative solution to the amd64 and i386 pmap problem that we
previously addressed in r348246.

This pmap problem also exists on arm64 and riscv.  However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv.  In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs.  (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)

This commit implements an alternative solution that can be used on all four
architectures.  Moreover, this solution has two other advantages.  First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly.  Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion.  Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().

In addition, remove a related stale comment from pmap_enter_{l2,pde}().

Reviewed by:	kib, markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20538
2019-06-09 03:36:10 +00:00
Konstantin Belousov
c46d985629 i386 trap.c: Remove unused MAX_TRAP_MSG define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 13:41:39 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Mark Johnston
c080655467 Fix a race between fasttrap and the user breakpoint handler.
When disabling the last enabled userspace probe, fasttrap clears the
function pointers which hook in to the breakpoint handler.  If a traced
thread hit a fasttrap breakpoint before it was removed, we must ensure
that it is able to call the hook; otherwise fasttrap will not consume
the trap and SIGTRAP will be delievered to the thread.  Synchronize
with such threads by ensuring that they load the hook pointer with
interrupts disabled, and by completing an SMP rendezvous after removing
breakpoints and before clearing the pointers.

Reported by:	Alexander Alexeev <Alexander.Alexeev@dell.com>
Tested by:	Alexander Alexeev (earlier version)
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20526
2019-06-06 16:03:25 +00:00
Brooks Davis
4af6033324 makesyscalls.sh: always use absolute path for syscalls.conf
syscalls.conf is included using "." which per the Open Group:

 If file does not contain a <slash>, the shell shall use the search
 path specified by PATH to find the directory containing file.

POSIX shells don't fall back to the current working directory.

Submitted by:	Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	bdrewery
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20476
2019-05-30 20:56:23 +00:00
Pedro F. Giffuni
ec845b07c6 typo: suppported. 2019-05-29 02:08:23 +00:00
Mark Johnston
d7bb6218a6 Remove pmap_pid_dump() from the i386 pmap.
It has not been compilable in a long time and doesn't seem very useful.

Suggested by:	kib
MFC after:	1 week
2019-05-25 23:36:20 +00:00
Konstantin Belousov
a9c7546a1d Fix a corner case in demotion of kernel mappings.
It is possible for the kernel mapping to be created with superpage by
directly installing pde using pmap_enter_2mpage() without filling the
corresponding page table page.  This can happen e.g. if the range is
already backed by reservation and vm_fault_soft_fast() conditions are
satisfied, which was observed on the pipe_map.

In this case, demotion must fill the page obtained from the pmap
radix, same as if the page is newly allocated.  Use PG_PROMOTED bit as
an indicator that the page is valid, instead of the wire count of the
page table page.

Since the PG_PROMOTED bit is set on pde when we leave TLB entries for
4k pages around, which in particular means that the ptes were filled,
it provides more correct indicator.  Note that pmap_protect_pde()
clears PG_PROMOTED, which handles the case when protection was changed
on the superpage without adjusting ptes.

Reported by:	pho
In collaboration with:	alc
Tested by:	alc, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20380
2019-05-24 17:19:06 +00:00
Konstantin Belousov
48ec6d3bc9 Do not call hw_mds_recalculate() from initializecpu().
If MDS mitigation is enabled by the tunable but MDS microcode is not
early-loaded, software mitigation is selected.  This causes
initializecpu() to try to allocate memory which makes boot process
very unhappy.

Create SYSINIT that runs sufficiently late to succeed.

Reported by:	naddy
PR:	237968
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-21 22:56:21 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Brooks Davis
376f78be92 FCP-101: Bump __FreeBSD_version for driver removal.
Remove gone_by_fcp101_dev macro.

Remove orphaned comment.
2019-05-17 15:24:54 +00:00
Brooks Davis
7a582e5374 FCP-101: Remove xe(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:44 +00:00
Brooks Davis
02fae06a11 FCP-101: Remove wb(4)
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:34 +00:00
Brooks Davis
e8504bf9e7 FCP-101: Remove vx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:26 +00:00
Brooks Davis
be345ff023 FCP-101: Remove txp(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:17 +00:00
Brooks Davis
b1b1c2fe38 FCP-101: Remove tx(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:08 +00:00
Brooks Davis
7c897ca91f FCP-101: Remove tl(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:24:00 +00:00
Brooks Davis
90089841de FCP-101: Remove sn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:52 +00:00
Brooks Davis
3b70dd81f5 FCP-101: Remove sf(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:43 +00:00
Brooks Davis
607790d10f FCP-101: Remove pcn(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:34 +00:00
Brooks Davis
dd262716a1 FCP-101: Remove fe(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:26 +00:00
Brooks Davis
3ee01a1385 FCP-101: Remove ex(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:18 +00:00
Brooks Davis
e153ee663a FCP-101: Remove ep(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:10 +00:00
Brooks Davis
05aa6e583b FCP-101: Remove ed(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:23:02 +00:00
Brooks Davis
08ac01a92c FCP-101: Remove de(4).
Relnotes:	yes
FCP:		https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D20230
2019-05-17 15:22:54 +00:00