opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page-table
pressure from the kernel and prepares the kernel for radix translation,
which requires the kernel to reside at these high addresses.
This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time with a new base address set up in the early parts of
powerpc_init().
Reviewed by: jhibbits
Differential Revision: D14647
This accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.
The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so it can be relied upon.
Reviewed by: jhibbits, Breno Leitao
Differential Revision: D14499
When a processor enters a power-save state it releases resources shared with
other CPU threads, which makes the other cores run much faster.
This patch also implements saving and restoring registers that might get
corrupted in power-save state.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Reviewed by: jhibbits, nwhitehorn, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
Make vm_wait() take a vm_object argument which specifies the domain
set to wait on until the min condition passes. If there is no object
associated with the wait, use curthread's policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() are supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait on for the min condition to pass.
Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.
Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.
Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.
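For illustration, a hedged sketch of the new calling convention (the
allocation loop below is hypothetical and elides object locking; it is not
code from this change):

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>
    #include <vm/vm_page.h>
    #include <vm/vm_pageout.h>

    /*
     * Sketch only: retry an object-backed page allocation, waiting on the
     * domain set implied by the object instead of the old global VM_WAIT.
     */
    static vm_page_t
    obj_page_alloc_retry(vm_object_t object, vm_pindex_t pindex)
    {
            vm_page_t m;

            while ((m = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL)) == NULL)
                    vm_wait(object);        /* sleeps on the object's domainset */
            return (m);
    }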
Sketched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel. As more work is done, the struct may be optimized further.
Reviewed by: nwhitehorn
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patching of the compile-time
defines, centralizing the conditional logic in the early startup code.
Reviewed by: jhibbits
It turns out that under some circumstances we can get a DSI or DSE before we
set LPCR and LPID, so we should set them as early as possible.
Authored by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().
Reviewed by: jhibbits
to which it is specific, rather than in the generic AIM startup code. This
will be required to support the radix-table-based MMU introduced with POWER9.
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.
Reviewed by: jhibbits
using a new macro PHYS_TO_DMAP, which deliberately has the same name as the
equivalent macro on amd64. This also sets the stage for moving the direct
map to another base address.
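For illustration only, a hedged sketch of how a caller uses the macro (the
helper below is hypothetical):

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <machine/vmparam.h>            /* PHYS_TO_DMAP() */

    /* Sketch: read a word from physical memory through the direct map. */
    static uint32_t
    read_phys_word(vm_paddr_t pa)
    {
            return (*(volatile uint32_t *)PHYS_TO_DMAP(pa));
    }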
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.
The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed. Well
behaved clients can achieve perfect locality with no performance penalty.
The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.
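A hedged sketch of how a consumer might use the new knobs (the zone, item
type, and the uma_zalloc_domain() spelling of the _domain() variant are
assumptions for illustration, not part of this change):

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    struct foo { int x; };                  /* hypothetical item type */
    static uma_zone_t foo_zone;

    static void
    foo_zone_init(void)
    {
            /* First-touch NUMA policy via the zone flag (spelling assumed). */
            foo_zone = uma_zcreate("foo", sizeof(struct foo), NULL, NULL,
                NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NUMA);
    }

    static struct foo *
    foo_alloc_on(int domain)
    {
            /* Explicit-domain allocation via the _domain() variant. */
            return (uma_zalloc_domain(foo_zone, NULL, domain, M_WAITOK));
    }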
Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
is used as the bootloader on a number of PPC64 platforms. This involves the
following pieces:
- Making the first instruction a valid kernel entry point, since kexec
ignores the ELF entry value. This requires a separate section and linker
magic to prevent the linker from filling the beginning of the section
with stubs.
- Adding an entry point at 0x60 past the first instruction for systems
lacking firmware CPU shutdown support (notably PS3).
- Linker script changes to support the above.
MFC after: 1 month
If these are not aligned, the linker has to emit a different type of
relocation that the early boot self-relocation code cannot handle, even
in principle, resulting in them being set to zero and the kernel crashing.
MFC after: 1 week
Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
PowerPC kernels in r6 is actually metadata from loader(8), or gibberish
left in r6 by another boot loader, since r6 is not required to be anything
under the PAPR/ePAPR/CHRP/OF standards.
Note that, as a result, systems need a new boot loader to boot PPC kernels
after this revision without ending up at a mountroot prompt. New boot
loaders are backwards compatible and can boot older kernels.
Reviewed by: jhibbits
MFC after: 2 months
and such from ending up on the wrong CPU on SMP systems. It would be good to
have this be more generic somehow as POWER9s appear, but PPC does not
have feature bits, unfortunately.
MFC after: 3 weeks
is set and the right thing to do may be platform-dependent (it requires
firmware on PowerNV, for instance). Make it a new platform method called
platform_smp_timebase_sync().
MFC after: 3 weeks
similar to the kernel memory allocator.
This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.
A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.
Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Clang apparently requires the explicit form of this instruction, and rejects
uses which ignore the optional cmpD register. This was the only use of the
shorthand form of the instruction, so just fix it up to match the others.
PR: kern/215681
Submitted by: Mark Millard
Reported by: Mark Millard <markmi _AT_ dsl-only.net>
MFC after: 2 weeks
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC. Using a rewritable asm stub, print any SPR
provided on the command line.
Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).
Note also that this relies on the kernel text pages being writable. If in the
future this is made not the case, this will need to be reworked.
Test Plan:
Printing the Processor Version Register (PVR, SPR 287):
db> show spr 11f
SPR 287(11f): 80240012
Differential Revision: https://reviews.freebsd.org/D7403
late boot: enable it explicitly after installing the page tables. If booting
from an FDT, also make sure to escape the firmware's MMU context early
before overwriting firmware page tables.
Approved by: re (gjb)
Most of the effect of setting MSR[SF] is that the CPU will stop ignoring
the high 32 bits of registers containing addresses in load/store
instructions. As such, the kernel was setting it only when it began to
need access to high memory. MSR[SF] also affects the operation of some
conditional instructions, however, so setting it late could subtly break
code that runs very early. This fixes use of the FDT mode in
loader, and FDT boot more generally, on 64-bit PowerPC systems.
Hardware provided by: IBM LTC
Approved by: re (kib)
rounddown2 tends to produce longer lines than the original code,
and when the code has a high indentation level it is not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
For rs6000, most memory insns and addi/addis do not allow GPR0 for RA
(they use literal zero there instead). So use a 'b' constraint to make
sure to have a base register other than GPR0.
GCC 4.7 and up handle this by allocating r9 instead of r0.
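As a hedged illustration of the constraint (not code from this change):

    #include <sys/types.h>

    /*
     * Sketch: the 'b' constraint keeps the compiler from picking r0 for the
     * base register, which the CPU would read as a literal zero.
     */
    static inline uint32_t
    load_word(const volatile void *addr)
    {
            uint32_t val;

            __asm __volatile("lwz %0, 0(%1)" : "=r"(val) : "b"(addr));
            return (val);
    }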
OF_getprop() to get encode-int encoded values from the OF tree. This is
a no-op at present, since all existing PowerPC ports are big-endian, but
it is a correctness improvement and will be required if we have a
little-endian kernel at some future point.
Where it is totally impossible for the code ever to be used on a
little-endian system (much of powerpc/powermac, for instance), I have not
necessarily made the appropriate changes.
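An illustrative sketch of the endian-safe pattern (the property name is
hypothetical, and OF_getencprop() is assumed here as the standard wrapper
since the truncated first line does not name it):

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <dev/ofw/openfirm.h>

    /* Sketch: fetch a 32-bit encode-int property in host byte order. */
    static int
    get_cell(phandle_t node, const char *propname, uint32_t *out)
    {
            pcell_t cell;

            if (OF_getencprop(node, propname, &cell, sizeof(cell)) <
                (ssize_t)sizeof(cell))
                    return (ENXIO);
            *out = cell;            /* already byte-swapped on LE hosts */
            return (0);
    }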
MFC after: 1 month
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.
The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already
incorrect for threads with a non-default kstack size. I only
changed the macros to use the variable instead of the constant, since I
cannot test.
On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
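A hedged sketch of the intended style of use (the bounce-copy helper is
hypothetical):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>
    #include <vm/vm_page.h>

    /* Sketch: copy out of an arbitrary physical page, e.g. a bounce buffer. */
    static void
    bounce_copy_from_page(vm_page_t m, vm_offset_t off, void *dst, size_t len)
    {
            vm_offset_t kva;

            kva = pmap_quick_enter_page(m);         /* cannot sleep or fail */
            bcopy((void *)(kva + off), dst, len);   /* no nested quick mappings */
            pmap_quick_remove_page(kva);
    }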
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
It appears that the linker will not handle 64-bit relocations at addresses that
are not aligned to 8-byte boundaries. Prior to this change the line:
.llong generictrap
was aligned to a 4-byte address, and the linker replaced that with an 8-byte
0x0. Aligning that address to 8 bytes caused the linker to generate the proper
relocation. As a follow-through, the dblow from trap_subr33.S used the code
sequence 'lwz %r1, TRAP_GENTRAP(0)', so this reproduces the analogue of that for
64-bit.
Summary:
Both booke and AIM interrupt.c files contain nearly identical code. This merges
the two files, to reduce duplication.
Reviewers: #powerpc, marcel
Reviewed By: marcel
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2991
On Book-E, physical addresses are actually 36 bits, not 32. This is
currently worked around by ignoring the top bits. However, in some cases, the
boot loader configures CCSR to something above the 32-bit mark. This is stage 1
in updating the pmap to handle 36-bit physaddr.
Much of the code was common to begin with. There is one nit, which is likely
not an issue at all. With the old code, the AIM machdep would __syncicache()
the entire kernel core at setup. However, in the unified setup, that seems to
hang on the MPC7455, perhaps because it's running later than before. Removing
this allows it to boot just fine. Examining the code, the FreeBSD loader
already does syncicache of the full kernel, and each module loaded, so this
doesn't appear to be an actual problem.
Initial code by Nathan Whitehorn.
Summary:
Book-E and AIM trap.c are almost identical, except for a few bits. This is step
1 in unifying them.
This also renumbers EXC_DEBUG, to not conflict with AIM vector numbers. Since
this is the only one thus far that is used in the switch statement in trap(),
it's the only one renumbered. If others get added to the switch, which conflict
with AIM numbers, they should also be renumbered.
Reviewers: #powerpc, marcel, nwhitehorn
Reviewed By: marcel
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2215
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.
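The failure mode, sketched with hypothetical helpers (not the functions that
were fixed):

    #include <sys/param.h>

    /*
     * Sketch of the bug pattern: passing the request size through an int
     * truncates or wraps negative once the caller asks for 2GB or more.
     */
    static int
    size_in_pages_broken(size_t size)
    {
            int nbytes = size;                      /* loses bits at >= 2GB */
            return (howmany(nbytes, PAGE_SIZE));    /* garbage page count */
    }

    static size_t
    size_in_pages_fixed(size_t size)
    {
            return (howmany(size, PAGE_SIZE));      /* keep the full width */
    }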
Note to self: When this is MFCed, sparc64 needs the same fix.
Differential revision: https://reviews.freebsd.org/D2106
Reviewed by: kib
Reported by: Michael Fuckner <michael@fuckner.net>
Tested by: Michael Fuckner <michael@fuckner.net>
MFC after: 2 weeks
executables. The goal here, not yet accomplished, is to let the e500 kernel
run under QEMU by setting KERNBASE to something that fits in low memory and
then having the kernel relocate itself at runtime.
have the same meaning and occupy the same memory address in the trapframe
courtesy of a union. Avoid some pointless #ifdefs by spelling them both 'DAR'
in the trapframe.
this change is to improve concurrency:
- Drop global state stored in the shadow overflow page table (and all other
global state)
- Remove all global locks
- Use per-PTE lock bits to allow parallel page insertion
- Reconstruct state when requested for evicted PTEs instead of buffering
it during overflow
This drops total wall time for make buildworld on a 32-thread POWER8 system
by a factor of two and system time by a factor of three, providing performance
20% better than similarly clocked Core i7 Xeons per-core. Performance on
smaller SMP systems, where PMAP lock contention was not as much of an issue,
is nearly unchanged.
Tested on: POWER8, POWER5+, G5 UP, G5 SMP (64-bit and 32-bit kernels)
Merged from: user/nwhitehorn/ppc64-pmap-rework
Looked over by: jhibbits, andreast
MFC after: 3 months
Relnotes: yes
Sponsored by: FreeBSD Foundation
and POWER8. This instruction set unifies the 32 64-bit scalar floating
point registers with the 32 128-bit vector registers into a single bank
of 64 128-bit registers. Kernel support mostly amounts to saving and
restoring the wider version of the floating point registers and making
sure that both scalar FP and vector registers are enabled once a VSX
instruction is executed. get_mcontext() and friends currently cannot
see the high bits, which will require a little more work.
As the system compiler (GCC 4.2) does not support VSX, making use of this
from userland requires either newer GCC or clang.
Relnotes: yes
Sponsored by: FreeBSD Foundation
This port failed to gain traction and probably only a couple Wii consoles
ran FreeBSD all the way to single user mode with an md(4). IPC
support was never implemented, so it was impossible to use any peripherals.
Any further development, if any, will happen at https://github.com/rpaulo/wii.
Discussed with: nathanw (a long time ago), jhibbits
every possible trap address by default. This also makes sure the kernel
notices (and panics at) traps from newer CPUs that the kernel was not
expecting rather than executing gibberish memory.
A "size" symbol with its address set to the length of the handler would be
shifted forward with all other addresses when relocations are processed.
Instead, just note the end and do the subtraction at runtime.
mostly a no-op since all currently-supported instances of these CPUs give
the number of SLB slots in the device tree, but keep it here as well just
in case.
instructions to call through pointers instead. In general, these are set
implicitly through relocation processing. One has to be set explicitly in
machdep.c, however, to fit one handler in the tiny (8 instruction) space
available.
Reviewed by: andreast
Differential revision: D1554
Tested on: UP and SMP G5, Cell, POWER5+
sequences, like those used to read the HIDs. This is both easier to read
and avoids a miscompilation by GCC in certain circumstances. Also avoid
double restoration of HID4 and HID5.
MFC after: 2 weeks
PVO pool size. The default errs on the exceedingly large side, so absent
any intelligent automatic tuning, at least let the user set it to save
RAM on memory-constrained systems.
MFC after: 2 weeks
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.
PR: 193873
Differential Revision: https://reviews.freebsd.org/D904
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by: jhibbits (earlier version)
Sponsored by: EMC / Isilon Storage Division
the Open Firmware, as provided by petitboot, for example. Note that this is
not quite complete, since RTAS instantiation still depends on callable
firmware.
MFC after: 2 weeks
It's redundant at the moment since it can be obtained from the trapframe
on the architectures where DTrace is supported, but this won't be the case
with ARM.
Summary:
Revert the initial FBT-with-KDB changes for trap_subr*.S, and instead use the
db_trap filter function to handle dtrace trap filtering. With this, the MMU is
enabled by the support code, simplifying the codepath altogether.
Test Plan: Tested on my G4 PowerBook
Reviewers: #powerpc, nwhitehorn
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D1207
MFC after: 3 weeks
in userland rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv().
This fixes a namespace collision with libc symbols.
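A minimal usage sketch of the renamed interface (the tunable name and helper
are hypothetical):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>

    /* Sketch: look up a kernel environment variable with the new names. */
    static int
    example_lookup(void)
    {
            char *val;

            val = kern_getenv("hw.example.mode");   /* formerly getenv() */
            if (val == NULL)
                    return (ENOENT);
            printf("mode=%s\n", val);
            freeenv(val);                   /* kern_getenv() result must be freed */
            return (0);
    }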
Submitted by: kmacy
Tested by: make universe
When the FreeBSD kernel is loaded from Xen, the symtab and strtab are
not loaded the same way as with the native boot loader. This patch adds
three new global variables to ddb that can be used to specify the
exact position and size of those tables, so they can be directly used
as parameters to db_add_symbol_table. A new helper is introduced, so callers
that used to set ksym_start and ksym_end can use this helper to set the new
variables.
It also adds support for loading them from the Xen PVH port, that was
previously missing those tables.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
ddb/db_main.c:
- Add three new global variables: ksymtab, kstrtab, ksymtab_size that
can be used to specify the position and size of the symtab and
strtab.
- Use those new variables in db_init in order to call db_add_symbol_table.
- Move the logic in db_init to db_fetch_ksymtab in order to set ksymtab,
kstrtab, ksymtab_size from ksym_start and ksym_end.
ddb/ddb.h:
- Add prototype for db_fetch_ksymtab.
- Declare the extern variables ksymtab, kstrtab and ksymtab_size.
x86/xen/pv.c:
- Add support for finding the symtab and strtab when booted as a Xen
PVH guest. Since Xen loads the symtab and strtab as NetBSD expects
to find them, we have to adapt and use the same method.
amd64/amd64/machdep.c:
arm/arm/machdep.c:
i386/i386/machdep.c:
mips/mips/machdep.c:
pc98/pc98/machdep.c:
powerpc/aim/machdep.c:
powerpc/booke/machdep.c:
sparc64/sparc64/machdep.c:
- Use the newly introduced db_fetch_ksymtab in order to set ksymtab,
kstrtab and ksymtab_size.
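A hedged sketch of the machdep.c side (the exact db_fetch_ksymtab() signature
is assumed from the description above; the kmdp handling and helper are
hypothetical):

    #include <sys/param.h>
    #include <sys/linker.h>
    #include <machine/metadata.h>
    #include <ddb/ddb.h>

    /*
     * Sketch: hand the bootloader-provided symbol bounds to the new helper
     * instead of stashing them in the old ksym_start/ksym_end globals.
     */
    static void
    init_ddb_symbols(caddr_t kmdp)
    {
            vm_offset_t ksym_start, ksym_end;

            ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t);
            ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t);
            db_fetch_ksymtab(ksym_start, ksym_end);
    }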
mapping size (currently unused). The flags include the fault access
bits, the wired flag as PMAP_ENTER_WIRED, and a new flag
PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep.
For powerpc aim, both 32- and 64-bit, fix the implementation to ensure that
the requested mapping is created when PMAP_ENTER_NOSLEEP is not
specified; in particular, wait for the memory required to proceed to
become available.
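For illustration, a hedged sketch of a call under the updated KPI (the
wrapper is hypothetical; the int return follows the pmap_enter(9) KERN_*
convention):

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>
    #include <vm/vm_page.h>

    /*
     * Sketch: create a wired mapping without sleeping; with
     * PMAP_ENTER_NOSLEEP the caller must be prepared for failure.
     */
    static int
    map_wired_nosleep(pmap_t pmap, vm_offset_t va, vm_page_t m)
    {
            return (pmap_enter(pmap, va, m, VM_PROT_READ | VM_PROT_WRITE,
                VM_PROT_READ | VM_PROT_WRITE | PMAP_ENTER_WIRED |
                PMAP_ENTER_NOSLEEP, 0));
    }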
In collaboration with: alc
Tested by: nwhitehorn (ppc aim32 and booke)
Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division
MFC after: 2 weeks
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.
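For reference, a hedged sketch of the new range-based call (the wrapper is
hypothetical):

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>

    /*
     * Sketch: unwire every mapping in [sva, eva) with a single call, where
     * the old interface took one page at a time.
     */
    static void
    unwire_region(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
    {
            pmap_unwire(pmap, sva, eva);
    }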
Sponsored by: EMC / Isilon Storage Division
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)
Tested by: andreast
pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before
any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will
panic.
Reported by: andreast
calling mmap on /dev/mem and add a handler for the possible userland
machine checks that may result. Remove some pointless and wrong copy/paste
that has been in here for a decade as well.
This results in a /dev/mem with identical semantics to the x86 version.
MFC after: 1 week
the upstream implementation and helps ensure that a trap induced by tracing
fbt::trap:entry is handled without recursively generating another trap.
This makes it possible to run most (but not all) of the DTrace tests under
common/safety/ without triggering a kernel panic.
Submitted by: Anton Rang <anton.rang@isilon.com> (original version)
Phabric: D95
entry was being tested. We were incrementing and decrementing the pmap's
wired mapping count based on whether the physical page being mapped or
unmapped was cache coherent, not whether it was a wired mapping.
Reviewed by: nwhitehorn