This partially reverts e81e77c5a0, leaving the option both in
GENERICs on amd64/arm64/arm, and in global NOTES file. Apparently
this better matches existing practice, where we do not try to hard
to make LINT and GENERIC complimentary.
Requested and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
When running in a virtualized environment, TLB invalidations can only
be performed on process scope, as only the hypervisor is allowed to
invalidate a global scope, or else a Program Interrupt is triggered.
Since we are here, also make sure that the register process table
hypercall returns success.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31775
Summary:
Failure to update the FP / vector state was causing daemon(3) to violate C ABI by failing to preserve nonvolatile registers.
This was causing a weird issue where moused was not working on PowerBook G4s when daemonizing, but was working fine when running it foreground.
Force saving off the same state that cpu_switch() does in cases where we are about to copy a thread.
MFC after: 1 week
Sponsored by: Tag1 Consulting, Inc.
Test Plan:
```
/*
* Test for ABI violation due to side effects of daemon(3).
*
* NOTE: Compile with -O2 to see the effect.
*/
/* Allow compiling for Linux too. */
static double test = 1234.56f;
/*
* This contrivance coerces clang to not bounce the double
* off of memory again in main.
*/
void __attribute__((noinline))
print_double(int j1, int j2, double d)
{
printf("%f\n", d);
}
int
main(int argc, char *argv[])
{
print_double(0, 0, test);
if (daemon(0, 1)) {
}
/* Compiler assumes nonvolatile regs are intact... */
print_double(0, 0, test);
return(0);
}
```
Working output:
```
1234.560059
1234.560059
```
Output in broken case:
```
1234.560059
0.0
```
Reviewers: #powerpc
Subscribers: jhibbits, luporl, alfredo
Tags: #powerpc
Differential Revision: https://reviews.freebsd.org/D29851
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
ISA 3.0 allows for nested radix translations with minimal to no
involvement of the hypervisor. This should make pseries signficantly
faster on POWER9 pseries instances, as fewer hypercalls are needed to
manage pmap now.
MFC after: 2 weeks
Relnotes: yes
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
As the Processor Version Register (PVR) is a 32-bit PowerPC
register, change mfpvr() return type to match it and avoid
type casts on its callers.
Suggested by: jhibbits
Reviewed by: jhibbits, imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31332
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
ddfc9c4c59 was missing changes to two files to complete the
bus_child_pnpinfo_str->bus_child_pnpinfo. This fixes the broken kernel
builds.
Sponsored by: Netflix
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Many of these typedefs are the same across all architectures or can
be set based on an architecture-independent compiler-provided macro
(e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6
and Clang sometime before 3.0 (godbolt.org does not have any older clang
versions installed).
I originally considered using the compiler-provided `__FOO_TYPE__` directly.
However, in order to do so we have to check that those match the previous
typedef exactly (not just that they have the same size) since any change
would be an ABI break. For example, changing `long` to `long long` results
in different C++ name mangling. Additionally, Clang and GCC disagree on
the underlying type for some of (u)int*_fast_t types, so this change
only moves the definitions that are identical across all architectures
and does not touch those types.
This de-deduplication will allow us to have a smaller diff downstream in
CheriBSD: we only have to only change the (u)intptr_t definition in
sys/_types.h in CheriBSD instead of having to change machine/_types.h for
all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS).
Reviewed By: imp, kib
Differential Revision: https://reviews.freebsd.org/D29895
Commit 49c894ddce introduced an issue that prevented pseries boot,
when hugepages were not available to the guest. Now large page
info must be available before moea64_install is called, so this change
moves the code that scans large page sizes before the call.
Reviewed by: jhibbits (IRC)
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Invalidate the last page of a demoted superpage mapping, instead of the
first page, as it results in slightly more promotions and fewer
failures. While here, replace 'boolean_t's with 'bool's in
mmu_radix_advise().
Simplify pmap_clear_modify() a bit, by assuming that since the superpage
demotion succeeded, all 4k mappings from it are valid. Deindent the
surrounding code, as there are no 'else' branches in the code anyway.
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.
PR: 255026
Reported by: Michael Büker <freebsd@michael-bueker.de>
Reviewed by: dbaio
Approved by: blackend (mentor), re (gjb)
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D30265
Summary:
There's no need to use a while loop in the IPI handler, the message list
is cached once and processed. Instead, since the existing code calls
ffs(), sort the handlers, and use a simple 'if' sequence.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30018
Summary:
Some methods are split between DMAP and non-DMAP, conditional on
hw_direct_map variable. Rather than checking this variable every time,
use it to install different functions via IFUNCs.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D30071
Summary:
Since PCPU can live in a GPR for a while longer, let it, rather than
re-getting it in yet another register. MFSPR is an expensive operation,
12 clock latency on POWER9, so the fewer operations we need, the better.
Since the check is tightly coupled to the fetch, by reducing the number
of fetch+check, we reduce the stalls, and improve the performance
marginally. Buildworld was measured at a ~5-7% improvement on a single
run.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30003
Adds OPAL_CONSOLE_WRITE error handling and implements a call to
OPAL_CONSOLE_WRITE_BUFFER_SPACE to verify if there's enough space
before writing to console.
This fixes serial port output getting corrupted on fast writes, like
on "dmesg" output.
Tested on Raptor Blackbird running powerpc64 BE kernel
Reviewed by: luporl
Sponsored by: Eldorado Reserach Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29063
types.h defines device_t as a typedef of struct device *. struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.
This causes bugs and ambiguity for debugging tools. Rename the FreeBSD
struct device to struct _device.
Reviewed by: gbe (man pages)
Reviewed by: rpokala, imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29676
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.
Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692
Radix MMU code was missing TLB invalidations when some Level 3 PDEs were
modified. This caused TLB multi-hit machine check interrupts when
superpages were enabled.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D29511
Older G4 and G3 models have a programmer's switch that can be used to
generate an interrupt to drop into the debugger.
This code hadn't been tested for a long time. It had been broken back
in 2005 in r153050.
Repair and modernize the code and add it to GENERIC.
Reviewed by: jhibbits (approved w/ removal of unused sc_dev var)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D29131
Summary:
They're nearly identical, so don't use two copies. Merge the newer
driver into the older one, and move it to a common location.
Add the Semihalf and associated copyrights in addition to mine, since
it's a non-trivial amount of code merged.
Reviewed By: mw
Differential Revision: https://reviews.freebsd.org/D29520
ULE uses this topology to try and preserve locality when migrating
threads between CPUs and when performing work stealing. Ensure that on
NUMA systems it will at least take the NUMA topology into account.
Reviewed by: bdragon, jhibbits (previous version)
Tested by: bdragon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28580
This only works on single-CPU G4 systems, and more work is needed for
dual-CPU systems. That said, platform sleep does not work, and this is
currently only used for PMU-based CPU speed change.
The elimination of the platform_smp_timebase_sync() call is so that the
timebase sync rendezvous can be enhanced to perform better
synchronization, which requires a full rendezvous. This would be
impossible to do on this single-threaded run.
Rename cpu_sleep() to mpc745x_sleep() to denote what it's actually
intended for. This function is very G4-specific, and will not work on
any other CPU. This will afterward eliminate a
platform_smp_timebase_sync() call by directly updating the timebase
instead.
The POWER7 subword atomics were not using the correct instructions for
byte and halfword stores in the atomic_fcmpset code.
This only affects builds with custom CFLAGS that have explicitly enabled
ISA_206_ATOMICS.
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32. This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping. For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place. Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.
--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.
Reviewed by: kib, markj, bdragon
Differential Revision: https://reviews.freebsd.org/D29439
ipi_msg_count is inaccessible outside this file and is never read.
It was introduced in the original SMP support code in r178628 and was never
actually used anywhere.
Remove it to slightly improve IPI performance.
Submitted by: jhibbits
MFC after: 1 week
Now that superpages for HPT MMU has landed, finish implementation of
pmap_mincore by adding support for superpages.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon, luporl
MFC after: 1 week
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D29230
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
Differential Revision: https://reviews.freebsd.org/D29174
On an INVARIANTS kernel on 32-bit Book-E, we were panicing when running
the libproc tests. This was caused by extra pv entries being generated
accidentally by the pmap icache invalidation code.
Use the same VA (i.e. 0) when freeing the temporary mapping, instead of
some arbitrary address within the zero page.
Failure to do this was causing kernel-side icache syncing to leak
PVE entries when invalidating icache for a non page-aligned address, which
would later result in pages erroneously showing up as mapped to vm_page.
This bug was introduced in r347354 in 2019.
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
Implements bus_map_resource and bus_unmap_resource DEVMETHODs to be
used by powerpc targets. This is identical to the amd64 code.
Required by virtio-modern.
Reviewed by: bryanv
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28012
Use the new kdb variants. Print more specific error messages.
Reviewed by: jhb, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29156
This basically mirrors what already exists in ddb, but provides a
slightly improved interface. It allows the caller to specify the
watchpoint access type, and returns more specific error codes to
differentiate failure cases.
This will be used to support hardware watchpoints in gdb(4).
Stubs are provided for architectures lacking hardware watchpoint logic
(mips, powerpc, riscv), while other architectures are added individually
in follow-up commits.
Reviewed by: jhb, kib, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29155
At this point in startup, vm_ndomains has not been initialized. Switch
to checking kenv instead.
Fixes incorrect NUMA information being set on multi-domain systems like
Talos II.
Submitted by: jhibbits
MFC after: 2 weeks
Summary:
Since powerpcspe doesn't have a traditional FPU, there's no FPSCR, and
no FPRs. Attempting to use them triggers an illegal instruction trap.
Fix this unconditional cleanup of FPSCR by conditionalizing it on the
FPU being used in the outgoing thread.
Reviewed By: bdragon
Differential Revision: https://reviews.freebsd.org/D29452
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
PowerISA 2.07B says that the low-order p-12 bits of the real page number
contained in ARPN and LP fields of a PTE must be 0s and are ignored
by the hardware (Book III-S, 5.7.7.1), where 2^p is the actual page size
in bytes, but we were clearing only the LP field.
This worked on bare metal and QEMU with KVM, that ignore these bits,
but caused a kernel panic on QEMU with TCG, that expects them to be
cleared.
This fixes running FreeBSD with HPT superpages enabled on QEMU
with TCG.
MFC after: 2 weeks
Sponsored by: Eldorado Research Institute (eldorado.org.br)
e4b8deb222 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
In r361544, the pmap drivers were converted to ifuncs. When doing so,
this changed the call type of pmap functions to be called via the
secure-plt stubs.
These stubs depend on the TOC base being loaded to r30 to run properly.
On SMP AIM (i.e. a dual processor G4 or running 32-bit on G5), since the
APs were being started up from the reset vector instead of going
through __start, they had never had r30 initialized properly, so when the
cpu_reset code in trap_subr32.S attempted to branch to
pmap_cpu_bootstrap(), it was loading the target from the wrong location.
Ensure r30 is set up directly in the cpu_reset trap code, so we can make
PLT calls as normal.
Fixes boot on my SMP G4.
Reviewed by: jhibbits
MFC after: 3 days
Sponsored by: Tag1 Consulting, Inc.
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.
If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.
However, in this case, we were accidentally byteswapping the internal
buffer again.
Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.
Fixes arrow keys in qemu on pseries, among other console oddities.
Sponsored by: Tag1 Consulting, Inc.
MFC after: 3 days
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.
In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710
In 78599c32ef, CFI endproc decoration was
added to locore64.S. However, it missed the subtle detail that
__restartkernel_virtual() falls through to __restartkernel(). This was
causing boot failure on PowerMac G5, as it tried to execute the
epilogue as code.
Fix this by branching to __restartkernel() instead of intentionally
running off the end of the function.
While here, add some additional notes on how the virtual mode restart
works.
MFC after: 3 days
lang/rust needs COMPAT_FREEBSD11 to build, even though powerpc64le itself is supported only since 13.0.
I also corrected a comment, because if we ever have lib32 for powerpc64le, it will be for powerpcle.
Reviewed by: bdragon (on IRC)
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, kadesai (on email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26531
Summary:
The values returned by this function are reported to the gdb client as
the reason for the break in execution, a signal value such as SIGTRAP,
SIGEMT, or SIGSEGV. As such, exact vector numbers can be misidentified.
Return SIGEMT in the default case instead.
Reviewed by: alfredo
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28046
Support for powerpc64le appeared in 13, so there's no point to enable COMPAT_* for older releases.
Also disable COMPAT_FREEBSD32, since there's no powerpcle. Since that may change in the future, leave the option commented out.
Approved by: bdragon, jhibbits (on IRC)
This is the superset of the nooptions found in the -DEBUG kernels.
Reviewed by: emaste, manu
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28152
This does an import of quirk stubs, debugging macros from USB code and
numerous usage constants used by dependent drivers.
Besides, this change renames some functions to get a better matching
with userland library and NetBSD/OpenBSD HID code. Namely:
- Old hid_report_size() renamed to hid_report_size_max()
- New hid_report_size() calculates size of given report rather than
maximum size of all reports.
- hid_get_data_unsigned() renamed to hid_get_udata()
- hid_put_data_unsigned() renamed to hid_put_udata()
Compat shim functions are provided in usbhid.h to make possible compile
of legacy code unmodified after this change.
Reviewed by: manu, hselasky
Differential revision: https://reviews.freebsd.org/D27887
It will be used by the upcoming HID-over-i2C implementation. Should be
no-op, except hid.ko module dependency is to be added to affected drivers.
Reviewed by: hselasky, manu
Differential revision: https://reviews.freebsd.org/D27867
It's possible for a context switch, and CPU migration, to occur between
fetching the PCPU context and extracting the pc_curpcb. This can cause
the fault handler to be installed for the wrong thread, leading to a
panic in copyin()/copyout(). Since curthread is already in %r13, just
use that directly, as GPRs are migrated, so there is no migration race
risk.
Currently copyinstr() uses fubyte() to read each byte from userspace.
However, this means that for each byte, it calls pmap_map_user_ptr() to
map the string into memory. This is needlessly wasteful, since the
string will rarely ever cross a segment boundary. Instead, map a
segment at a time, and copy as much from that segment as possible at a
time.
Measured with the HPT pmap on powerpc64, this saves roughly 8% time on
buildkernel, and 5% on buildworld, in wallclock time.
- fix values returned by 'sysctls dev.opal_sensor.*.sensor'
- fix missing 'dev.opal_sensor.*.sensor_[max|min]' on sysctl
Reviewed-by: jhibbits
Sponsored-by: Eldorado Research Institute (eldorado.org.br)
Differential-Revision: https://reviews.freebsd.org/D27365
Ability to load-balance traffic over multiple path is a must-have thing for routers.
It may be used by the servers to balance outgoing traffic over multiple default gateways.
The previous implementation, RADIX_MPATH stayed in the shadow for too long.
It was not well maintained, which lead us to a vicious circle - people were using
non-contiguous mask or firewalls to achieve similar goals. As a result, some routing
daemons implementation still don't have multipath support enabled for FreeBSD.
Turning on ROUTE_MPATH by default would fix it. It will allow to reduce networking
feature gap to other operating systems. Linux and OpenBSD enabled similar support
at least 5 years ago.
ROUTE_MPATH does not consume memory unless actually used. It enables around ~1k LOC.
It does not bring any behaviour changes for userland.
Additionally, feature is (temporarily) turned off by the net.route.multipath sysctl
defaulting to 0.
Differential Revision: https://reviews.freebsd.org/D27428
Follow-up to r353959 and r368070: do the same for other architectures.
arm32 already seems to use its own .fnstart/.fnend directives, which
appear to be ARM-specific variants of the same thing. Likewise, MIPS
uses .frame directives.
Reviewed by: arichardson
Differential Revision: https://reviews.freebsd.org/D27387
In the PCB struct, we need to match the VSX register file layout
correctly, as the VSRs shadow the FPRs.
In LE, we need to have a dword of padding before the fprs so they end up
on the correct side, as the struct may be manipulated by either the FP
routines or the VSX routines.
Additionally, when saving and restoring fprs, we need to explicitly target
the fpr union member so it gets offset correctly on LE.
Fixes weirdness with FP registers in VSX-using programs (A FPR that was
saved by the FP routines but restored by the VSX routines was becoming 0
due to being loaded to the wrong side of the VSR.)
Original patch by jhibbits.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D27431
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.
Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.
Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.
Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.
Reviewed by: alc
Reviewed by: dougm, kib (earlier versions)
Differential Revision: https://reviews.freebsd.org/D27207
r367416 should have called save_fpu() before kern_sigprocmask to avoid
race condition
Thanks jhibbits and bdragon for pointing it out
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27241
After r367417, both mmu_oea64 and mmu_radix were defining the vm.pmap
sysctl node, resulting in the later definition hiding the properties of
the previous one. Avoid this issue by defining vm.pmap in a common
source file and declaring it where needed.
This change also standardizes the tunable name used to enable superpages
and change its default to disabled on radix MMU, because it still has some
issues with superpages.
Reviewed by: bdragon, jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27156
There were many, many endianness fixes needed for Radix MMU. The Radix
pagetable is stored in BE (as it is read and written to by the MMU hw),
so we need to convert back and forth every time we interact with it when
running in LE.
With these changes, I can successfully boot with radix enabled on POWER9 hw.
Reviewed by: luporl, jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D27181
The HPT is always stored in big-endian, as it is accessed directly by the
hardware as well as the kernel. As such, it is necessary to convert values
to and from native endian when running on LE.
Some unconverted accesses snuck in accidentally with r367417.
Apply the appropriate conversions to fix boot hanging on powerpc64le.
Sponsored by: Tag1 Consulting, Inc.
This brings its 'struct syscall_args' in sync with other architectures.
Reviewed by: bdragon, jhibbits
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26605
Fix build errors introduced by r367417 and r367390:
- Guard label reached only by powerpc64
- Guard vm_reserv_level_iffullpop call, that is not defined on powerpc
variants that don't support superpages
- Add missing hwpmc file, for when hwpmc is built into kernel
This change adds support for transparent superpages for PowerPC64
systems using Hashed Page Tables (HPT). All pmap operations are
supported.
The changes were inspired by RISC-V implementation of superpages,
by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture
and existing MMU OEA64 code.
While these changes are not better tested, superpages support is disabled by
default. To enable it, use vm.pmap.superpages_enabled=1.
In this initial implementation, when superpages are disabled, system
performance stays at the same level as without these changes. When
superpages are enabled, buildworld time increases a bit (~2%). However,
for workloads that put a heavy pressure on the TLB the performance boost
is much bigger (see HPC Challenge and pgbench on D25237).
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25237
Add support for Floating-Point Exception traps on 32 and 64 bit platforms.
Also make sure to clean FPSCR on EXEC and thread exit
Author of initial version: Renato Riolino <renato.riolino@eldorad.org.br>
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23623
This change adds support for POWER8 and POWER9 PMCs (bare metal and
pseries).
All PowerISA 2.07B non-random events are supported.
Implementation was based on that of PPC970.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26110
And add a _74XX suffix to 74XX SPRs.
This is a preparation for adding support to POWER8/9 PMCs, which have most
SPRs equal to 970 ones.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26532
VM_ALLOC_WAITOK and vm_page_unwire_noq(), have eliminated the need for
many of the #includes.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D27052
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.
Note: This may cause conflicts with updating in some cases.
find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
Add support for sysctl 'machdep.uprintf_signal' that prints debugging
information on trap signal.
Reviewed by: jhibbits, luporl, bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26004
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.
While here, fix some small style issues.
Reviewed by: markj, kib (amd64 part, in D26701)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Similar to OPAL calls, switch to big endian to do calls to RTAS.
(Missed this one when I was doing the bulk commit of PowerPC64LE support.)
Sponsored by: Tag1 Consulting, Inc.
Due to enter_idle_powerx fabricating a MSR from scratch, it is necessary
for it to care about the endianness, so we don't accidentally switch
endian the first time we idle a thread.
Took about five seconds to spot after seeing an unmangled backtrace.
The hard bit was needing to temporarily set up a mutex to sort out the
logjam that happens when every thread simultaneously wakes up in the wrong
endian due to the panic IPI and panics, leaving what I can best describe as
"alphabet soup" on the console.
Luckily, I already had a patch sitting around to do that.
This brings POWER8 up to equivilence with POWER9 on PPC64LE.
Sponsored by: Tag1 Consulting, Inc.
OPAL unconditionally enters secondary CPUs with only HV and SF set.
I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.
This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.
It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.
Sponsored by: Tag1 Consulting, Inc.
More endian conversion.
* Install TCEs correctly (i.e. in big endian)
* Convert to big endian and back when setting up queue pages and IRQs.
Sponsored by: Tag1 Consulting, Inc.
Since OPAL runs in big endian, any data being passed back and forth
via memory instead of registers needs to be byteswapped.
From my notes during development:
"A good way to find candidates is to look for vtophys() in opal_call()
parameters. The memory being passed will be written into in BE."
Sponsored by: Tag1 Consulting, Inc.
It's much easier to implement this in an endian-independent way when we
don't also have to worry about masking half of the dword off.
Given that this code ran on a machine that ran a poudriere bulk with no
kernel oddities, I am relatively certain it is correctly implemented. ;)
This should be a minor performance boost on BE as well.
Sponsored by: Tag1 Consulting, Inc.
For a body of code that had its endian conversion bits written blind without
the ability to test, moea64 was VERY close to being correct.
There were only four instances where the existing code was getting it wrong.
Sponsored by: Tag1 Consulting, Inc.
This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)
While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)
Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.
IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.
The so-called "TRUELE" feature bit which is the baseline requirement for
needed for PowerPC64LE was introduced in POWER8.
Sponsored by: Tag1 Consulting, Inc.
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.
For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.
No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.
Sponsored by: Tag1 Consulting, Inc.
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.
Sponsored by: Tag1 Consulting, Inc.
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.
These changes are enough to attach a rootdisk.
Sponsored by: Tag1 Consulting, Inc.
The TCG implementation of mtmsrd in qemu blindly copies the entire register
to the MSR, instead of the specific bit positions listed in the ISA.
This means that qemu will prematurely switch endian out from under the
running code instead of waiting for the rfid, causing an immediate trap
as it attempts to interpret the next instruction in the wrong endianness.
To work around this, ensure PSL_LE is still set before doing the mtmsrd.
In the future, we may wish to just turn off translation and unconditionally
use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr.
Add a new platform option so this can be disabled. (And so that we can
conditonalize additional QEMU-specific hacks in the platform code.)
Sponsored by: Tag1 Consulting, Inc.
Since we will need to be able to take traps relatively early in the process,
ensure that the hypervisor changes our ILE for us as soon as we are ready.
Sponsored by: Tag1 Consulting, Inc.
Since OFW always runs in big endian in practice, we need to convert several
bits back and forth.
This is necessary to communicate with SLOF on LE pseries.
Sponsored by: Tag1 Consulting, Inc.
This is the initial set up for PowerPC64LE.
The current plan is for this arch to remain experimental for FreeBSD 13.
This started as a weekend learning project for me and kinda snowballed from
there.
(More to follow momentarily.)
Reviewed by: imp (earlier version), emaste
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D26399
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
Due to a check that should have been an endian check being an #if 0,
the wrong checksum mask table was being used on LE, which was causing
extreme strangeness in DNS resolution -- *some* hosts would be resolvable,
but most would not.
This fixes DNS resolution.
(I am committing some parts of the LE patchset ahead of time to reduce the
amount of work I have to do while committing the main patchset.)
Sponsored by: Tag1 Consulting, Inc.
On ibm,extended-clock-frequency, ensure we be64toh() the value.
On clock-frequency, remove the right-shifting hack (which was needed due to
reading a 32 bit value into a 64 bit variable) and switch to OF_getencprop()
for reading (which will handle endian conversion internally.)
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
that can be extended, but also ensure compile-time type checking. Refactor
common code out of arch-specific implementations. Move the mpr and mps
drivers to this new API. The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
To make it easier to work with this in the future, convert to c99
designated initializer syntax.
Tested on powerpc, powerpc64, and powerpc64le. No functional change.
Sponsored by: Tag1 Consulting, Inc.
The intention of the bus_be naming was for those to be the no-endian-swapping
and for the bus_le to be endian-swapping in all the functions.
This naming breaks down when we're actually are running in LE and need to
use the opposite sense.
As such, rename bs_be_* to native_bs_* and rename bs_le_* to swapped_bs_*.
No functional change.
Sponsored by: Tag1 Consulting, Inc.
Swap the BE and LE bus_space tags when on LE, and adjust the nexus tag
to match.
This is prep for a a followup that makes the powerpc bus_space macros easier
to maintain in the future.
Sponsored by: Tag1 Consulting, Inc.
Implement pmap_mincore() for moea64.
This will need some slight tweaks when large page support in HPT lands.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D26314
* Add LOAD_LR_NIA define. This is preferred to "bl 1f; 1:" because it
doesn't pollute the branch predictor.
* Add magic sequence to return the CPU to the correct endianness after
jumping to cross-endian code, similar to the sequence from Linux.
Sponsored by: Tag1 Consulting, Inc.
There were multiple bugs in the OPAL RTC code which had never been
discovered, as the default configuration of OPAL machines is to
have the BMC / FSP control the RTC.
* Fix calling convention for setting the time -- the variables are passed
directly in CPU registers, not via memory.
* Fix bug in the bcd encoding routines. (from jhibbits)
Tested on POWER9 Talos II (BE) and POWER9 Blackbird (LE).
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
When emulating a single thread system for testing reasons, mp_maxid can
be 0. This trips up our math for calculating the order.
Account for this to fix xive attachment when emulating a single-thread
core on qemu powernv (a configuration that doesn't exist in the real world.)
Sponsored by: Tag1 Consulting, Inc.
When doing a build for a modern CPUTYPE, llvm will throw errors if obsolete
instructions are used, even if they will never run due to runtime checks.
Hiding the dssall instruction from the assembler fixes kernel build when
overriding CPUTYPE, without having any effect on the generated binary.
This has been in my local tree for over a year and is well tested across
a variety of machines.
Sponsored by: Tag1 Consulting, Inc.
When enabling an interrupt, assert that we do in fact have a root PIC.
This would have saved me some debugging effort.
Sponsored by: Tag1 Consulting, Inc.
Implement the remaining pieces needed to allow userland timestamp reading.
Rewritten based on an intial essay into the problem by Justin Hibbits.
(Copyright changed to my own on his request.)
Tested on ppc64 (POWER9 Talos II), powerpcspe (e500v2 RB800), and
powerpc (g4 PowerBook).
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D26347
In order to enable VDSO timekeeping, it is necessary that there be exactly
one primary FreeBSD sysvec for each of the host and (optionally) compat32.
So, switch ELFv1 to being a secondary sysvec of ELFv2, so it does not get
double-allocated in the shared page.
Since secondary sysvecs use the same sigcode allocation as the primary,
define both to use the main sigcode64, and adjust the sv_sigcode_base on
ELFv2 after initialization to point to the correct offset.
This has the desirable side effect of avoiding having a separate copy of
the signal trampoline in the shared page. Our sigcode64 was already written
to take advantage of trampoline sharing, it was just not being allocated
that way until now.
Submitted by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.
The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.
For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.
Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
PMCLOG macros were always using 32-bit addresses, even on PPC64.
This resulted in truncated addresses in logs, when running on 64-bit PPC
machines.
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26112
Calling pmc_hook inside a critical section may result in a panic.
This happens when the user callchain is fetched, because it uses
pmap_map_user_ptr, that tries to get the (sleepable) pmap lock when the
needed vsid is not found.
Judging by the implementation in other platforms, intr_irq_handler in
kern/subr_intr.c and what pmc_hook do, it seems safe to move pmc_hook
outside the critical section.
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26111
When SMP support for powerpc was added in r178628, the last callers of this
function were removed. All code that needs to manipulate the task priority
just does it directly instead.
Noticed while reading through the lint logs.
Sponsored by: Tag1 Consulting, Inc.
Assume ELF images without OSREL use the new auxv format.
This is specially important for rtld, that is not tagged. Using
direct exec mode with new (ELFv2) binaries that expect the new auxv
format would result in crashes otherwise.
Unfortunately, this may break direct exec'ing old binaries,
but it seems better to correctly support new binaries by default,
considering the transition to ELFv2 happened quite some time
ago. If needed, a sysctl may be added to allow old auxv format to
be used when OSREL is not found.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25651
Currently, we parse notes for the values of ELF FreeBSD feature flags
and osrel. Knowing these values, or knowing that image does not carry
the note if pointers are NULL, is useful to decide which ABI variant
(brand) we want to activate for the image.
Right now this is only a plumbing change
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
After spending a lot of time trying to track down what was going on, I have
isolated the "black screen" failures when using boot1 to boot a G4 machine.
It turns out we were replacing the traps before installing the temporary
BAT entry for the bottom of physical memory. That meant that until the MMU
was bootstrapped, the cached translations were the only thing keeping us
from losing.
Throwing boot1 into the mix was affecting execution flow enough to cause us
to hit an uncached page and crash.
Fix this by properly setting up the initial BAT entry at the same time we
are replacing the OpenFirmware traps, so we can continue executing in
segment 0 until the rest of the DMAP has been set up.
A second thing discovered while researching this is that we were entering a
BAT region for segment 16. It turns out this range was a) considered part
of KVA, and b) has firmware mappings with varying attributes.
If we ever accessed an unmapped page in segment 16, it would cause a BAT
entry to be installed for the whole segment, which would bypass the
existing mappings until it was flushed out again.
Instead, translate the OFW memory attributes into VM memory attributes and
install the ranges into the kernel address space properly.
Reviewed by: adalava
MFC after: 3 weeks
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25547
This fixes spurious "XIVE[ IC 00 ] ISN 1 lead to invalid IVE !" messages
generated by OPAL when running with the debug level cranked up.
Discussed with jhibbits.
Sponsored by: Tag1 Consulting, Inc.
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.
Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).
Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
Use PVO_PADDR macro to get the physical address from a PVO, instead of
explicitly ANDing pvo_pte.pa with LPTE_RPGN where it is needed. Besides
improving readability, this is needed to support superpages (D25237), where
the steps to get the PA from a PVO are different.
Reviewed by: markj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25654
* Only read the DPCPU pointer once per xive_dispatch call.
* Optimize HE decoding for the common cases.
Reported by: jhibbits (in irc)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25545
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223
Due to kldxref not being able to generate hints for nonnative platforms,
any cross built VM images do not have /boot/kernel/linker.hints.
This prevents the virtio modules from being loaded, as the fallback code
will always fail the version check when the hints are missing.
Since we want to be able to generate VM images for 32 bit powerpc, add the
virtio modules to GENERIC like we do on powerpc64.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25271
Since qemu does not implement the L2 cache, we get stuck forever waiting
for a bit to be set when trying to invalidate it.
To prevent that, we should bail out if the L2 cache is missing.
One easy way to check this is L2CFG0 == 0 (since L2CSIZE always has at
least one bit set in a valid implementation)
(tested on qemu, rb800, and x5000)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25225
After r361988 fixed the reference count leak on booke64, it became possible
for an iteration somewhere in the middle of a page to become stale, with the
page vanishing (correctly) due to all PTEs on that page going away.
pte_find_next() would start at that iterator, and move along 'higher' order
directory pages until it finds a valid one, without zeroing out the lower
order pages. For instance:
/* Find next pte at or above 0x10002000. */
pte = pte_find_next(pmap, &(0x10002000));
pte_remove(pmap, pte);
/* This pte was the last reference in the page table page, page is
* gone.
*/
pte = pte_find_next(pmap, 0x10002000);
/* pte_find_next will see 0x10002000's page is gone, and jump to the
* next one, but starting iteration at the '0x2000' slot, skipping
* 0x0000 and 0x1000.
*/
This caused some processes, like git, to trip the KASSERT() in
pmap_release().
Fix this by zeroing all lower order iterators at each level.
vmem quantum cache is only needed when doing a lot of concurrent allocations,
which doesn't happen when allocating MSIs. This wastes memory for the cache
zones. Avoid this waste and don't use the quantum cache.
Reported by: markj
Properly handle reference counts in the 64-bit pmap page directories.
Otherwise all page table pages would leak due to over-referencing. This
would cause a quick enter to swap on a desktop system (AmigaOne X5000) when
quitting and rerunning applications, or just building world.
Add an INVARIANTS check to validate no leakage at pmap release time.
If the POWER firmware detects a bad CPU core, it will "GUARD" it out,
marking it disabled. Any attempt to spin up a bad CPU will trigger a panic
later on when waiting for threads on said core to wake up. Support limping
along on fewer cores instead.
Summary:
Radix on AIM, and all of Book-E (currently), can do direct addressing of
user space, instead of needing to map user addresses into kernel space.
Take advantage of this to optimize the copy(9) functions for this
behavior, and avoid effectively NOP translations.
Test Plan: Tested on powerpcspe, powerpc64/booke, powerpc64/AIM
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D25129
Summary:
The point of this addition is to cache CPU behavior 'features', to avoid
having to recompute based on CPU, etc.
The first such use case is to avoid the unnecessary manipulation of the
SLBs (Segment Lookaside Buffers) when using the Radix pmap on POWER9.
Since we already get the PCPU pointer wherever we swap the SLB entries,
we can use a cached flag to check if it's necessary to perform the
operation anyway, and skip it when not.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24908
HTM is on the chopping block, doesn't work on FreeBSD, and has only token
support in PowerISA 3.1 and POWER10. Don't advertise something we'll never
support.
Found by running libc tests with radix enabled.
Detect unsigned integer wrapping with a postcondition.
Note: Radix MMU is not enabled by default yet.
Sponsored by: Tag1 Consulting, Inc.
With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.
Reviewed by: bdragon
In this context, 0 actually means 0 (i.e. this is a li instruction).
While most assemblers will ignore this, I did have a compile failure at one
point when using an external toolchain.
In the future, we should use the li syntax to make this clearer.
Sponsored by: Tag1 Consulting, Inc.
A recent kernel change caused the previously unused atomic_cmpset_masked() to
be used.
It had a typo in it.
Instead of reading the old value from an uninitialized variable, read it
from the passed-in pointer as intended.
This fixes crashes on 64 bit Book-E.
Obtained from: jhibbits
Kernel page tables actually start at index 4096, given kernel base address
of 0xc008000000000000, not index 0, which would yield 0xc000000000000000.
Fix this by indexing at the real base, instead of the assumed base.
This is a correctness fix needed to enable the ifunc conversion of the pmap
in D24993.
Since we are making function calls that may need to go through the PLT, ensure
r30 is set up correctly.
This fixes crashes when booting with D24993 applied.
Reviewed by: jhibbits (in IRC)
Sponsored by: Tag1 Consulting, Inc.
This reapplies logical r360944 and r360946 (reverting r360955), with fixed
copystr() stand-in replacement macro. Eventually the goal is to convert
consumers and kill the macro, but for a first step it helps if the macro is
correct.
Prior commit message:
Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults. It's just
an older incarnation of the now-more-common strlcpy().
Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.
Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy (with correction from brooks@ -- thanks).
Remove N redundant MI implementations of copystr. For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.
Reviewed by: jhb (earlier version)
Discussed with: brooks (thanks!)
Differential Revision: https://reviews.freebsd.org/D24672
Recent changes have caused the vmspace objects to start coming from KVA
instead of direct-mapped memory on powerpc. As far as I can tell, this is
not actually a problem, so we should stop arbitrarily asserting that it is.
I do not know why this was not being triggered before.
Approved by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Instead of crashing the user process when a D-ERAT multihit is detected, try
to flush the ERAT, and continue. This machine check indicates a likely PMAP
invalidation shortcoming that will need to be addressed, but it's
recoverable, so just recover. The recovery is pmap-specific to flush the
ERAT, so add a pmap function to do so, currently only implemented by the
POWER9 radix pmap.
x86 needs delayed TLB invalidation because invalidation requires an
expensive IPI. PowerPC has had a TLB invalidation instruction since the
POWER1 in 1990, so there's no need to delay anything.
A page (even physmem) can be marked as cache-inhibited. Attempting to use
'dcbz' to zero a page mapped cache-inhibited triggers an alignment
exception, which is fatal in kernel. This was seen when testing hardware
acceleration with X on POWER9.
At some point in the future, this should be changed to a more straight
forward zero loop instead of bzero(), and a similar change be made to the
other pmaps.
Reported by: pkubaj@
The most likely users of the QORIQ64 config nowadays are users of AmigaOne
X5000 systems, which are desktops. They need a framebuffer and
keyboard/mouse, so add these to the config so it works by default once
drm-current-kmod is installed.
TRAP_ENTRY(0) should be TRAP_GENTRAP(0) here.
However, in practice, it doesn't matter, as the only time TRAP_ENTRY and
TRAP_GENTRAP can differ is when bridge mode is active, which is impossible
on the 64 bit kernel.
Fix it anyway in case we ever need to add a trap preamble on PPC64.