The pindex values are assigned from the L3 leaves upwards, meaning there
are NUL2E L3 tables and then NUL1E L2 tables (with a futher NUL0E L1
tables in future when we implement Sv48 support). Therefore anything
below NUL2E is an L3 table's page and anything above or equal to NUL2E
is an L2 table's page (with the threshold of NUL2E + NUL1E marking the
start of the L1 tables' pages in Sv48). Thus all the comparisons and
arithmetic operations must use NUL2E to handle the L3/L2 allocation (and
thus L2/L1 entry) transition point, not NUL1E as all but pmap_alloc_l2
were doing.
To make matters confusing, the NUL1E and NUL2E definitions in the RISC-V
pmap are based on a 4-level page hierarchy but we currently use the
3-level Sv39 format (as that's the only required one, and hardware
support for the 4-level Sv48 is not widespread). This means that, in
effect, the above bug cancels out with the bloated NULxE definitions
such that things "work" (but are still technically wrong, and thus would
break when adding Sv48 support), with one exception. pmap_enter_l2 is
currently the only function to use the correct constant, but since
_pmap_alloc_l3 uses the incorrect constant, it will do complete nonsense
when it needs to allocate a new L2 table (which is rather rare). In this
instance, _pmap_alloc_l3, whilst it would correctly determine the pindex
was for an L2 table, would only subtract NUL1E when computing l1index
and thus go way out of bounds (by 511*512*512 bytes, or 127.75 GiB) of
its own L1 table and, thanks to pmap_distribute_l1, of every other
pmap's L1 table in the whole system. This has likely never been hit as
it would presumably instantly fault and panic.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31087
These use the raw console interface and poll. Unfortunately, the SiFive
UART puts the FIFO empty bit inside the FIFO data register, which means
that the act of checking whether a character is available also dequeues
any character from the FIFO, requiring the user to press each key twice.
However, since we configure the watermark to be 0 and, when the UART has
been grabbed for the console, we have interrupts off, we can abuse the
interrupt pending register to act as a substitute for the FIFO empty
bit.
This perhaps suggests that the console interface should move from having
rxready and getc to having getc_nonblock and getc (or make getc take a
bool), as all the places that call rxready do so to avoid blocking on
getc when there is no character available.
Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31025
This is required for the SiFive FU740's PCIe controller. Copied from
arm64 with the only difference being changing pmap_mapdev_attr to
pmap_mapdev as riscv only has the latter.
Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31032
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
Many of these typedefs are the same across all architectures or can
be set based on an architecture-independent compiler-provided macro
(e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6
and Clang sometime before 3.0 (godbolt.org does not have any older clang
versions installed).
I originally considered using the compiler-provided `__FOO_TYPE__` directly.
However, in order to do so we have to check that those match the previous
typedef exactly (not just that they have the same size) since any change
would be an ABI break. For example, changing `long` to `long long` results
in different C++ name mangling. Additionally, Clang and GCC disagree on
the underlying type for some of (u)int*_fast_t types, so this change
only moves the definitions that are identical across all architectures
and does not touch those types.
This de-deduplication will allow us to have a smaller diff downstream in
CheriBSD: we only have to only change the (u)intptr_t definition in
sys/_types.h in CheriBSD instead of having to change machine/_types.h for
all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS).
Reviewed By: imp, kib
Differential Revision: https://reviews.freebsd.org/D29895
This is consistent with other platforms, specifically arm and arm64. No
functional change intended.
Reviewed by: jrtc27
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30645
pmap_promote_l2() failed to handle implementations which set the
accessed and dirty flags. In particular, when comparing the attributes
of a run of 512 PTEs, we must handle the possibility that the hardware
will set PTE_D on a clean, writable mapping.
Following the example of amd64 and arm64, change riscv's
pmap_promote_l2() to downgrade clean, writable mappings to read-only, so
that updates are synchronized by the pmap lock.
Fixes: f6893f09d
Reported by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Tested by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: jrtc27, alc, Nathaniel Filardo
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30644
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.
PR: 255026
Reported by: Michael Büker <freebsd@michael-bueker.de>
Reviewed by: dbaio
Approved by: blackend (mentor), re (gjb)
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D30265
During early qemu development, the /soc node was marked as compatible
with "riscv-virtio-soc" instead of "simple-bus".
This was changed in qemu 53f54508dae6 in Sep 2018, and predates the
baseline required qemu version (5.0) for riscv by a wide margin.
The generic simplebus code handles attachment in all cases nowadays.
Sponsored by: Tag1 Consulting, Inc.
Reviewed by: jrtc27, mhorne
Differential Revision: https://reviews.freebsd.org/D30011
Previously, a page fault taken during copyin/out and related functions
would run the entire fault handler while permitting direct access to
user addresses. This could also leak across context switches (e.g. if
the page fault handler was preempted by an interrupt or slept for disk
I/O).
To fix, clear SUM in assembly after saving the original version of
SSTATUS in the supervisor mode trapframe.
Reviewed by: mhorne, jrtc27
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29763
Use the new kdb variants. Print more specific error messages.
Reviewed by: jhb, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29156
This basically mirrors what already exists in ddb, but provides a
slightly improved interface. It allows the caller to specify the
watchpoint access type, and returns more specific error codes to
differentiate failure cases.
This will be used to support hardware watchpoints in gdb(4).
Stubs are provided for architectures lacking hardware watchpoint logic
(mips, powerpc, riscv), while other architectures are added individually
in follow-up commits.
Reviewed by: jhb, kib, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29155
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
e4b8deb222 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.
Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817, but the fix did not make its way
to RISC-V.
Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.
PR: 253237
Reported by: Jenkins, arichardson
Reviewed by: kp, arichardson
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29064
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.
In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710
The System Reset extension provides functions to shutdown or reboot the
system via SBI firmware. This newly defined extension supersedes the
functionality of the legacy shutdown extension.
Update the SBI code to use the new System Reset extension when
available, and fall back to the legacy one.
Reviewed By: kp, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28226
This setting limits the amount of memory that can be allocated to UMA.
On systems with a direct map and ample KVA, however, there is no reason
for VM_KMEM_SIZE_SCALE to be larger than 1. This appears to have been
inherited from the 32-bit ARM platform definitions.
Also remove VM_KMEM_SIZE_MIN, which is not needed when
VM_KMEM_SIZE_SCALE is defined to be 1.[*]
Reviewed by: alc, kp, kib
Reported by: alc [*]
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
Differential Revision: https://reviews.freebsd.org/D28225
This is the superset of the nooptions found in the -DEBUG kernels.
Reviewed by: emaste, manu
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28152
The purpose of this KASSERT is to ensure that we do not run out of space
in the early devmap. However, the devmap grew beyond its initial size of
2MB in r336519, and this assertion did not grow with it.
A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or
1.977 MB, so it is just barely able to fit without triggering the
assertion, provided no other devices are mapped before it. With the
addition of `options GDB` in GENERIC by bbfa199cbc, the uart is now
mapped for the purposes of a debug port, before mapping the framebuffer.
The presence of both these conditions pushes the selected virtual
address just below the threshold, triggering the assertion.
To fix this, use the correct size of the devmap, defined by
PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define
it for that platform as well (although it is a different size).
PR: 25241
Reported by: gbe
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Ensure that we don't end up with a superpage in the vm_page_t's pv list.
This may help with debugging the panic reported in PR 250866, in which
l3 in pmap_remove_write() was found to be NULL. Adding a KASSERT to this
function will help narrow down the cause of this panic the next time it
occurs.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D28109
Add 64-bit address support to Cadence CGEM Ethernet driver for use in
other SoCs such as the Zynq UltraScale+ and SiFive HighFive Unleashed.
Reviewed by: philip, 0mp (manpages)
Differential Revision: https://reviews.freebsd.org/D24304
Add an additional set of braces to clarify intention. The '&' operator
has a higher precedence than '|', but the reader may not always remember
this. No functional change.
This sysctl node can generate very verbose output, so don't trigger it
for sysctl -a or sysctl vm.pmap.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D27504
These implementation IDs are defined in the SBI spec, so we should print
their name if detected.
Submitted by: Danjel Qyteza <danq1222@gmail.com>
Reviewed by: jhb, kp
Differential Revision: https://reviews.freebsd.org/D27660
Ability to load-balance traffic over multiple path is a must-have thing for routers.
It may be used by the servers to balance outgoing traffic over multiple default gateways.
The previous implementation, RADIX_MPATH stayed in the shadow for too long.
It was not well maintained, which lead us to a vicious circle - people were using
non-contiguous mask or firewalls to achieve similar goals. As a result, some routing
daemons implementation still don't have multipath support enabled for FreeBSD.
Turning on ROUTE_MPATH by default would fix it. It will allow to reduce networking
feature gap to other operating systems. Linux and OpenBSD enabled similar support
at least 5 years ago.
ROUTE_MPATH does not consume memory unless actually used. It enables around ~1k LOC.
It does not bring any behaviour changes for userland.
Additionally, feature is (temporarily) turned off by the net.route.multipath sysctl
defaulting to 0.
Differential Revision: https://reviews.freebsd.org/D27428
Prefer atomics to critical section. This reduces the cost of the
increment operation and removes the possibility of it being interrupted
by counter_u64_zero().
Use CPU_FOREACH() macro to skip absent CPUs.
Replace hand-rolled address calculation with zpcpu_get().
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27536
Allows recovery or diagnosis of a fatal page fault before panicking the
system.
Reviewed by: jhb, kp
Differential Revision: https://reviews.freebsd.org/D27534
- Push the kstack_contains check down into unwind_frame() so that it
is honored by DDB and DTrace.
- Check that the trapframe for an exception frame is contained in the
traced thread's kernel stack for DDB traces.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27357
This is useful for stack unwinders which need to avoid out-of-bounds
reads of a kernel stack which can trigger kernel faults.
Reviewed by: kib, markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27356
This allows GDB to print more useful backtraces when setting a breakpoint
on an assembly function.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D27177
Ensure we initialize the static environment when not booting via
loader(8), and provide a static buffer if this is the case. This fixes
two issues.
First, performing the initialization ensures that kenv variables set in
the kernel's config file are honored. Previously, any new or overridden
values were ignored.
Second, providing the static buffer allows variables to be set in the
device tree's bootargs property of the chosen node. This can be set by
u-boot or by QEMU's '-append' flag. Attempting to this prior to this
change resulted in an early panic, since the static environment had no
buffer backing it.
Submitted by: syrinx (earlier version)
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D25034
In pmap_bootstrap(), we fill kernel_pmap->pm_active since it is
invariably active on all harts. However, this marks it as active even
for harts that don't exist in the system, which can cause issue when the
mask is passed to the SBI firmware via sbi_remote_sfence_vma().
Specifically, the SBI spec allows SBI_ERR_INVALID_PARAM to be returned
when an invalid hart is set in the mask.
The latest version of OpenSBI does not have this issue, but v0.6 does,
and this is triggering a recently added KASSERT in CI. Switch to only
setting bits in pm_active for harts that enter the system.
Reported by: Jenkins
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27080
VM_ALLOC_WAITOK and vm_page_unwire_noq(), have eliminated the need for
many of the #includes.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D27052
- remove setting of register value which is not used until the next value is
set
- Use the L2_SHIFT constant when setting up L2 superpages
Submitted by: Antonin Houska <ah AT melesmeles DOT cz>
Version 0.2 of the SBI specification [1] marked the existing SBI
functions as "legacy" in order to move to a newer calling convention. It
also introduced a set of replacement extensions for some of the legacy
functionality. In particular, the TIME, IPI, and RFENCE extensions
implement and extend the semantics of their legacy counterparts, while
conforming to the newer version of the spec.
Update our SBI code to use the new replacement extensions when
available, and fall back to the legacy ones. These will eventually be
dropped, when support for version 0.2 is ubiquitous.
[1] https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
Submitted by: Danjel Q. <danq1222@gmail.com>
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26953
S-mode software has write access to the SIP.SSIP bit, so instead of
making a second round-trip through the SBI we can clear it ourselves.
The SBI spec has deprecated this function for this exactly this reason.
Submitted by: Danjel Q. <danq1222@gmail.com
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26952
The existing names were inherited from arm64, but we should prefer
RISC-V terminology. Change the prefix to SCAUSE, and further change the
names to better match the RISC-V spec and be more consistent with one
another. Also, remove two codes that are not defined for S-mode (machine
and hypervisor ecall).
While here, apply style(9) to some condition checks.
Reviewed by: kp
Discussed with: jrtc27
Differential Revision: https://reviews.freebsd.org/D26918