All consumers already do it and it was required on amd64 and i386
until recently (1c1bf5bd7c1e479a7889839b941f53e689aa2569).
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34932
These files no longer depend on the macros required when these checks
were added.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804
Since physical memory management is now handled by subr_physmem.c, the
need to keep this global array has diminished. It is not referenced
outside of early boot-time, and is populated by physmem_avail() in
pmap_bootstrap(). Just allocate the array on the stack for the duration
of its lifetime.
The check against physmap[0] in initriscv() can be dropped altogether,
as there is no consequence for excluding a memory range twice.
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34778
This increases the size of the user map from 256GB to 128TB. The kernel
map is left unchanged for now.
For now SV48 mode is left disabled by default, but can be enabled with a
tunable. Note that extant hardware does not implement SV48, but QEMU
does.
- In pmap_bootstrap(), allocate a L0 page and attempt to enable SV48
mode. If the write to SATP doesn't take, the kernel continues to run
in SV39 mode.
- Define VM_MAX_USER_ADDRESS to refer to the SV48 limit. In SV39 mode,
the region [VM_MAX_USER_ADDRESS_SV39, VM_MAX_USER_ADDRESS_SV48] is not
mappable.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34280
Instead of having the one-off load_satp(), just use csr_write(). No
functional change intended.
Reviewed by: alc, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34271
In SV48 mode, the top-level page will be an L0 page rather than an L1
page. Rename the field accordingly. No functional change intended.
Reviewed by: alc, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34270
- Move busdma_lock_mutex to subr_bus_dma.c.
- Move _busdma_lock_dflt to subr_bus_dma.c. This function was named a
couple of different things previously. It is not a public API but
an internal helper used in place of a NULL pointer. The prototype
is in <sys/bus_dma.h> as not all backends include
<sys/bus_dma_internal.h>.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33694
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
After a round of cleanups in late 2020, all definitions are
functionally identical.
This removes a rotted __aligned(8) on arm. It was added in
b7112ead32bc50ef9744099bdbb1cfbd6e906b2a and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6). As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems. The sole
effect is to bloat the struct by 4 bytes.
Reviewed by: kib, jhb, imp
Differential Revision: https://reviews.freebsd.org/D33308
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
PR: 259157
Reviewed by: mav, kib, markj (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32580
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
This is needed for LinuxKPI's _ioremap_attr. This reuses the generic
implementation introduced for aarch64, and itself requires implementing
pmap_kenter, which is trivial to do given riscv currently treats all
mapping attributes the same due to the Svpbmt extension not yet being
ratified and in hardware.
Reviewed by: markj, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32445
When handling a kernel page fault, check explicitly that stval resides
in either the user or kernel address spaces, and make the page fault
fatal if not. Otherwise, a properly crafted address may appear to
pmap_fault() as a valid and present page in the kernel map, causing the
page fault to be retried continuously. This is mainly due to the fact
that the upper bits of virtual addresses are not validated by most of
the pmap code.
Faults of this nature should only occur due to some kind of bug in the
kernel, but it is best to handle them gracefully when they do.
Handle user page faults in the same way, sending a SIGSEGV immediately
when a malformed address is encountered.
Add an assertion to pmap_l1(), which should help catch other bugs of
this kind that make it this far.
Reviewed by: jrtc27, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31208
pmap_change_attr is required by drm-kmod so we need the function to
exist. Since the Svpbmt extension is on the horizon we will likely end
up with a real implementation of it, so this stub implementation does
all the necessary page table walking to validate the input, ensuring
that no new errors are returned once it's implemented fully (other than
due to out of memory conditions when demoting L2 entries) and providing
a skeleton for that future implementation.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31996
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc852c3d4660a4fa32e4a94999d09a47 with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Many of these typedefs are the same across all architectures or can
be set based on an architecture-independent compiler-provided macro
(e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6
and Clang sometime before 3.0 (godbolt.org does not have any older clang
versions installed).
I originally considered using the compiler-provided `__FOO_TYPE__` directly.
However, in order to do so we have to check that those match the previous
typedef exactly (not just that they have the same size) since any change
would be an ABI break. For example, changing `long` to `long long` results
in different C++ name mangling. Additionally, Clang and GCC disagree on
the underlying type for some of (u)int*_fast_t types, so this change
only moves the definitions that are identical across all architectures
and does not touch those types.
This de-deduplication will allow us to have a smaller diff downstream in
CheriBSD: we only have to only change the (u)intptr_t definition in
sys/_types.h in CheriBSD instead of having to change machine/_types.h for
all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS).
Reviewed By: imp, kib
Differential Revision: https://reviews.freebsd.org/D29895
This is consistent with other platforms, specifically arm and arm64. No
functional change intended.
Reviewed by: jrtc27
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30645
This basically mirrors what already exists in ddb, but provides a
slightly improved interface. It allows the caller to specify the
watchpoint access type, and returns more specific error codes to
differentiate failure cases.
This will be used to support hardware watchpoints in gdb(4).
Stubs are provided for architectures lacking hardware watchpoint logic
(mips, powerpc, riscv), while other architectures are added individually
in follow-up commits.
Reviewed by: jhb, kib, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29155
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d203. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
e4b8deb22227 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.
Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817a0, but the fix did not make its way
to RISC-V.
Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.
PR: 253237
Reported by: Jenkins, arichardson
Reviewed by: kp, arichardson
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29064
The System Reset extension provides functions to shutdown or reboot the
system via SBI firmware. This newly defined extension supersedes the
functionality of the legacy shutdown extension.
Update the SBI code to use the new System Reset extension when
available, and fall back to the legacy one.
Reviewed By: kp, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28226
This setting limits the amount of memory that can be allocated to UMA.
On systems with a direct map and ample KVA, however, there is no reason
for VM_KMEM_SIZE_SCALE to be larger than 1. This appears to have been
inherited from the 32-bit ARM platform definitions.
Also remove VM_KMEM_SIZE_MIN, which is not needed when
VM_KMEM_SIZE_SCALE is defined to be 1.[*]
Reviewed by: alc, kp, kib
Reported by: alc [*]
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
Differential Revision: https://reviews.freebsd.org/D28225
The purpose of this KASSERT is to ensure that we do not run out of space
in the early devmap. However, the devmap grew beyond its initial size of
2MB in r336519, and this assertion did not grow with it.
A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or
1.977 MB, so it is just barely able to fit without triggering the
assertion, provided no other devices are mapped before it. With the
addition of `options GDB` in GENERIC by bbfa199cbc16, the uart is now
mapped for the purposes of a debug port, before mapping the framebuffer.
The presence of both these conditions pushes the selected virtual
address just below the threshold, triggering the assertion.
To fix this, use the correct size of the devmap, defined by
PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define
it for that platform as well (although it is a different size).
PR: 25241
Reported by: gbe
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
These implementation IDs are defined in the SBI spec, so we should print
their name if detected.
Submitted by: Danjel Qyteza <danq1222@gmail.com>
Reviewed by: jhb, kp
Differential Revision: https://reviews.freebsd.org/D27660
Prefer atomics to critical section. This reduces the cost of the
increment operation and removes the possibility of it being interrupted
by counter_u64_zero().
Use CPU_FOREACH() macro to skip absent CPUs.
Replace hand-rolled address calculation with zpcpu_get().
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27536
- Push the kstack_contains check down into unwind_frame() so that it
is honored by DDB and DTrace.
- Check that the trapframe for an exception frame is contained in the
traced thread's kernel stack for DDB traces.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27357
This allows GDB to print more useful backtraces when setting a breakpoint
on an assembly function.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D27177
Version 0.2 of the SBI specification [1] marked the existing SBI
functions as "legacy" in order to move to a newer calling convention. It
also introduced a set of replacement extensions for some of the legacy
functionality. In particular, the TIME, IPI, and RFENCE extensions
implement and extend the semantics of their legacy counterparts, while
conforming to the newer version of the spec.
Update our SBI code to use the new replacement extensions when
available, and fall back to the legacy ones. These will eventually be
dropped, when support for version 0.2 is ubiquitous.
[1] https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
Submitted by: Danjel Q. <danq1222@gmail.com>
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26953
S-mode software has write access to the SIP.SSIP bit, so instead of
making a second round-trip through the SBI we can clear it ourselves.
The SBI spec has deprecated this function for this exactly this reason.
Submitted by: Danjel Q. <danq1222@gmail.com
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D26952
The existing names were inherited from arm64, but we should prefer
RISC-V terminology. Change the prefix to SCAUSE, and further change the
names to better match the RISC-V spec and be more consistent with one
another. Also, remove two codes that are not defined for S-mode (machine
and hypervisor ecall).
While here, apply style(9) to some condition checks.
Reviewed by: kp
Discussed with: jrtc27
Differential Revision: https://reviews.freebsd.org/D26918
Every other architecture defines this and this is required for
interrupts to work when using QEMU's PCI VirtIO devices (which all
report an interrupt line of 0) for two reasons.
Firstly, interrupt line 0 is wrong; they use one of 0x20-0x23 with the
lines being cycled across devices like normal. Moreover, RISC-V uses
INTRNG, whose IRQs are virtual as indices into its irq_map, so even if
we have the right interrupt line we still need to try and route the
interrupt in order to ultimately call into intr_map_irq and get back a
unique index into the map for the given line, otherwise we will use
whatever happens to be in irq_map[line] (which for QEMU where the line
is initialised to 0 results in using the first allocated interrupt,
namely the RTC on IRQ 11 at time of commit).
Note that pci_assign_interrupt will still do the wrong thing for INTRNG
when using a tunable, as it will bypass INTRNG entirely and use the
tunable's value as the index into irq_map, when it should instead
(indirectly) call intr_map_irq to allocate a new entry for the given
IRQ and treat the tunable as stating the physical line in use, which is
what one would expect. This, however, is a problem shared by all INTRNG
architectures, and not exclusive to RISC-V.
Reviewed by: kib
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D26564
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129