off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created. See the man page for
details on how to initialize a template.
Templates do not support tag filters. Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine. Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D22906
The SiFive FU540 Power Reset Clocking Interrupt block contains a PLL
that turns the input crystal (33.3MHz) into a 1-1.5GHz clock.
This clock in turn is divided by two to produce the tlclk, which is fed
into devices such as the SPI and I2C controllers.
Register a new clock device for the PRCI so that those devices can
read the correct clock through the clk framework.
Submitted by: kp
Sponsored by: Axiado
fix an assert violation introduced in r355784. Without this spinlock_exit()
may see owepreempt and switch before reducing the spinlock count. amd64
had been optimized to do a single critical enter/exit regardless of the
number of spinlocks which avoided the problem and this optimization had
not been applied elsewhere.
Reported by: emaste
Suggested by: rlibby
Discussed with: jhb, rlibby
Tested by: manu (arm64)
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.
Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501
RISC-V inherited this code from arm64, so implement the fix from r354712.
See the revision for the full description.
Submitted by: kevans (arm64 version)
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
SBI version 0.2 introduces functions for obtaining the details of the
SBI implementation, such as version and implemntation ID. Print this
info at startup when it is available.
Reviewed by: jhb, kp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22327
The Supervisor Binary Interface (SBI) specification v0.2 is a backwards
incompatible update to the SBI call interface for kernels running in
supervisor mode. The goal of this update was to make it easier for new
and optional functionality to be added to the SBI.
SBI functions are now called by passing an "extension ID" and a
"function ID" which are passed in a7 and a6 respectively. SBI calls
will also return an error and value in the following struct:
struct sbi_ret {
long error;
long value;
}
This version introduces several new functions under the "base"
extension. It is expected that all SBI implementations >= 0.2 will
support this base set of functions, as they implement some essential
services such as obtaining the SBI version, CPU implementation info, and
extension probing.
Existing SBI functions have been designated as "legacy". For the time
being they will remain implemented, but it is expected that in the
future their functionality will be duplicated or replaced by new SBI
extensions. Each legacy function has been assigned its own extension ID,
and for now we simply probe and assert for their existence.
Compatibility with legacy SBI implementations (such as BBL) is
maintained by checking the output of sbi_get_spec_version(). This
function is guaranteed to succeed by the new spec, but will return an
error in legacy implementations. We use this as an indicator of whether
or not we can rely on the new SBI base extensions.
For further info on the Supervisor Binary Interface, see:
https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
Reviewed by: kp, jhb
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22326
Allow for an additional argument to sbi_call which will be passed in a6.
This is required for SBI spec 0.2 support, as a6 will indicate the SBI
function ID.
While here, introduce some macros to clean up the calls.
Reviewed by: kp, jhb
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22325
Our PLIC implementation only enables interrupts on the boot cpu.
Implement plic_bind_intr() so that they can be redistributed near the
end of boot during intr_irq_shuffle().
This also slightly modifies how enable bits are handled in an attempt to
better fit the PIC interface. plic_enable_intr()/plic_disable_intr() are
converted to manage an interrupt source's threshold value, since this
value can be used as to globally enable/disable an irq. All handing of the
per-context enable bits is moved to the new methods plic_setup_intr()
and plic_bind_intr().
Reviewed by: br
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D21928
The RISC-V PLIC (platform level interrupt controller) registers are divided up
by "context", which is purposefully left ambiguous in the PLIC spec. Currently
we assume each CPU number corresponds 1-to-1 with a context number, but that is
not correct. Most existing PLIC implementations (such as SiFive's) have
multiple contexts per-cpu. For example, a single CPU might have a context for
machine mode interrupts and a context for supervisor mode interrupts. To
complicate things further, FreeBSD renumbers the CPUs during boot, but the PLIC
driver still assumes that CPU ID equals the RISC-V hart number, meaning
interrupt enables/claims might be performed for the wrong context registers.
To fix this, we must calculate each CPU's context number during
attachment. This is done by reading the interrupt properties from the
device tree, from which a mapping from context to RISC-V hart to CPU
number can be created.
Reviewed by: br
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D21927
This option is causing boot to fail for the Hifive Unleashed and older
versions of QEMU (3.1.1). Remove it from the GENERIC config for now.
Reported by: br
MFC after: 1 week
The fill_elf_hwcap() function expects to find only cpu nodes under the
/cpus entry of the device tree. Newer versions of QEMU insert a cpu-map
node which describes the CPU topology, breaking this function. To fix
this, simply skip any non-cpu entries.
Reviewed by: markj, kp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22151
The lr.w instruction used to read the value from memory sign-extends
the value read from memory. GCC sign-extends the 32-bit comparison
value passed in whereas clang currently does not. As a result, if the
value being compared has the MSB set, the comparison fails for
matching 32-bit values when compiled with clang.
Use a cast to explicitly sign-extend the unsigned comparison value.
This works with both GCC and clang.
There is commentary in the RISC-V spec that suggests that GCC's
approach is more correct, but it is not clear if the commentary in the
RISC-V spec is binding.
Reviewed by: mhorne
Obtained from: Axiado
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22084
- td_kstack_pages was not being initialized.
- td_kstack is supposed to be the base address of the stack region,
not the top.
The arm ports seem to have similar problems and will be fixed next.
Reported by: Jenkins via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
be called on bootverbose. Do so on RISV-V too.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: imp, kp
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21998
the static device mappings.
While RISC-V support was added to subr_devmap.c in r298631, it was never
actually initialised in the machine dependent code.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: br, kp
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21975
During a CoW fault, we must check for both 4KB and 2MB mappings before
clearing PGA_WRITEABLE on the old mapping's page. Previously we were
only checking for 4KB mappings. This was missed in r344106.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
pmap_remove_l3() may remove the last mapping of a page, in which case
it must clear PGA_WRITEABLE.
Reported by: Jenkins, via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
We must also check for large mappings. pmap_page_is_mapped() is
mostly used in assertions, so the problem was not very noticeable.
Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21824
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap(). MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.
This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.
Change the delivered signal for accesses to mapped area after the
backing object was truncated. According to POSIX description for
mmap(2):
The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its
end. References within the address range starting at pa and
continuing for len bytes to whole pages following the end of an
object shall result in delivery of a SIGBUS signal.
An implementation may generate SIGBUS signals when a reference
would cause an error in the mapped object, such as out-of-space
condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.
For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.
vm_fault_hold() is renamed to vm_fault(). Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos. Removed
unneeded truncations of the fault addresses reported by hardware.
PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566
In a few cases, the symbol lookup is missing before attempting to
perform the relocation. While the relocation types affected are
currently unused, this results in an uninitialized variable warning,
that is escalated to an error when building with clang.
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21773
Fix some style(9) violations.
This also changes the name of the machine-dependent sysctl kern.debug_kld to
debug.kld_reloc, and changes its type from int to bool. This is acceptable
since we are not currently concerned with preserving the RISC-V ABI.
Reviewed by: markj, kp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21772
Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.
Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768
The EARLY_AP_STARTUP option initializes non-boot processors
much sooner during startup. This adds support for this option
on RISC-V and enables it by default for GENERIC.
Reviewed by: jhb, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21661
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
This is re-appearance of the nop code removed from other arches in r287625.
Reviewed by: alc (as part of the larger patch), markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
DIfferential revision: https://reviews.freebsd.org/D21645
fdt_is_compatible_strict() inspects the first compatible property.
We need to inspect the following properties for 'riscv'.
ofw_bus_node_is_compatible() does a recursive search.
This patch fixes "Can't find CPU" error message when bootverbose = true.
Submitted by: Nicholas O'Brien (nickisobrien_gmail.com)
Reviewed by: philip, kp
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21576
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
The branch from _start to mpentry has to cross a large section of data;
an offset larger than can be specified with a 12-bit branch immediate.
Fix this by converting the branch to an unconditional jump. The gcc
assembler does this conversion silently but it is not done automatically
by clang.
Reported by: Jeremy Bennett <jeremy.bennett@embecosm.com>
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21437
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.
To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.
Reported and tested by: lwhsu, bcran
Reviewed by: imp, markj (previous version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21418
doing so adds more flexibility with less redundant code.
Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250
r343275 introduced a performance optimisation to the copyin/copyout
routines by attempting to copy word-per-word rather than byte-per-byte
where possible.
This optimisation failed to account for cases where the buffer is longer
than XLEN_BYTES, but due to misalignment does not not allow for any
word-sized copies. E.g. a 9 byte buffer (with XLEN_BYTES == 8) which is
misaligned by 2 bytes. The code nevertheless did a single full-word
copy, which meant we copied too much data. This potentially clobbered
other data.
This is most easily demonstrated by a simple `sysctl -a`.
Fix it by not assuming that we'll always have at least one full-word
copy to do, but instead checking the remaining length first.
Reviewed by: markj@, mhorne@, br@ (previous version)
MFC after: 1 week
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21100
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.
Deindent the nearby code.
Reviewed by: kib, markj
Tested by: pho (amd64, i386)
X-MFC after: r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision: https://reviews.freebsd.org/D21027
We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives in below 0x100000000, but as
soon as it doesn't this breaks.
MFC after: 1 week
Sponsored by: Axiado
syscallret() doesn't use error anymore. Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D2090
pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page. Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.
Reported by: alc
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20926
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely. Otherwise, rogue userspace livelocks the
kernel.
To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison. The primitive no longer retries
the operation if it failed spuriously. Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.
On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.
Reviewed by: andrew (arm64, previous version), markj
Tested by: pho
Reported by: https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20772
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
vm_page_dirty() when, in fact, we are write protecting the page and the L3
entry has PTE_D set. However, pmap_protect() was always calling
vm_page_dirty() when an L2 entry has PTE_D set. Handle L2 entries the
same as L3 entries so that we won't perform unnecessary calls to
vm_page_dirty().
Simplify the loop calling vm_page_dirty() on L2 entries.
AT_HWCAP is a field in the elf auxiliary vector meant to describe
cpu-specific hardware features. For RISC-V we want to use this to
indicate the presence of any standard extensions supported by the CPU.
This allows userland applications to query the system for supported
extensions using elf_aux_info(3).
Support for an extension is indicated by the presence of its
corresponding bit in AT_HWCAP -- e.g. systems supporting the 'c'
extension (compressed instructions) will have the second bit set.
Extensions advertised through AT_HWCAP are only those that are supported
by all harts in the system.
Reviewed by: jhb, markj
Approved by: markj (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20493
The mcall_trap() dummy function is unused, and should be removed as we
are unlikely to support M-mode traps any time soon.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20494
Some of the config options that are disabled by default seem to be only
for historical reasons. Enable those that appear to no longer be
problematic. This includes WITH_CTF, STACK, GEOM_RAID, and re-enabling
blacklisted kernel modules.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20495
Most architectures print their total (real) and available memory during
boot. Properly initialize the realmem global and print these messages.
Also print the physical memory chunks (behind a bootverbose flag).
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20496
Add the enter and exit events, similar to what's found in
hammer_time() on amd64. We must use TSRAW as the pcpu isn't yet
initialized.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20497
The gp register is intended to used by the linker as another means of
performing relaxations, and should point to the small data section (.sdata).
Currently gp is being used as the pcpu pointer within the kernel, but the more
appropriate choice for this is the tp register, which is unused.
Swap existing usage of gp with tp within the kernel, and set up gp properly
at boot with the value of __global_pointer$ for all harts.
Additionally, remove some cases of accessing tp from the PCB, as it is not
part of the per-thread state. The user's tp and gp should be tracked only
through the trapframe.
Reviewed by: markj, jhb
Approved by: markj (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19893
previously addressed in r348246.
This pmap problem also exists on arm64 and riscv. However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv. In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs. (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)
This commit implements an alternative solution that can be used on all four
architectures. Moreover, this solution has two other advantages. First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly. Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion. Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().
In addition, remove a related stale comment from pmap_enter_{l2,pde}().
Reviewed by: kib, markj (an earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20538
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.
Like r348026, these CUs did not show up in tinderbox as missing the include.
Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
from SiFive, Inc.
The first core on this SoC (hart 0) is a 64-bit microcontroller.
o Pick a hart to run boot process using hart lottery.
This allows to exclude hart 0 from running the boot process.
(BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
allows to find out the correct destination for IPIs and remote sfence
calls.
Thanks to SiFive, Inc for the board provided.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20225
So do nothing in pmap_page_set_memattr() and don't panic.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20209
Having IPSEC compiled into the kernel imposes a non-trivial
performance penalty on multi-threaded workloads due to IPSEC
refcounting. In my benchmarks of multi-threaded UDP
transmit (connected sockets), I've seen a roughly 20% performance
penalty when the IPSEC option is included in the kernel (16.8Mpps
vs 13.8Mpps with 32 senders on a 14 core / 28 HTT Xeon
2697v3)). This is largely due to key_addref() incrementing and
decrementing an atomic reference count on the default
policy. This cause all CPUs to stall on the same cacheline, as it
bounces between different CPUs.
Given that relatively few users use ipsec, and that it can be
loaded as a module, it seems reasonable to ask those users to
load the ipsec module so as to avoid imposing this penalty on the
GENERIC kernel. Its my hope that this will make FreeBSD look
better in "out of the box" benchmark comparisons with other
operating systems.
Many thanks to ae for fixing auto-loading of ipsec.ko when
ifconfig tries to configure ipsec, and to cy for volunteering
to ensure the the racoon ports will load the ipsec.ko module
Reviewed by: cem, cy, delphij, gnn, jhb, jpaetzel
Differential Revision: https://reviews.freebsd.org/D20163
tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).
This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp
[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).
ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.
(MFC commentary)
This may get MFC'd if many bugs in tun(4)/tap(4) are discovered after this,
and how critical they are. Changes after this are likely easily MFC'd
without taking this merge, but the merge will be easier.
I have no plans to do this MFC as of now.
Reviewed by: bcr (manpages), tuexen (testing, syzkaller/packetdrill)
Input also from: melifaro
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20044
RISC-V ISA specifies no cache management instructions so leave cache
operations in cpufunc.h as no-op for now.
Note some new hardware comes with their own memory-mapped cache
management controller.
Tested on HiFive Unleashed board with cgem(4).
Reviewed by: markj
Obtained from: arm64
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20126
In certain scenarios, it is possible for PCPU data to be
accessed before it has been initialized (e.g. during printf
if the kernel was built with the TSLOG option).
Initialize the PCPU pointer for hart 0 at the beginning of
initriscv() rather than near the end.
Reviewed by: markj
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D19726
o Fix bug in PLIC_ENABLE macro when irq >= 32.
Tested on the real hardware, which is HiFive Unleashed board.
Thanks to SiFive, Inc. for the board provided.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19775
RISC-V timer has no dedicated DTS node and we have to get timer
frequency from cpus node.
Tested on Government Furnished Equipment (GFE) cores synthesized
on Xilinx VCU118.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19727
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting. PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.
Requested by: jhb
Reviewed by: jhb, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.
On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process. Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.
MFC note: md_flags change struct proc KBI.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
This mirrors the arm64 implementation and is for use in the minidump
code.
Submitted by: Mitchell Horne <mhorne063@gmail.com>
Differential Revision: https://reviews.freebsd.org/D18321
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.
Reviewed by: imp (who also contributed the commit message)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19280
Skylake Xeons.
See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the
RDPKRU and WRPKRU instructions.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D18893
This reduces the overhead of TLB invalidations by ensuring that we
only interrupt CPUs which are using the given pmap. Tracking is
performed in pmap_activate(), which gets called during context switches:
from cpu_throw(), if a thread is exiting or an AP is starting, or
cpu_switch() for a regular context switch.
For now, pmap_sync_icache() still must interrupt all CPUs.
Reviewed by: kib (earlier version), jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18874
This includes support for pmap_enter(..., psind=1) as described in the
commit log message for r321378.
The changes are largely modelled after amd64. arm64 has more stringent
requirements around superpage creation to avoid the possibility of TLB
conflict aborts, and these requirements do not apply to RISC-V, which
like amd64 permits simultaneous caching of 4KB and 2MB translations for
a given page. RISC-V's PTE format includes only two software bits, and
as these are already consumed we do not have an analogue for amd64's
PG_PROMOTED. Instead, pmap_remove_l2() always invalidates the entire
2MB address range.
pmap_ts_referenced() is modified to clear PTE_A, now that we support
both hardware- and software-managed reference and dirty bits. Also
fix pmap_fault_fixup() so that it does not set PTE_A or PTE_D on kernel
mappings.
Reviewed by: kib (earlier version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18863
Differential Revision: https://reviews.freebsd.org/D18864
Differential Revision: https://reviews.freebsd.org/D18865
Differential Revision: https://reviews.freebsd.org/D18866
Differential Revision: https://reviews.freebsd.org/D18867
Differential Revision: https://reviews.freebsd.org/D18868
There's no need to worry about potential backwards compatibility issues
in a brand-new architecture, so avoid stack PROT_EXEC as with arm64.
Discussed with: br
The existence of a PV entry for a mapping guarantees that the mapping
exists, so we should not need to test for that.
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18866
The existing copyin(9) and copyout(9) routines on RISC-V perform only a
simple byte-by-byte copy. Improve their performance by performing
word-sized copies where possible.
Submitted by: Mitchell Horne <mhorne063@gmail.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18851
The MI kernel assumes that interrupts will not be enabled on APs until
after the first context switch. In particular, the problem was causing
occasional deadlocks during boot.
Remove an unneeded intr_disable() added in r335005.
Reviewed by: jhb (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18738
Don't bother zeroing the top-level page before freeing it. Previously,
the page was freed before being zeroed.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18720
The list will be removed with some future work.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18721
- Handle VM_PROT_EXECUTE.
- Clear PTE_D and mark the page dirty when removing write access
from a mapping.
- Atomically clear PTE_W to avoid clobbering a hardware PTE update.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18719
Otherwise prefaulted entries are not accessible from user mode and
end up triggering a fault upon access, so prefaulting has no effect.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18718
There's no need to use atomics when the previous value isn't needed.
No functional change intended.
Reviewed by: kib
Discussed with: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18717
We currently don't have a good way to dynamically detect whether the
kernel is running as a guest.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18715
The sbadaddr register was renamed in version 1.10 of the privileged
architecture specification. No functional change intended.
Submitted by: Mitchell Horne <mhorne063@gmail.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18594
- Panic immediately if witness says we're holding non-sleepable locks.
This helps ensure that we don't recurse on the pmap lock in
pmap_fault_fixup().
- Panic if the kernel faults on a user address without setting an
onfault handler.
- Panic if the fault occurred in a critical section or interrupt
handler, like we do on other platforms.
- Fix some style issues in trap_pfault().
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18561
pmap_remove_pages() is called during process termination, when it is
guaranteed that no other CPU may access the mappings being torn down.
In particular, it unnecessary to invalidate each mapping individually
since we do a pmap_invalidate_all() at the end of the function.
Also don't call pmap_invalidate_all() while holding a PV list lock, the
global pvh lock is sufficient.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18562
pmaps on RISC-V always have an L1 page table page, so we don't need to
check for this when performing lookups.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18563
- Build up phys_avail[] in a single loop, excluding memory used by
the loaded kernel.
- Fix an array indexing bug in the aforementioned phys_avail[]
initialization.[1]
- Remove some unneeded code copied from the arm64 implementation.
PR: 231515 [1]
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18464
Inline tests for PTE_* bits are easy to read and don't really require a
predicate function, and predicates which operate on a pt_entry_t are
inconvenient when working with L1 and L2 page table entries.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18461
This adds more detail and fixes some inaccuracies.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18463
Add a subroutine for updating satp, for use when updating the
active pmap. No functional change intended.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18462
Fix reporting of SS_ONSTACK in nested signal delivery when sigaltstack()
is used on some architectures.
Add a unit test for this. I tested the test by introducing the bug
on amd64. I did not test it on other architectures.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18347
On arm64 and riscv platforms, sendsig() failed to zero the signal
frame before copying it out to userspace. Zero it.
On arm, I believe all the contents of the frame were initialized,
so there was no disclosure. However, explicitly zero the whole frame
because that fact could inadvertently change in the future,
it's more clear to the reader, and I could be wrong in the first place.
MFC after: 2 days
Security: similar to FreeBSD-EN-18:12.mem and CVE-2018-17155
Sponsored by: Dell EMC Isilon
The RISC-V spec defines several performance counter CSRs such as: cycle,
time, instret, hpmcounter(3...31). They are defined to be 64-bits wide
on all RISC-V architectures. On RV64 and RV128 they can be read from a
single CSR. On RV32, additional CSRs (given the suffix "h") are present
which contain the upper 32 bits of these counters, and must be read as
well. (See section 2.8 in the User ISA Spec for full details.)
This change adds macros for reading these values safely on any RISC-V
ISA length. Obviously we aren't supporting anything other than RV64
at the moment, but this ensures we won't need to change how we read
these values if we ever do.
Submitted by: Mitchell Horne <mhorne063@gmail.com>
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D17952
These architectures never shipped binaries with an rtld path of
/usr/libexec/ld-elf.so.1.
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17876
machine/vmparam.h already defines the SHAREDPAGE constant. This
change just enables it for ELF executables. The only use of the
shared page currently is to hold the signal trampoline.
Reviewed by: markj, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17875
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI. Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms. However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.
Reviewed by: kib
MFC after: 3 days
Rather than unconditionally setting PTE_D for all writeable kernel
mappings, set PTE_D for writable mappings of unmanaged pages (whether
user or kernel). This matches what amd64 does and also matches what
the RISC-V spec suggests (preset the A and D bits on mappings where
the OS doesn't care about the state).
Suggested by: alc
Reviewed by: alc, markj
Sponsored by: DARPA
Previously, RISC-V was enabling execute permissions in PTEs for any
readable page. Now, execute permissions are only enabled if they were
explicitly specified (e.g. via PROT_EXEC to mmap). The one exception
is that the initial kernel mapping in locore still maps all of the
kernel RWX.
While here, change the fault type passed to vm_fault and
pmap_fault_fixup to only include a single VM_PROT_* value representing
the faulting access to match other architectures rather than passing a
bitmask.
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17783
This assumes that an access according to the prot in 'flags' triggered
a fault and is going to be retried after the fault returns, so the two
flags are set preemptively to avoid refaulting on the retry.
While here, only bother setting PTE_D for kernel mappings in pmap_enter
for writable mappings.
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17782
This is just cosmetic.
A weirder issue is that the SBI doc claims the hart mask pointer should
be a physical address, not a virtual address. However, the implementation
in bbl seems to just dereference the address directly.
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17781
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.
MFC after: never
modify l3 pte after we loaded old value and before we stored new value.
o Preset A(accessed), D(dirty) bits for kernel mappings.
Reported by: kib
Reviewed by: markj
Discussed with: jhb
Sponsored by: DARPA, AFRL
All platforms except powerpc use the same values and powerpc shares a
majority of them.
Go ahead and declare AT_NOTELF, AT_UID, and AT_EUID in favor of the
unused AT_DCACHEBSIZE, AT_ICACHEBSIZE, and AT_UCACHEBSIZE for powerpc.
Reviewed by: jhb, imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17397
(e.g. RocketChip, lowRISC and derivatives).
RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fixup the
PTE without calling vm_fault if they occur.
Reviewed by: jhb, markj
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17424
(e.g. RocketChip, lowRISC and derivatives).
RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fixup the
PTE without calling vm_fault if they occur.
Reviewed by: jhb, markj
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17424
This was missed in r339367 ("Various fixes for TLB management on RISC-V.").
This fixes operation on lowRISC.
Reviewed by: jhb
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17583
- Remove the arm64-specific cpu_*cache* and cpu_tlb_flush* functions.
Instead, add RISC-V specific inline functions in cpufunc.h for the
fence.i and sfence.vma instructions.
- Catch up to changes in the arm64 pmap and remove all the cpu_dcache_*
calls, pmap_is_current, pmap_l3_valid_cacheable, and PTE_NEXT bits from
pmap.
- Remove references to the unimplemented riscv_setttb().
- Remove unused cpu_nullop.
- Add a link to the SBI doc to sbi.h.
- Add support for a 4th argument in SBI calls. It's not documented but
it seems implied for the asid argument to SBI_REMOVE_SFENCE_VMA_ASID.
- Pass the arguments from sbi_remote_sfence*() to the SEE. BBL ignores
them so this is just cosmetic.
- Flush icaches on other CPUs when they resume from kdb in case the
debugger wrote any breakpoints while the CPUs were paused in the IPI_STOP
handler.
- Add SMP vs UP versions of pmap_invalidate_* similar to amd64. The
UP versions just use simple fences. The SMP versions use the
sbi_remove_sfence*() functions to perform TLB shootdowns. Since we
don't have a valid pm_active field in the riscv pmap, just IPI all
CPUs for all invalidations for now.
- Remove an extraneous TLB flush from the end of pmap_bootstrap().
- Don't do a TLB flush when writing new mappings in pmap_enter(), only if
modifying an existing mapping. Note that for COW faults a TLB flush is
only performed after explicitly clearing the old mapping as is done in
other pmaps.
- Sync the i-cache on all harts before updating the PTE for executable
mappings in pmap_enter and pmap_enter_quick. Previously the i-cache was
only sync'd after updating the PTE in pmap_enter.
- Use sbi_remote_fence() instead of smp_rendezvous in pmap_sync_icache().
Reviewed by: markj
Approved by: re (gjb, kib)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17414
Without this hardware raises an interrupt regardless of any
pending bits set.
This fixes operation on RocketChip and derivatives (e.g. lowRISC).
Approved by: re (kib)
Sponsored by: DARPA, AFRL
The only source of documentation for this device is verilog,
so driver is minimalistic.
Reviewed by: Dr Jonathan Kimmitt <jrrk2@cam.ac.uk>
Approved by: re (kib)
Sponsored by: DARPA, AFRL
This invokes "fence" on the hart performing the write followed by an IPI
to execute "fence.i" on all harts.
This is required to support userland debuggers setting breakpoints in
user processes.
Reviewed by: br (earlier version), markj
Approved by: re (gjb)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17139
- Explicitly load an empty initial state into FP registers when taking
the fault on the first FP instruction in a thread. Setting
SSTATE.FS to INITIAL is just a marker to let context switch restore
code know that it can load FP registers with zeroes instead of
memory loads. It does not imply that the hardware will reset all
registers to zero on first access. In addition, set the state to
CLEAN instead of INITIAL after the first FP instruction.
cpu_switch() doesn't do anything for INITIAL and only restores from
the pcb if the state is CLEAN. We could perhaps change cpu_switch
to call fpe_state_clear if the state was INITIAL and leave SSTATE.FS
set to INITIAL instead of CLEAN after the first FP instruction.
However, adding this complexity to cpu_switch() doesn't seem worth
the supposed gain.
- Only save the current FPU registers in fill_fpregs() if the request
is made to save the current thread's registers. Previously if a
debugger requested FP registers via ptrace() it was getting a copy
of the debugger's FP registers rather than the debugee's.
- Zero the entire FP register set structure returned for ptrace() if a
thread hasn't used FP registers rather than leaking garbage in the
fp_fcsr field.
- If a debugger writes FP registers via ptrace(), always mark the pcb
as having valid FP registers and set SSTATUS.FS_MASK to CLEAN so
that the registers will be restored when the debugged thread
resumes.
- Be more explicit about clearing the SSTATUS.FS field before setting
it to CLEAN on the first FP instruction trap.
Submitted by: br, markj
Approved by: re (rgrimes)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17141
relocation.
elf_relocaddr() has a hook to handle VIMAGE data addresses.
This fixes VIMAGE support for RISC-V when built as a module.
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
This is done by setting SUM (permit Supervisor User Memory access)
bit in sstatus register.
The functions we allow access for are routines in assembly that
explicitly handle crossing the user kernel boundary.
Approved by: re (kib)
Sponsored by: DARPA, AFRL