correctly for the data contained on each memory page.
There are several components to this change:
* Add a variable to indicate the start of the R/W portion of the
initial memory.
* Stop detecting NX bit support for each AP. Instead, use the value
from the BSP and, if supported, activate the feature on the other
APs just before loading the correct page table. (Functionally, we
already assume that the BSP and all APs had the same support or
lack of support for the NX bit.)
* Set the RW and NX bits correctly for the kernel text, data, and
BSS (subject to some caveats below).
* Ensure DDB can write to memory when necessary (such as to set a
breakpoint).
* Ensure GDB can write to memory when necessary (such as to set a
breakpoint). For this purpose, add new MD functions gdb_begin_write()
and gdb_end_write() which the GDB support code can call before and
after writing to memory.
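For illustration, a minimal sketch of how the GDB support code might bracket
a memory write with the new hooks; the exact calling convention of
gdb_begin_write()/gdb_end_write() is an assumption here:
    gdb_begin_write();                      /* make protected kernel pages writable */
    *(uint8_t *)addr = breakpoint_opcode;   /* e.g. plant a breakpoint (placeholder names) */
    gdb_end_write();                        /* restore the original protections */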
This change is not comprehensive:
* It doesn't do anything to protect modules.
* It doesn't do anything for kernel memory allocated after the kernel
starts running.
* In order to avoid excessive memory inefficiency, it may let multiple
types of data share a 2M page, and it assigns the most permissive
protections needed by the data on that page.
Reviewed by: jhb, kib
Discussed with: emaste
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14282
We don't support float in the boot loaders, so don't include
interfaces for float or double in system headers. In addition, take
the unusual step of spiking double and float to prevent any more
accidental seepage.
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine if the architecture has a direct map;
if the architecture unconditionally lacks (or has) a direct map, PMAP_HAS_DMAP
is a constant 0 or 1 and the compiler can remove the conditional logic.
As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which already
had similar mechanisms, just spelled differently. 32-bit MIPS has a partial
direct map that maps poorly to this concept and is unchanged.
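A minimal usage sketch of the pattern described above; pmap_map_fallback() is
a hypothetical placeholder for whatever non-DMAP path an architecture uses:
    vm_offset_t
    map_phys_page(vm_paddr_t pa)
    {
            if (PMAP_HAS_DMAP)      /* constant 0 or 1 lets the compiler drop a branch */
                    return (PHYS_TO_DMAP(pa));
            return (pmap_map_fallback(pa)); /* hypothetical non-direct-map path */
    }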
Reviewed by: kib
Suggestions from: marius, alc, kib
Runtime tested on: amd64, powerpc64, powerpc, mips64
They provide relaxed-ordered atomic access semantics. Due to the
FreeBSD memory model, the operations are syntactic wrappers around
volatile accesses. The volatile qualifier is used to ensure that the
access is not optimized out, which in turn depends on the volatile
semantics as implemented by the supported compilers.
The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
a special type and require the use of special access operations. It is
still the case that FreeBSD requires plain loads and stores of aligned
integer types to be atomic.
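As a rough sketch (the committed wrappers may differ in detail), the int
variants reduce to plain volatile accesses:
    #define atomic_load_int(p)      (*(volatile u_int *)(p))
    #define atomic_store_int(p, v)  (*(volatile u_int *)(p) = (u_int)(v))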
Suggested by: jhb
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Mainly focus on files that use the BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use the BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
- allocate a value for the new AT_HWCAP2 auxiliary vector on all platforms.
- expand 'struct sysentvec' with a new 'u_long *sv_hwcap2' field, in exactly
the same way as for AT_HWCAP.
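For illustration, one way userland can consume the new entry via
elf_aux_info(3); this is a sketch, not part of the change itself:
    #include <sys/auxv.h>
    u_long hwcap2 = 0;
    if (elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)) == 0) {
            /* test architecture-specific HWCAP2_* bits here */
    }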
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D12699
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'. A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags. If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.
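A hedged sketch of how an ABI might wire this up (the elf_hwcap variable and
the initializer layout are illustrative):
    u_long elf_hwcap;                       /* filled in by MD CPU-identification code */
    struct sysentvec elf64_freebsd_sysvec = {
            /* ... */
            .sv_hwcap = &elf_hwcap,         /* or NULL if the ABI supplies no AT_HWCAP */
    };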
The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present. This is a step towards unifying the
AT_* constants across platforms.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12290
--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.
--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.
--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h
--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)
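A hedged sketch of the mechanism described above; the shape of the x86-style
inline wrapper and its dispatch member are assumptions:
    /* In <machine/bus_dma.h> (sketch): */
    #define WANT_INLINE_DMAMAP
    static inline void
    bus_dmamap_sync(bus_dma_tag_t dmat, bus_dmamap_t map, bus_dmasync_op_t op)
    {
            if (map != NULL)                /* x86 still accepts NULL maps */
                    dmat->dt_dmamap_sync(dmat, map, op);    /* hypothetical dispatch member */
    }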
Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729
from machine/proc.h, consistently on all architectures.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated, we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
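A heavily hedged sketch of the early-counter indirection (the __pcpu[] name
is amd64-style and the initialization shown is illustrative, not the committed
code):
    extern struct pcpu __pcpu[];
    struct vmmeter vm_cnt = {
            .v_swtch = (counter_u64_t)&__pcpu[0].pc_early_dummy_counter,
            /* ... every other counter(9) field likewise ... */
    };
    /* Once UMA is up, each field is re-pointed via counter_u64_alloc(). */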
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
I fixed this in 1997, but the fix was over-engineered and fragile and
was broken in 2003 if not before. i386 parameters were copied to 8
other arches verbatim, mostly after they stopped working on i386, and
mostly without the large comment saying how the values were chosen on
i386. powerpc has a non-verbatim copy which just changes the non-critical
parameter and seems to add a sign-extension bug to it.
Just treat negative offsets as offsets if they are no more negative than
-db_offset_max (default -64K), and remove all the broken parameters.
-64K is not very negative, but it is enough for frame and stack pointer
offsets since kernel stacks are small.
The over-engineering was mainly to go more negative than -64K for the
negative offset format, without affecting printing for more than a
single address.
Addresses in the top 64K of a (full 32-bit or 64-bit) address space
are now printed less well, but there aren't many interesting ones.
For arches that have many interesting ones very near the top (e.g.,
68k has interrupt vectors there), there would be no good limit for
the negative offset format, and -64K is as good as anything.
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to
replace pcpu_find(curcpu) in MI code.
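In MI code the change amounts to the following pattern (sketch):
    struct pcpu *pc;
    critical_enter();
    pc = get_pcpu();                /* was: pcpu_find(curcpu) */
    /* ... access per-CPU fields via pc ... */
    critical_exit();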
Reviewed by: andreast, kan, lidl
Tested by: lidl(mips, sparc64), andreast(powerpc)
Differential Revision: https://reviews.freebsd.org/D9587
The types are for the byte offset and page index in a vm object. They
are similar to off_t, which is defined as a 64-bit MI integer. Using MI
definitions will allow providing consistent MD values for vm
object-related maximum sizes.
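For reference, the MI definitions have this shape (a sketch; exact header
placement aside):
    typedef __int64_t       vm_ooffset_t;   /* byte offset within a vm object */
    typedef __uint64_t      vm_pindex_t;    /* page index within a vm object */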
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The switch to get_pcpu() in MI code seems to cause hangs on MIPS.
Back out until we can get a better idea of what's happening there.
Reported by: kan, lidl
with no creative content. Include "lost" changes from git:
o Use /dev/efi instead of /dev/efidev
o Remove redundant NULL checks.
Submitted by: kib@, dim@, zbb@, emaste@
Add a pair of bus methods that can be used to "map" resources for direct
CPU access using bus_space(9). bus_map_resource() creates a mapping and
bus_unmap_resource() releases a previously created mapping. Mappings are
described by 'struct resource_map' object. Pointers to these objects can
be passed as the first argument to the bus_space wrapper API used for bus
resources.
Drivers that wish to map all of a resource using default settings
(for example, using uncacheable memory attributes) do not need to change.
However, drivers that wish to use non-default settings can now do so
without jumping through hoops.
First, an RF_UNMAPPED flag is added to request that a resource is not
implicitly mapped with the default settings when it is activated. This
permits other activation steps (such as enabling I/O or memory decoding
in a device's PCI command register) to be taken without creating a
mapping. Right now the AGP drivers don't set RF_ACTIVE to avoid using
up a large amount of KVA to map the AGP aperture on 32-bit platforms.
Once RF_UNMAPPED is supported on all platforms that support AGP this
can be changed to using RF_UNMAPPED with RF_ACTIVE instead.
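For example, a driver that wants activation without a mapping could allocate
its BAR like this (sketch):
    res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid,
        RF_ACTIVE | RF_UNMAPPED);   /* decoding enabled, no KVA mapping created */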
Second, bus_map_resource accepts an optional structure that defines
additional settings for a given mapping.
For example, a driver can now request to map only a subset of a resource
instead of the entire range. The AGP driver could also use this to only
map the first page of the aperture (IIRC, it calls pmap_mapdev() directly
to map the first page currently). I will also eventually change the
PCI-PCI bridge driver to request mappings of the subset of the I/O window
resource on its parent side to create mappings for child devices rather
than passing child resources directly up to nexus to be mapped. This
also permits bridges that do address translation to request suitable
mappings from a resource on the "upper" side of the bus when mapping
resources on the "lower" side of the bus.
Another attribute that can be specified is an alternate memory attribute
for memory-mapped resources. This can be used to request a
Write-Combining mapping of a PCI BAR in an MI fashion. (Currently the
drivers that do this call pmap_change_attr() directly for x86 only.)
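A hedged usage sketch pulling these pieces together; per the text above the
bus_space wrappers accept the map object, and REG_FOO/val are placeholders:
    struct resource_map map;
    struct resource_map_request req;
    resource_init_map_request(&req);
    req.memattr = VM_MEMATTR_WRITE_COMBINING;       /* ask for a WC mapping */
    error = bus_map_resource(dev, SYS_RES_MEMORY, res, &req, &map);
    if (error == 0) {
            bus_write_4(&map, REG_FOO, val);        /* bus_space wrapper takes the map */
            bus_unmap_resource(dev, SYS_RES_MEMORY, res, &map);
    }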
Note that this commit only adds the MI framework. Each platform needs
to add support for handling RF_UNMAPPED and the new
bus_map/unmap_resource methods. Generally speaking, any drivers that
are calling rman_set_bustag() and rman_set_bushandle() need to be
updated.
Discussed on: arch
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D5237
addresses exceeding 32 bit, so bump BUS_SPACE_MAXADDR to 64 bit.
The whole situation is sub par, though; prior to r296250 and despite
what their names imply, BUS_SPACE_MAX* were primarily, even almost
exclusively used for bus_dma(9). Now these macros also have a vital
role for bus_space(9). However, it does not necessarily hold that
both bus DMA and space addresses universally have the same limits
per platform.
As for sparc64, 64 bit clearly is beyond what can be addressed via
the various IOMMUs. With this change in place, we now rely on the
parent bus DMA tags of the host-to-foo drivers causing the child
tags to be capped as necessary.
PR: 207998
ucontext_t available. Our code even has an XXX comment about this.
Add a bit of compliance by moving struct __ucontext definition into
sys/_ucontext.h and including it into signal.h and sys/ucontext.h.
Several machine/ucontext.h headers were changed to use namespace-safe
types (like uint64_t->__uint64_t) to not depend on sys/types.h.
struct __stack_t from sys/signal.h is made always visible in private
namespace to satisfy sys/_ucontext.h requirements.
Apparently mips _types.h pollutes the global namespace with the f_register_t
type definition. This commit does not try to fix the issue.
PR: 207079
Reported and tested by: Ting-Wei Lan <lantw44@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
tree parsing opt-out rather than opt-in. All FDT-based systems as well as
PowerPC systems with real Open Firmware use the CHRP-derived binding that
includes it, which makes SPARC the odd man out here. Making it opt-out
avoids astonishment during new platform bring-up.
CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the
benefit of not clobbering %y, allowing us to revert r222827 and
partially revert r222828.
- In r222828, CATR() already was changed to use the equivalent of
PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so
belatedly also catch up with the C side of ktr(9). Originally,
in r203838 CATR() was moved away from directly reading the
module ID or equivalent as that became impractical with other
CPU types than USI/II supported. With r222828 in place, per-CPU
data generally is set up soon enough, though, that using PCPU data
in ktr(9) also works during early stages.
- Unfortunately, an exception to the latter is the ktr(9) use
in pmap_bootstrap(), which actually is run so early that even
checking for bootverbose being set via the loader doesn't work.
Consequently, replace the ktr(9) use in pmap_bootstrap() with
OF_printf(9) and put it under #ifdef DIAGNOSTIC instead.
MFC after: 3 days
- Add a kvaddr_t type to represent kernel virtual addresses instead of
unsigned long.
- Add a struct kvm_nlist which is a stripped down version of struct nlist
that uses kvaddr_t for n_value.
- Add a kvm_native() routine that returns true if an open kvm descriptor
is for a native kernel and memory image.
- Add a kvm_open2() function similar to kvm_openfiles(). It drops the
unused 'swapfile' argument and adds a new function pointer argument for
a symbol resolving function. Native kernels still use _fdnlist() from
libc to resolve symbols if a resolver function is not supplied, but cross
kernels require a resolver.
- Add a kvm_nlist2() function similar to kvm_nlist() except that it uses
struct kvm_nlist instead of struct nlist.
- Add a kvm_read2() function similar to kvm_read() except that it uses
kvaddr_t instead of unsigned long for the kernel virtual address.
- Add a new kvm_arch switch of routines needed by a vmcore backend.
Each backend is responsible for implementing kvm_read2() for a given
vmcore format.
- Use libelf to read headers from ELF kernels and cores (except for
powerpc cores).
- Add internal helper routines for the common page offset hash table used
by the minidump backends.
- Port all of the existing kvm backends to implement a kvm_arch switch and
to be cross-friendly by using private constants instead of ones that
vary by platform (e.g. PAGE_SIZE). Static assertions are present when
a given backend is compiled natively to ensure the private constants
match the real ones.
- Enable all of the existing vmcore backends on all platforms. This means
that libkvm on any platform should be able to perform KVA translation
and read data from a vmcore of any platform.
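A hedged usage sketch of the new cross-friendly API; execfile, corefile,
errbuf and buf are placeholders and the resolver body is application-specific:
    #include <kvm.h>
    static int
    resolver(const char *name, kvaddr_t *addr)
    {
            /* look up 'name' in the cross kernel's symbol table, e.g. via libelf */
            return (-1);
    }
    kvm_t *kd = kvm_open2(execfile, corefile, O_RDONLY, errbuf, resolver);
    struct kvm_nlist nl[] = { { .n_name = "vm_cnt" }, { .n_name = NULL } };
    if (kd != NULL && kvm_nlist2(kd, nl) == 0)
            kvm_read2(kd, nl[0].n_value, &buf, sizeof(buf));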
Tested on: amd64, i386, sparc64 (marius)
Differential Revision: https://reviews.freebsd.org/D3341
Since r289279 bufinit() uses mp_ncpus so adapt to what x86 does and
set this variable already in cpu_mp_setmaxid().
While at it, rename cpu_cpuid_prop() to cpu_portid_prop() as well as
the MD cpuid variable to portid to avoid confusion with the MI use
of "cpuid" and make some variable static/global in order to reduce
stack usage.
PR: 204685
- While at it, arrange #ifndefs in kern_dump.c more intelligently; it's
rather confusing to have multiple competing and/or unused functions in
the kernel.
Formally pair store_rel(&smp_started) with load_acq(&smp_started).
Similarly to x86, this change is mostly a NOP due to the kernel
being run in total store order.
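The pairing looks like this (sketch):
    /* last AP to come up: */
    atomic_store_rel_int(&smp_started, 1);
    /* a consumer: */
    if (atomic_load_acq_int(&smp_started) != 0) {
            /* all APs are up and their prior stores are visible here */
    }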
MFC after: 1 week
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
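A minimal bounce-copy sketch using the new KPI (bounce_buf, off and len are
placeholders):
    vm_offset_t kva;
    kva = pmap_quick_enter_page(m);                 /* temporary CPU-local mapping */
    bcopy(bounce_buf, (void *)(kva + off), len);    /* copy into the page */
    pmap_quick_remove_page(kva);                    /* tear the mapping down */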
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock that x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by, e.g., a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on at least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows getting rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
PR: 201245
MFC after: 3 days
provide the semantics defined by the C11 fences with the corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in a single operation. Note that reads after the acq+rel fence
could be made visible before writes preceding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
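A classic flag/data hand-off illustrates the acquire/release pair (sketch;
data and flag are placeholders):
    /* producer */
    data = compute();
    atomic_thread_fence_rel();      /* r, w | w: prior accesses settle before the flag store */
    flag = 1;
    /* consumer */
    while (flag == 0)
            cpu_spinwait();
    atomic_thread_fence_acq();      /* r | r, w: later accesses stay after the flag load */
    use(data);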
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
libbfd looks for these types, and having them fixes 'gcore' in gdb for a
32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
dump code instead of duplicating the definitions.
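A hedged sketch of the resulting <sys/procfs.h> guard (exact layout assumed):
    #ifdef __HAVE_REG32
    typedef struct prstatus32 prstatus32_t;         /* looked up by libbfd */
    typedef struct prpsinfo32 prpsinfo32_t;
    #endif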
Differential Revision: https://reviews.freebsd.org/D2142
Reviewed by: kib, nathanw (powerpc bits)
MFC after: 1 week