tree parsing opt-out rather than opt-in. All FDT-based systems as well as
PowerPC systems with real Open Firmware use the CHRP-derived binding that
includes it, which makes SPARC the odd man out here. Making it opt-out
avoids astonishment on new platform bring up.
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.
Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly. Most startup code wasn't
doing so. Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.
Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.
The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values. Now if the size is zero, the provided buffer
full of existing env data is installed. A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.
Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp. A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data. Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the
benefit of not clobbering %y, allowing to revert r222827 and
partially r222828.
- In r222828, CATR() already was changed to use the equivalent of
PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so
belatedly also catch up with the C side of ktr(9). Originally,
in r203838 CATR() was moved away from directly reading the
module ID or equivalent as that became impractical with other
CPU types than USI/II supported. With r222828 in place, per-CPU
data generally is set up soon enough, though, that employing
PCPU things in ktr(9) also for use during early stages works.
- Unfortunately, an exception to the latter is the ktr(9) use
in pmap_bootstrap(), which actually is run so early that even
checking for bootverbose being set via the loader doesn't work.
Consequently, replace the ktr(9) use in pmap_bootstrap() with
OF_printf(9) and put it under #ifdef DIAGNOSTIC instead.
MFC after: 3 days
sysent.
sv_prepsyscall is unused.
sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain. It is only utilized on i386 for iBCS2
binaries. The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2. The same note is true for any other potential user if
sv_sigtbl. In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.
Sponsored by: The FreeBSD Foundation
- Add a kvaddr_type to represent kernel virtual addresses instead of
unsigned long.
- Add a struct kvm_nlist which is a stripped down version of struct nlist
that uses kvaddr_t for n_value.
- Add a kvm_native() routine that returns true if an open kvm descriptor
is for a native kernel and memory image.
- Add a kvm_open2() function similar to kvm_openfiles(). It drops the
unused 'swapfile' argument and adds a new function pointer argument for
a symbol resolving function. Native kernels still use _fdnlist() from
libc to resolve symbols if a resolver function is not supplied, but cross
kernels require a resolver.
- Add a kvm_nlist2() function similar to kvm_nlist() except that it uses
struct kvm_nlist instead of struct nlist.
- Add a kvm_read2() function similar to kvm_read() except that it uses
kvaddr_t instead of unsigned long for the kernel virtual address.
- Add a new kvm_arch switch of routines needed by a vmcore backend.
Each backend is responsible for implementing kvm_read2() for a given
vmcore format.
- Use libelf to read headers from ELF kernels and cores (except for
powerpc cores).
- Add internal helper routines for the common page offset hash table used
by the minidump backends.
- Port all of the existing kvm backends to implement a kvm_arch switch and
to be cross-friendly by using private constants instead of ones that
vary by platform (e.g. PAGE_SIZE). Static assertions are present when
a given backend is compiled natively to ensure the private constants
match the real ones.
- Enable all of the existing vmcore backends on all platforms. This means
that libkvm on any platform should be able to perform KVA translation
and read data from a vmcore of any platform.
Tested on: amd64, i386, sparc64 (marius)
Differential Revision: https://reviews.freebsd.org/D3341
Since r289279 bufinit() uses mp_ncpus so adapt to what x86 does and
set this variable already in cpu_mp_setmaxid().
While at it, rename cpu_cpuid_prop() to cpu_portid_prop() as well as
the MD cpuid variable to portid to avoid confusion with the MI use
of "cpuid" and make some variable static/global in order to reduce
stack usage.
PR: 204685
- While at it, arrange #ifndefs in kern_dump.c more intelligently; it's
rather confusing to have multiple competing and/or unused functions in
the kernel.
This will enable the elimination of a workaround in the USB driver that
artifically allocates buffers twice as big as they need to be (which
actually saves memory for very small buffers on the buggy platforms).
When deciding how to allocate a dma buffer, armv4, armv6, mips, and
x86/iommu all correctly check for the tag alignment <= maxsize as enabling
simple uma/malloc based allocation. Powerpc, sparc64, x86/bounce, and
arm64/bounce were all checking for alignment < maxsize; on those platforms
when alignment was equal to the max size it would fall back to page-based
allocators even for very small buffers.
This change makes all platforms use the <= check. It should be noted that
on all platforms other than arm[v6] and mips, this check is relying on
undocumented behavior in malloc(9) that if you allocate a block of a given
size it will be aligned to the next larger power-of-2 boundary. There is
nothing in the malloc(9) man page that makes that explicit promise (but the
busdma code has been relying on this behavior all along so I guess it works).
Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which
does explicitly implement this promise about size and alignment. Other
platforms probably should switch to the aligned allocator.
linkers no longer raise an error when undefined weak symbols are
found, but relocate as if the symbol value was 0. Note that we do not
repeat the mistake of userspace dynamic linker of making the symbol
lookup prefer non-weak symbol definition over the weak one, if both
are available. In fact, kernel linker uses the first definition
found, and ignores duplicates.
Signature of the elf_lookup() and elf_obj_lookup() functions changed
to split result/error code and the symbol address returned.
Otherwise, it is impossible to return zero address as the symbol
value, to MD relocation code. This explains the mechanical changes in
elf_machdep.c sources.
The powerpc64 R_PPC_JMP_SLOT handler did not checked error from the
lookup() call, the patch leaves the code as is (untested).
Reported by: glebius
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
of PCI-EBus-bridges actually match the BARs as specified in and
required by [1, p. 113 f.]. Doing so earlier would have simplified
diagnosing a bug in QEMU/OpenBIOS getting the mapping of child
addresses wrong, which still needs to be fixed there.
In theory, we could try to change the BARs accordingly if we hit
this problem. However, at least with real machines changing the
decoding likely won't work, especially if the PCI-EBus-bridge is
beneath an APB one. So implementing such functionality generally
is rather pointless.
- Actually change the allocation type of EBus resources if they
change from SYS_RES_MEMORY to SYS_RES_IOPORT when mapping them
to PCI ranges in ebus_alloc_resource() and passing them up to
bus_activate_resource(9). This may happen with the QEMU/OpenBIOS
PCI-EBus-bridge but not real ones. Still, this is only cleans up
the code and the result of resource allocation and activation is
unchanged.
- Change the remainder of printf(9) to device_printf(9) calls and
canonicalize their wording.
MFC after: 1 week
Peripheral Component Interconnect Input Output Controller,
Part No.: 802-7837-01, Sun Microelectronics, March 1997 [1]
Formally pair store_rel(&smp_started) with load_acq(&smp_started).
Similarly to x86, this change is mostly a NOP due to the kernel
being run in total store order.
MFC after: 1 week
drivers into the revived sys/sparc64/pci/ofw_pci.c, previously already
serving a similar purpose. This has been done with sun4v in mind, which
explains a) the otherwise not that obvious scheme employed and b) why
reusing sys/powerpc/ofw/ofw_pci.c was even lesser an option.
- Add a workaround for QEMU once again not emulating real machines, in
this case by not providing the OFW_PCI_CS_MEM64 range. [1]
Submitted by: jhb [1]
MFC after: 1 week
running thread.
It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.
This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.
Reviewed by: bdrewery, jhb, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3256
The only operation which is prevented by the hold is the kernel stack
swapout for the faulted thread, which should be fine to allow.
Remove useless checks for NULL curproc or curproc->p_vmspace from the
trap_pfault() wrappers on x86 and powerpc.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
PR: 201245
MFC after: 3 days
If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.
Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.
The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.
This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
Native ABI do not need signal conversion, only emulators may want this. Usually
emulators implements its own sv_sendsig method. For now only ibcs2 emulator does
not have own sv_sendsig implementation and depends on native sendsig() method.
So, remove any extra attempts to convert signal numbers from native sendsig()
methods except from i386 where ibsc2 is living.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool. This also involved cleaning some type
mismatches and ansifying old C function declarations.
Pointed out by: bde
Discussed with: bde, ian, jhb
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.
Differential Revision: https://reviews.freebsd.org/D2407
Reviewed by: emaste@, wblock@
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
This is needed with the pl011 driver. Before this change it would default
to a shift of 0, however the hardware places the registers at 4-byte
addresses meaning the value should be 2.
This patch fixes this for the pl011 when configured using the fdt. The
other drivers have a default value of 0 to keep this a no-op.
MFC after: 1 week
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
libbfd looks for these types, and having them fixes 'gcore' in gdb of a
32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
dump code instead of duplicating the definitions.
Differential Revision: https://reviews.freebsd.org/D2142
Reviewed by: kib, nathanw (powerpc bits)
MFC after: 1 week
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.
Note to self: When this is MFCed, sparc64 needs the same fix.
Differential revision: https://reviews.freebsd.org/D2106
Reviewed by: kib
Reported by: Michael Fuckner <michael@fuckner.net>
Tested by: Michael Fuckner <michael@fuckner.net>
MFC after: 2 weeks
const. On x86, even after the machine context is supposedly read into
the struct ucontext, lazy FPU state save code might only mark the FPU
data as hardware-owned. Later, set_fpcontext() needs to fetch the
state from hardware, modifying the *mcp.
The set_mcontext(9) is called from sigreturn(2) and setcontext(2)
implementations and old create_thread(2) interface, which throw the
*mcp out after the set_mcontext() call.
Reported by: dim
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
for i386, and from the code inspection, nothing in the
arm/mips/sparc64 implementations depends on it.
Discussed with: imp, nwhitehorn
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.
PR: 193873
Differential Revision: https://reviews.freebsd.org/D904
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by: jhibbits (earlier version)
Sponsored by: EMC / Isilon Storage Division
WITNESS and INVARIANTS checking, which are known to have significant
performance impact on running systems. When benchmarking new features
this kernel should be used instead of the standard GENERIC.
This kernel configuration should never appear outside of the HEAD
of the FreeBSD tree.
It is automatically set when -fPIC is passed to the compiler.
Reviewed by: dim, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1179
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.
No objections from: net@
and casuword(9), but do not mix value read and indication of fault.
I know (or remember) enough assembly to handle x86 and powerpc. For
arm, mips and sparc64, implement fueword() and casueword() as wrappers
around fuword() and casuword(), which means that the functions cannot
distinguish between -1 and fault.
On architectures where fueword() and casueword() are native, implement
fuword() and casuword() using fueword() and casuword(), to reduce
assembly code duplication.
Sponsored by: The FreeBSD Foundation
Tested by: pho
MFC after: 2 weeks (ia64 needs treating)