These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
adding and removing pages to them.
Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.
This change effectively modifies the KPI. __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho
portsnap extract, where previously it would panic.. clearly someone
who knows pmap should optimize this code per alc's comment...
Submitted by: alc
MFC after: probably
a partially populated reservation becomes fully populated, and decrease this
field when a fully populated reservation becomes partially populated.
Use this field to simplify the implementation of pmap_enter_object() on
amd64, arm, and i386.
On all architectures where we support superpages, the cost of creating a
superpage mapping is roughly the same as creating a base page mapping. For
example, both kinds of mappings entail the creation of a single PTE and PV
entry. With this in mind, use the page size field to make the
implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little
smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to
vm_map_pmap_enter(), that function would only map base pages. Now, it will
create up to 96 base page or superpage mappings.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
(XScale mainly) expects the memory located before the kernel to be mapped,
and use it to allocate the page tables, the various stacks, etc.
A better fix would probably be to rewrite the various bla_machdep.c to stop
using that RAM, but I'm not so inclined to do it, especially since I don't
have hardware for all of them.
Use armv7_setttb that sets proper PT attributes.
Get rid of unused CPU functions, put nullop instead.
Exchange obsolete pj4b_/arm11_ functions to the appropriate armv7_ ones.
memory ordering model allows writes to different devices to complete out
of order, leading to a situation where the write that clears an interrupt
source at a device can complete after a write that unmasks and EOIs the
interrupt at the interrupt controller, leading to a spurious re-interrupt.
This adds a generic barrier function specific to the needs of interrupt
controllers, and calls that function from the GIC and TI AINTC controllers.
There may still be other soc-specific controllers that need to make the call.
Reviewed by: cognet, Svatopluk Kraus <onwahe@gmail.com>
MFC after: 3 days
shared flag is set on normal-memory mappings made via pmap_kenter() for SMP.
The "shared flag" part of this change isn't obvious from the diff, here's
the deal... by using the array of preformatted page table entry templates
instead of constructing the PTE from scratch, we automatically get the
right attribute bits set for both caching and shared.
MFC after: 1 week
platform code, it is expected these will be merged in the future when the
ARM code is more complete.
Until more boards can be tested only use this with the Raspberry Pi and
rrename the functions on the other SoCs.
Reviewed by: ian@
Here, "suitably endowed" means that the System Control Coprocessor
(#15) has Performance Monitoring Registers, including a CCNT (Cycle
Count) register.
The CCNT register is used in a way similar to the TSC register in
x86 processors by the get_cyclecount(9) function. The entropy-harvesting
thread is a heavy user of this function, and will benefit from not
having to call binuptime(9) instead.
One problem with the CCNT register is that it is 32-bit only, so
the upper 32-bits of the returned number are always 0. The entropy
harvester does not care, but in case any one else does, follow-up
work may include an interrup trap to increment an upper-32-bit
counter on CCNT overflow.
Another problem is that the CCNT register is not readable in user-mode
code; in can be made readable by userland, but then it is also
writable, and so is a good chunk of the PMU system. For that reason,
the CCNT is not enabled for user-mode access in this commit.
Like the x86, there is one CCNT per core, so they don't all run in
perfect sync.
Reviewed by: ian@ (an earlier version)
Tested by: ian@ (same earlier version)
Committed from: WANDBOARD-QUAD
that and the need to be in a critical section when switching to idleclock
mode for event timers, use spinlock_enter()/exit() to achieve both needs.
The ARM WFI (wait for interrupt) instruction blocks until an interrupt is
asserted, and it will unblock even if interrupts are masked, and it will
unblock immediately if an interrupt is already pending. It is necessary
to execute it with interrupts disabled, otherwise the interrupt that
should unblock it may occur and be serviced just prior to executing the
instruction. At that point the system is inappropriately asleep until
the next timer tick or some other random interrupt happens.
In general, interrupts need to be disabled continuously from the time the
decision is made that there is no work to be done and sleeping is needed
until actually going to sleep, to avoid a race where handling a new
interrupt changes the basis for deciding there is no work to be done.
Submitted by: hps@ (in slightly different form)
On modern ARM SoCs the L2 cache controller sits between the CPU and the
AXI bus, and most on-chip memory-mapped devices are on the AXI bus. We
map the device registers using the 'Device' memory attribute, which means
the memory is not cached, but writes to it are buffered. Ensuring that a
write has made it all the way to a device may require that the L2
controller take some action.
There is currently only one implementation of the new function, for the
PL310 cache controller. It invokes a function that the controller
manual calls "cache sync" but it actually has nothing to do with cache at
all, it triggers a drain of all pending store buffer writes and it blocks
until they complete.
The sheeva and xscale L2 controllers (which predate the concept of Device
memory) don't seem to have a corresponding function. It appears that the
standard armv5 drain_writebuf function includes draining all the way
through the L2 controller.
Remove some other ifdefs that came in with a copy/paste that mean basically
"if this processor supports multicore stuff", because if you're starting up
an AP core... it does.
case where the controller is already enabled.
Some of the pl310 configuration registers cannot be changed while the
controller is active, so if there is any platform-specific init to be done
it must happen before enabling the controller.
The controller should not be enabled upon entry to the kernel, but u-boot
has recently developed the bad habit of leaving caches enabled when
launching the kernel, and since we have no control over that source code
we have to do our best to cope with it. The PL310 manual doesn't document
a safe sequence for disabling the controller, but the sequence used here
(force write-through mode and disable linefill allocations, then clean and
invalidate the current contents before disabling the hardware) appears to
be sound both by analysis and empirical testing.
These changes were developed and tested in collaboration with
Svatopluk Kraus <onwahe@gmail.com>.
Reviewed by: cognet@
Flushing the caches is required before doing a panic dump, but ARM
doesn't provide a flavor of flush that gets broadcast to other cores.
However, all cores except one are stopped before doing a dump, so this
works around the lack of a global flush/invalidate by doing it locally
on each CPU as part of stopping.
Discussed with: cognet@
using armv7_idcache_wbinv_all, because wbinv_all doesn't broadcast the
operation to other cores. In elf_cpu_load_file() use icache_sync_all()
and explain why it's needed (and why other sync operations aren't).
As part of doing this, all callers of cpu_icache_sync_all() were
inspected to ensure they weren't relying on the old side effect of
doing a wbinv_all along with the icache work.
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.
Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix
- These were needed on armv4/5 (VIVT cache), not needed on armv6.
- The wbinv_all call can't be used on SMP systems; cache operations by
set/way are not broadcast to other cores.
- The TLB maintenance operations needed for pmap_growkernel() happen in
pmap_grow_l2_bucket(), so there's no need to flush all TLB entries at
the end.
- There may not be any need for the TLB flush at the beginning of
pmap_release(), but it's left in for now pending more investigation.
Pointed out by: Svatopluk Kraus <onwahe@gmail.com>
Discussed with: cognet@
These should have been part of r264129, they are part of the overall set
of changes that got several weeks of testing. I must have fumbled them
while merging various patchsets.
- Add cpu_cpwait to comply with the convention.
- Add missing TLB invalidations, especially in pmap_kenter & pmap_kremove
with distinguishing between D and ID pages.
- Modify pmap init/bootstrap invalidations to ID, just to be safe.
- Fix TLB-inv and PTE_SYNC ordering.
This combines changes submitted by ian@, cognet@, and Wojciech Macek,
which have all been tested together as a unit.
Perform sychronization (by "isb" barrier) after TTB is set. This
is done to ensure that TLB invalidation always executes after
TTB modification and operates on valid CP15 data (per specification).
Submitted by: Wojciech Macek <wma@semihalf.com>
Reviewed by: ian@, cognet@
We don't know our ARM security state, so one of them will operate.
- Don't set frequency, since it's unpossible in non-secure state.
Only rely on DTS clock-frequency value or get clock from timer.
Discussed with: ian, cognet
description was eaten by the dog (or an editor crash or something).
Add variable-frequency support to the arm mpcore eventtimer driver.
This allows a platform's early init code to tell the mpcore driver that the
clock frequency can vary. That causes the mpcore driver to register an
eventtimer, but not a timecounter. The platform has to provide a time
counter using some other fixed-frequency clock, but can still use the
per-cpu goodness of the mpcore hardware for event timers.
When the platform support code does something to change the frequency of
the CPU clocks (power saving, thermal management) it must tell the mpcore
driver code about it using arm_tmr_change_frequency().
register values, then restart the timer. This prevents a situation where
an old event fires just as we're about to load a new value into the timer,
when the start routine is called to change the time of the current event.
Also re-nest the parens properly for casting the result of converting
time and frequency to a count. This doesn't actually change the result of
the calcs, but will some day prevent a loss-of-precision warning on the
assignment, if that warning gets enabled.
* Save the required VFP registers on context switch. If the exception bit
is set we need to save and restore the FPINST register, and if the fp2v
bit is also set we need to save and restore FPINST2.
* Move saving and restoring the floating point control registers to C.
* Clear the fpexc exception and fp2v flags on a floating-point exception.
* Signal a SIGFPE if the fpexc exception flag is set on an undefined
instruction. This is how the ARM core signals to software there is a
floating-point exception.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.
Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.
Exp-run revealed no ports using it directly.
No objection from: arch@
Sponsored by: EMC / Isilon Storage Division