Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
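For illustration, a typical tag as it appears at the top of a C source
file (the identifier shown is just an example):

    /* SPDX-License-Identifier: BSD-2-Clause */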
Formally pair store_rel(&smp_started) with load_acq(&smp_started).
Similarly to x86, this change is mostly a NOP due to the kernel
running in total store order.
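A minimal sketch of the pairing, using the atomic(9) API (the consumer
shown is illustrative, not the literal diff):

    /* Producer, e.g. the last AP to finish booting: */
    atomic_store_rel_int(&smp_started, 1);

    /* Consumer, e.g. an IPI helper: */
    if (atomic_load_acq_int(&smp_started) != 1)
            return;         /* APs aren't up yet; don't IPI them. */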
MFC after: 1 week
from x86 to use the smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs for the following reasons:
- The cross-IPI SMP deadlock that x86 otherwise is subject to can't happen
on sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and that direct-dispatch IPIs such as
the cache invalidation IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by e.g. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI itself
while IPI_RENDEZVOUS isn't fully handled yet, thus resulting in a
deadlock.
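A schematic sketch of the timing described above (map and ncpus stand for
the target set and its size; this is not the actual smp_rendezvous_cpus()
code):

    static volatile u_int smp_rv_done;  /* CPUs done with the action */

    mtx_lock_spin(&smp_ipi_mtx);
    smp_rv_done = 0;
    ipi_selected(map, IPI_RENDEZVOUS);  /* queued; runs at PIL_RENDEZVOUS */
    while (smp_rv_done < ncpus)         /* wait for every target CPU */
            cpu_spinwait();
    mtx_unlock_spin(&smp_ipi_mtx);      /* only dropped after all are done */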
o As mentioned in the commit message of r245850, on at least some sun4u
  platforms concurrent sending of IPIs by different CPUs is fatal. Therefore,
  hold the reintroduced MD ipi_mtx also while delivering cross-traps via MI
  helpers, i.e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
  instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
  are started when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
  smp_cpus > 1 or nothing at all. This avoids races during boot that cause
  IPIs to be delivered to APs that in fact aren't up and running yet.
  While at it, move the setting of the cpu_ipi_{selected,single}() pointers
  to the appropriate delivery functions from mp_init() to cpu_mp_start(),
  where it's better suited and makes it possible to get rid of the global
  isjbus variable.
o Given that concurrent IPI delivery no longer is possible, also nuke the
  delays before completely disabling interrupts again in the CPU-specific
  cross-trap delivery functions, which previously gave other CPUs a window
  for sending IPIs on their part. Actually, we should now be able to get rid
  of completely disabling interrupts in these functions entirely. Such a
  change needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While
  not necessary for correctness, this avoids page faults when accessing the
  stack of a foreign CPU, as {s,}tick now is locked into the TLBs as part of
  static kernel data. Hence, {s,}tick_get_timecount_mp() always executes as
  fast as possible, avoiding jitter.
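A hedged sketch of the static-variable part of this change (simplified;
the BSP test and the read helpers are placeholders, not the literal code):

    static u_int
    stick_get_timecount_mp(struct timecounter *tc)
    {
            static u_long stick;    /* static: TLB-locked kernel data,
                                       not a foreign CPU's stack */

            sched_pin();
            if (curcpu == 0)        /* assume cpuid 0 is the BSP */
                    stick = rdstick();      /* hypothetical direct read */
            else
                    stick = bsp_rdstick();  /* hypothetical cross-trap read */
            sched_unpin();
            return ((u_int)stick);
    }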
PR: 201245
MFC after: 3 days
reading registers from other CPUs. As it turns out, the hardware doesn't
really like concurrent IPI'ing, which causes adverse effects. Also, the
deadlock thought to occur when using this spin lock here while the targeted
CPU(s) are also holding it, or in case of nested locks, can't actually
happen. This is due to the fact that on sparc64, spinlock_enter() only
raises the PIL but doesn't disable interrupts completely. Thus, direct
cross calls as used for the register reading (and all other MD IPI needs)
will still be executed by the targeted CPU(s) in that case.
MFC after: 3 days
other CPUs doesn't require locking, so get rid of it. As the latter is used
for the timecounter on certain machine models, using a spin lock in this
case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
demapping IPIs.
- Mark some unused function arguments as such.
MFC after: 1 week
have to ignore it when sending the IPI anyway. Actually, I can't think of
a good reason why this ever was done that way in the first place, as it's
not even useful for debugging.
While at it, replace the use of pc_other_cpus as it's slated for deorbit.
must not; otherwise we tell the CPU to IPI itself, which the sun4u CPUs don't
support. For reasons unknown so far, MD and MI IPI use actually still
triggers that assertion, though.
The main goal of this is to generate timer interrupts only when there is
some work to do. When a CPU is busy, interrupts are generated at the full
rate of hz + stathz to fulfill scheduler and timekeeping requirements. But
when a CPU is idle, only the minimum set of interrupts (down to 8 interrupts
per second per CPU now) needed to handle scheduled callouts is executed.
This allows idle CPU sleep time to be increased significantly, raising the
effect of static power-saving technologies. It should also reduce host CPU
load on virtualized systems when the guest system is idle.
There is a set of tunables, also available as writable sysctls, allowing
control over the event timer subsystem's behavior:
kern.eventtimer.timer - allows choosing the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether
the chosen timer is per-CPU, the behavior of the other options differs
slightly.
kern.eventtimer.periodic - allows choosing between periodic and one-shot
operation mode. In periodic mode, the current timer hardware is taken as
the only source of time for time events. This mode is quite similar to the
previous kernel behavior. One-shot mode instead uses the currently selected
timecounter hardware to schedule all needed events one by one and programs
the timer to generate an interrupt at exactly the specified time. The
default value depends on the capabilities of the chosen timer, but one-shot
mode is preferred unless the other is forced by the user or hardware.
kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be in order to not strictly alias the
hardclock() and statclock() events. Default values are 2 and 4, but they
could be reduced to 1 if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU receive every timer interrupt
independently of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this
option has no effect - all interrupts are generated anyway.
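For illustration, the knobs could be set from /etc/sysctl.conf like so
(values are examples, not recommendations):

    # Illustrative settings only; sane defaults depend on the hardware.
    kern.eventtimer.periodic=0    # prefer one-shot mode
    kern.eventtimer.singlemul=2   # multiplier used in periodic mode
    kern.eventtimer.idletick=0    # let idle CPUs skip timer interrupts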
Since this patch modifies cpu_idle() on some platforms, I have also
refactored the x86 one. Now it makes use of the MONITOR/MWAIT instructions
(if supported) under high sleep/wakeup rates, as a fast alternative to the
other methods. This allows the SMP scheduler to wake up sleeping CPUs much
faster without using an IPI, significantly increasing performance on some
highly task-switching loads.
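A simplified sketch of that MONITOR/MWAIT pattern, assuming the x86
cpu_monitor()/cpu_mwait() helpers; idle_flag is made up and the real
cpu_idle() handles more cases:

    static volatile int idle_flag;      /* hypothetical wakeup flag */

    idle_flag = 1;                      /* advertise that we're idling */
    cpu_monitor(__DEVOLATILE(void *, &idle_flag), 0, 0);
    if (idle_flag == 1)                 /* nothing woke us meanwhile */
            cpu_mwait(0, 0);            /* sleep until idle_flag is written */

The scheduler can then wake the CPU with a plain store to idle_flag
instead of an IPI.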
Tested by: many (on i386, amd64, sparc64 and powerpc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.
td_critnest > 1 when not already running on the desired CPU, read the
TICK counter of the BSP via a direct cross trap request in that case
instead.
- Treat the STICK based timecounter the same way as the TICK based one
regarding its quality and obtaining the counter value from the BSP.
Like the TICK timers, the STICK ones also are only synchronized during
their startup (which might not result in good synchronicity in the
first place) but not afterwards and might drift over time, causing
problems when the time is read from different CPUs (see r135972).
to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
Besides being used to implement the ipi_cpu() introduced in r210939,
cpu_ipi_single() will also be used internally by the sparc64 MD code.
- Factor out the Jalapeno support from the Cheetah IPI send functions
in order to be able to more easily and efficiently implement support
for more than 32 target CPUs as well as a workaround for Cheetah+
erratum 25 for the latter.
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks, in which case building a CPU mask is more
expensive.
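The replacement pattern, with IPI_AST as an arbitrary example:

    /* Before: construct a mask for a single CPU (cpumask_t era): */
    ipi_selected(1 << cpuid, IPI_AST);

    /* After: address the CPU directly by its cpuid: */
    ipi_cpu(cpuid, IPI_AST);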
Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
between determining the other CPUs and calling cpu_ipi_selected(), which,
apart from generally doing the wrong thing, can lead to a panic when a
CPU is told to IPI itself (which sun4u doesn't support).
Reported and tested by: Nathaniel W Filardo
- Add __unused where appropriate.
MFC after: 3 days
but also of different types, e.g. a Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model-specific errata. Therefore, move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
has proven to have a good effect when entering KDB by using an NMI,
but it completely violates all the good rules about keeping interrupts
disabled while holding a spinlock on other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel where available. In other cases it just matches a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on the ia32 and amd64
architectures, while on the others it has a normal IPI_STOP effect. It is
the responsibility of maintainers to eventually implement a hard stop
where necessary and possible.
* Use the new IPI facility in order to implement a new MI SMP kernel
function called stop_cpus_hard(). It mirrors stop_cpus() but uses the
privileged channel for the stopping facility (see the sketch after this
list).
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases.
* Disable interrupts on CPU0 when starting the process of AP suspension.
* Style cleanups and comment additions.
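A hedged sketch of the KDB-side use (the cpumask argument mirrors
stop_cpus(); not necessarily the literal diff):

    /* In kdb_trap(): stop the other CPUs via the privileged channel
       (an NMI where available) rather than a plain IPI_STOP. */
    stop_cpus_hard(PCPU_GET(other_cpus));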
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file regarding the removal of
the STOP_NMI option.
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
disable interrupts and loop forever with these.
- Hide all MP-related bits in <machine/smp.h> underneath #ifdef SMP.
- Inline ipi_all_but_self(9) and ipi_selected(9). We don't expose any
additional bits but save a few cycles by doing so.
- Remove ipi_all(9), which actually only called panic(9). It can't be
implemented natively anyway, and having it removed at least causes
MI users to already fail when linking.
frequencies (and having different cache sizes) so use the STICK
(System TICK) timer, which was introduced due to this and is
driven by the same frequency across all CPUs, instead of the
TICK timer, whose frequency varies with the CPU clock, to drive
hardclock. We try to use the STICK counter with all CPUs that are
USIII or beyond, even when not necessary due to identical CPUs,
as we can also avoid the workaround for the BlackBird erratum
#1 there. Unfortunately, using the STICK counter currently causes
a hang with USIIIi MP machines for reasons unknown, so we still
use the TICK timer there (which is okay as they can only consist
of identical CPUs).
- Given that we only (try to) synchronize the (S)TICK timers of APs
with the BSP during startup, we could end up spinning forever in
DELAY(9) if that function is migrated to another CPU while we're
spinning due to clock drift afterwards, so pin to the CPU in order
to avoid migration. Unfortunately, pinning doesn't work yet at the
point DELAY(9) is required by the low-level console drivers, so
switch to a function pointer, which is updated accordingly, for
implementing DELAY(9) (see the sketch after this list). For USIII
and beyond, this would also make it easy to use the STICK counter
instead of the TICK one here; there's no benefit in doing so,
however.
While at it, use cpu_spinwait(9) for spinning in the delay
functions. This currently is a NOP, though.
- Don't set the TICK timer of the BSP to 0 at startup as there's
no need to do so.
- Implement cpu_est_clockrate().
- Unfortunately, USIIIi-based machines don't provide a timecounter
device besides the STICK and TICK counters (well, in theory the
Tomatillo bridges have a performance counter that can be (ab)used
as timecounter by configuring it to count bus cycles, though unlike
the performance counter of Schizo bridges, the Tomatillo one is
broken and counts Sun knows what in this mode). This means that
we have to use a (S)TICK counter for timecounting, which has the old
problem of not being in sync across CPUs, so provide an additional
timecounter function which binds itself to the BSP but has an
adequate low priority.
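A hedged sketch of the DELAY(9) function-pointer indirection mentioned
above (names assumed, not the literal commit):

    static void delay_tick(int usec);   /* TICK-based implementation */

    /* Re-pointed (e.g. to a STICK- or pinning-aware variant) once
       that becomes safe: */
    static void (*delay_func)(int) = delay_tick;

    void
    DELAY(int usec)
    {

            (*delay_func)(usec);
    }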
These CPUs use an enhanced layout of the interrupt vector dispatch
and dispatch status registers in order to allow sending IPIs to
multiple targets simultaneously. Thus support for these CPUs was
put in a newly added cheetah_ipi_selected(). This is intended to
be pointed to by cpu_ipi_selected, which now is a function pointer,
in order to avoid cpu_impl checks once booted. Alternatively it
can point to spitfire_ipi_selected(), which was renamed from
cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed
to spitfire_ipi_send() (there's no need for a cheetah equivalent
of this so far). Initialization of the cpu_ipi_selected pointer
and other requirements is done in mp_init(), which was renamed
from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP
systems while cpu_ipi_selected() is. As a side-effect, this allows
making mp_tramp static to sys/sparc64/sparc64/mp_machdep.c.
For the sake of avoiding #ifdef SMP and for keeping the history in
place, cheetah_ipi_selected() and spitfire_ipi_{selected,send}()
were not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c.
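A hedged sketch of the pointer setup in mp_init() (the exact cpu_impl
test is an assumption):

    /* Select the IPI send routine once, avoiding cpu_impl checks in
       the hot path: */
    if (cpu_impl >= CPU_IMPL_ULTRASPARCIII)
            cpu_ipi_selected = cheetah_ipi_selected;
    else
            cpu_ipi_selected = spitfire_ipi_selected;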
- Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't
exceed the data types we use to store the CPU bit fields or the
number of USIII and greater CPUs supported by the current
cheetah_ipi_selected() implementation (which for JBus-CPUs is
only 4; that should be fine though as according to OpenSolaris
there are no sun4u machines with more than 4 JBus-CPUs).
- In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs
as we can't handle more than that.
- In cpu_mp_start() check for upa-portid vs. portid depending on
cpu_impl for consistency with nexus(4).
- In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't
told to IPI itself as sun4u CPUs just can't do that.
- In spitfire_ipi_send() do a MEMBAR #Sync after writing the
interrupt vector data as we want to make sure the payload was
actually written before we trigger the dispatch.
- In spitfire_ipi_send() also verify IDR_BUSY when checking whether
the dispatch was successful as it has to be cleared for this to
be the case.
- Remove some redundant variables.
referenced outside of mp_machdep.c
- Replace a magic 14 with the newly added IDC_ITID_SHIFT macro.
- Remove the global mp_boot_mid variable as it's not really necessary
and just replacing it with PCPU_GET(mid) doesn't have any impact on
performance once booted.
- Replace PCPU_GET(cpuid) with the curcpu shortcut.
- Replace hardcoded function names in panic strings etc. with __func__
so they don't need to be updated when renaming the function.
- Use register_t instead of u_long for variables used to hold the
return value of intr_disable() so we don't need to apply any
knowledge about the actual width of that value here.
- Improve the wording of some comments.
- Fix several style(9) bugs.
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.
Tested on: alpha, sparc64
longer than 'normal'. The cause is still being tracked down, but
in the meantime there are machines where raising IPI_RETRIES does
help - it's not just a case of the machine staying locked up longer
and then panicking anyway. Several helpful folks on sparc64@ tried
a patch that helped figure out what to raise this number to.
Discussed on: sparc64@
MFC after: 3 days
on future UltraSPARC cpus for which the data cache is not direct mapped.
- Move UltraSPARC I and II (spitfire, blackbird, sapphire, sabre) specific
functions to spitfire.c, and add cheetah.c for UltraSPARC III specific
functions. Initially just cache flushing, but there are a few other
functions that will need to move here.
- Add an ipi handler for data cache flushing on UltraSPARC III.
- Use function pointers to select the right cache flushing functions based
on cpu_impl.
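A hedged sketch of the selection described above (the identifiers are
assumptions modeled on the commit's description):

    /* Set once during boot, based on the CPU implementation: */
    void (*dcache_page_inval)(vm_paddr_t pa);

    if (cpu_impl == CPU_IMPL_ULTRASPARCIII)
            dcache_page_inval = cheetah_dcache_page_inval;
    else
            dcache_page_inval = spitfire_dcache_page_inval;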
With this it is possible to boot single user from an mfs root on UltraSPARC
III systems, including spinning up secondary processors. There is currently
no support for the host-to-PCI bridge, and no documentation for it is
publicly available.
Thanks to Oleg Derevenetz for providing access to a system with UltraSPARC
III+ cpus.
itself; this causes undefined behaviour on UltraSPARCs. In particular,
the interrupt packet data words will not necessarily be delivered
correctly, which would result in a crash.
This bug also caused the cache-flushing work to be done twice on the
triggering CPU (when it did not cause crashes).
Reviewed by: jake
wait for those cpus, instead of all of them by using a count. Oops.
Make the pointer to the mask that the primary cpu spins on volatile, so
gcc doesn't optimize out an important load. Oops again.
Activate tlb shootdown ipi synchronization now that it works. We have
all involved cpus wait until all the others are done. This may not be
necessary; it is mostly for sanity.
Make the trigger level interrupt ipi handler work.
Submitted by: tmm
than the other implementations; we have complete control over the tlb, so we
only demap specific pages. We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.
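A hedged sketch of the ranged-demap usage (signature assumed):

    /* One call, one IPI covering [sva, eva), instead of an IPI per
       page: */
    tlb_range_demap(pm, sva, eva);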
Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.
This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.
on the loader to do it. Improve the smp startup code to be less racy and to
defer certain things until the right time. This almost boots single user
on my dual ultra 60; it is still very fragile:
SMP: AP CPU #1 Launched!
Enter full pathname of shell or RETURN for /bin/sh:
# ls
Debugger("trapsig")
Stopped at Debugger+0x1c: ta %xcc, 1
db> heh
No such command
db>
cpu(s) into the kernel, and sync-ing them up to "kernel" mode so we can
send them ipis, which also work.
Thanks to John Baldwin for providing me with access to the hardware
that made this possible.
Parts obtained from: bsd/os
to a new architecture. This is the base of the sparc64 port, but contains
limited machine dependent code and can be used as a base for other ports.
Included are:
- standard machine dependent headers, tweaked for a 64 bit, big endian
architecture, including empty versions of all the machine dependent
structures
- a machine independent atomic.h, which can be used until a port has
support for interrupts and the operations really need to be atomic
- stub versions of all the machine dependent functions, which panic
when called and print out the name of the function that needs to
be implemented. Functions which are normally in assembly files are
not included, but this should reduce the number of different undefined
references on the first few compiles from hundreds to 5 or 6
Given minimal startup code and console support it should be trivial to
make this compile and run the first few sysinits on almost any architecture.
Requested by: alfred, imp, jhb