Commit Graph

238 Commits

Author SHA1 Message Date
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
attilio
90682e14db Disable interrupt and preemption for smp_rendezvous() also in the
UP/!SMP case.
The callbacks may be relying on this feature and having 2 different
ways to deal with them is not correct.

Reported by:	rstone
Reviewed by:	jhb
MFC after:	2 weeks
2011-11-03 14:36:56 +00:00
avg
24506a8064 smp_rendezvous: master cpu should wait until all slaves are fully done
This is a followup to r222032 and a reimplementation of it.
While that revision fixed the race for the smp_rv_waiters[2] exit
sentinel, it still left a possibility for a target CPU to access
stale or wrong smp_rv_func_arg in smp_rv_teardown_func.
To fix this race the slave CPUs signal when they are really fully
done with the rendezvous and the master CPU waits until all slaves
are done.

Diagnosed by:	kib
Reviewed by:	jhb, mlaier, neel
Approved by:	re (kib)
MFC after:	2 weeks
2011-07-30 20:29:39 +00:00
rwatson
7c21db8ed3 Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which
may be jointly referenced via the mask CTLFLAG_CAPRW.  Sysctls with these
flags are available in Capsicum's capability mode; other sysctl nodes are
not.

Flag several useful sysctls as available in capability mode, such as memory
layout sysctls required by the run-time linker and malloc(3).  Also expose
access to randomness and available kernel features.

A few sysctls are enabled to support name->MIB conversion; these may leak
information to capability mode by virtue of providing resolution on names
not flagged for access in capability mode.  This is, generally, not a huge
problem, but might be something to resolve in the future.  Flag these cases
with XXX comments.

Submitted by:	jonathan
Sponsored by:	Google, Inc.
2011-07-17 23:05:24 +00:00
attilio
364d0522f7 With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by:	marcel, marius, alc
Tested by:	pluknet
MD testing by:	marcel, marius, gonzo, andreast
2011-07-04 12:04:52 +00:00
avg
57d68644db generic_stop_cpus: pull timeout logic from under DIAGNOSTIC
... and also increase the timeout.
It's better to try to proceed somehow despite stuck CPUs than to hang
indefinitely.  Especially so during shutdown and when entering kdb or panic.

Timeout value is still an aribitrary value.
Timeout diagnostic is just a printf; the work on something more
debuggable is planned by attilio.  Need to be careful here as
stop_cpus_hard is called very early while enetering kdb and soon(-ish)
it may become called very early when entering panic.

Reviewed by:	attilio
MFC after:	2 months
2011-06-25 10:01:43 +00:00
attilio
867c6223e7 MFC 2011-05-26 17:38:00 +00:00
jhb
3c1a24d701 Silly spelling typos.
Submitted by:	"b. f."
2011-05-24 19:55:57 +00:00
jhb
7028e129fd Fix an issue with critical sections and SMP rendezvous handlers.
Specifically, a critical_exit() call that drops the nesting level to zero
has a brief window where the pending preemption flag is set and the
nesting level is set to zero.  This is done purposefully to avoid races
where a preemption scheduled by an interrupt could be lost otherwise (see
revision 144777).  However, this does mean that if an interrupt fires
during this window and enters and exits a critical section, it may preempt
from the interrupt context.  This is generally fine as the interrupt code
is careful to arrange critical sections so that they are not exited until
it is safe to preempt (e.g. interrupts EOI'd and masked if necessary).

However, the SMP rendezvous IPI handler does not quite follow this rule,
and in general a rendezvous can never be preempted.  Rendezvous handlers
are also not permitted to schedule threads to execute, so they will not
typically trigger preemptions.  SMP rendezvous handlers may use
spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks,
but using a spinlock also enters and exits a critical section.  If the
interrupted top-half code is in the brief window of critical_exit() where
the nesting level is zero but a preemption is pending, then releasing the
spinlock can trigger a preemption.  Because we know that SMP rendezvous
handlers can never schedule a thread, we know that a critical_exit() in
an SMP rendezvous handler will only preempt in this edge case.  We also
know that the top-half thread will happily handle the deferred preemption
once the SMP rendezvous has completed, so the preemption will not be lost.

This makes it safe to employ a workaround where we use a nested critical
section in the SMP rendezvous code itself around rendezvous action
routines to prevent any preemptions during an SMP rendezvous.  The
workaround intentionally avoids checking for a deferred preemption
when leaving the critical section on the assumption that if there is a
pending preemption it will be handled by the interrupted top-half code.

Submitted by:	mlaier (variation specific to rm_cleanIPI())
Obtained from:	Isilon
MFC after:	1 week
2011-05-24 13:36:41 +00:00
attilio
0828d417d4 Fix mismerge.
Reported by:	pluknet
2011-05-18 15:50:12 +00:00
attilio
2cdf500faf MFC 2011-05-17 22:03:01 +00:00
jhb
8d84cd707e Fix a race in the SMP rendezvous code. Specifically, the write by the
last CPU to to finish the rendezvous action may become visible to
different CPUs at different times.  As a result, the CPU that initiated
the rendezvous may exit the rendezvous and drop the lock allowing another
rendezvous to be initiated on the same CPU or a different CPU.  In that
case the exit sentinel may be cleared before all CPUs have noticed causing
those CPUs to hang forever.

Workaround this by using a generation count to notice when this race
occurs and to exit the rendezvous in that case.

The problem was independently diagnosted by mlaier@ and avg@ as well.

Submitted by:	neel
Reviewed by:	avg, mlaier
Obtained from:	NetApp
MFC after:	1 week
2011-05-17 16:39:08 +00:00
attilio
fd96a5afd1 Merge r221278 from largeSMP project:
idle_cpus_mask is just used in sched_4bsd, thus make it private for it.

Tested by:	several
2011-05-16 23:20:12 +00:00
attilio
fe4de567b5 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
attilio
d0e06d02bc idle_cpus_mask is just used in the SMP case and within sched_4BSD.
Declare appropriately.
2011-04-30 22:30:18 +00:00
jmallett
96da23f472 With smp_topo_none, set cg_mask to all_cpus rather than setting the mp_ncpus
low bits.

Submitted by:	Bhanu Prakash
Reviewed by:	jeffr
2011-02-11 22:43:10 +00:00
mdf
f6a71a40b2 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
avg
55173efe7f generic_stop_cpus: prevent parallel execution
This is based on the same approach as used in panic().
In theory parallel execution of generic_stop_cpus()  could lead to two CPUs
stopping each other and everyone else, and thus a total system halt.
Also, in theory, we should have some smarter locking here, because two
(or more CPUs) could be stopping unrelated sets of CPUs.
But in practice, it seems, this function is only used to stop
"all other" CPUs.

Additionally, I took this opportunity to make amd64-specific suspend_cpus()
function use generic_stop_cpus() instead of rolling out essentially
duplicate code.

This code is based on code by Sandvine Incorporated.

Suggested by:	mdf
Reviewed by:	jhb, jkim (earlier version)
MFC after:	2 weeks
2010-10-12 17:40:45 +00:00
attilio
307b2c04a2 The r208165 fixed a bug related to unsigned integer overflowing for the
number of CPUs detection.
However, that was not mention at all, the problem was not reported, the
patch has not been MFCed and the fix is mostly improper.

Fix the original overflow (caused when 32 CPUs must be detected) by
just using a different mathematical computation (it also makes more
explicit the size of operands involved, which is good in the moment
waiting for a more complete support for a large number of CPUs).

PR:		kern/148698
Submitted by:	Joe Landers <jlanders at vmware dot com>
Tested by:	gianni
MFC after:	10 days
2010-08-09 00:23:57 +00:00
jhb
19ddbf5c38 Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
rrs
8ea4ab29a0 This pushes all of JC's patches that I have in place. I
am now able to run 32 cores ok.. but I still will hang
on buildworld with a NFS problem. I suspect I am missing
a patch for the netlogic rge driver.

JC check and see if I am missing anything except your
core-mask changes

Obtained from:	JC
2010-05-16 19:43:48 +00:00
attilio
31c196b3b9 Fix a hang introduced in r206878 for kernel compiled with SMP support but
being not actual SMP and similar situations by always initializing the
smp ipi mutex.

Reported by:	marius
MFC after:	3 days
X-MFC:		r206878
2010-05-11 15:36:16 +00:00
kib
9357da94fe Remove forward_roundrobin(), it is unused for quite some time.
Reviewed by:	jhb
MFC after:	1 week
2009-09-21 13:09:56 +00:00
attilio
e85ca71aad * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
jeff
88a1cd92bb - Remove the bogus idle thread state code. This may have a race in it
and it only optimized out an ipi or mwait in very few cases.
 - Skip the adaptive idle code when running on SMT or HTT cores.  This
   just wastes cpu time that could be used on a busy thread on the same
   core.
 - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive.  Re-use
   CG_FLAG_THREAD to mean SMT or HTT.

Sponsored by:   Nokia
2009-04-29 03:15:43 +00:00
jkim
3eda4741da Initial suspend/resume support for amd64.
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
2009-03-17 00:48:11 +00:00
dchagin
ab0c175b17 as suggested by jhb@, panic in case the ncpus == 0.
it helps to catch bugs in the callers.

Approved by:	kib (mentor)
MFC after:	5 days
2009-03-03 17:34:09 +00:00
dchagin
718161215b Fix range-check error introduced in r182292. Also do not do anything
if all processors in the map are not available, simply return.

Approved by:	kib (mentor)
MFC after:	1 week
2009-03-01 14:26:24 +00:00
jhb
b8c520ae3c Whitespace tweak. 2009-01-26 15:32:39 +00:00
jhb
9757361d22 Adjust the license statement to more closely match a standard 3-clause BSD
license.

MFC after:	3 days
2008-11-03 21:17:02 +00:00
jhb
8011873cf5 - Only count the number of CPUs in the rendezvous map once rather than
doing it on every CPU.
- Use CPU_ABSENT() rather than pcpu_find() to determine if a CPU is not
  present.
- Count up to mp_maxid rather than MAXCPU when iterating over CPUs to
  match the rest of the code in the kernel.

MFC after:	1 week
2008-08-27 18:23:55 +00:00
jb
e922b9b976 Allow a rendezvous with just a specified CPU too.
Make the API work in the non-smp case too so that a kernel module
can work the same regardless of whether or not it is loaded on a SMP
kernel or not.
2008-05-23 04:05:26 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
jeff
128f1c2547 - Add the missing '2' case to the switch table for kern.smp.topology and
assign it to create the flat 'none' topology where all cpus are scheduled
   as if they are equal and unrelated.
2008-03-10 01:38:53 +00:00
jeff
ad2a31513f - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
jhb
311f35fbe0 A few whitespace fixes. 2008-01-02 17:09:15 +00:00
ups
9c75ede409 Initial checkin for rmlock (read mostly lock) a multi reader single writer
lock optimized for almost exclusive reader access. (see also rmlock.9)

TODO:
    Convert to per cpu variables linkerset as soon as it is available.
    Optimize UP (single processor)  case.
2007-11-08 14:47:55 +00:00
attilio
ae7f786cc5 This is a follow-up, cleaning-up commit about recent changes involving
topology foo functions.
Working at the patch for topology problems in ia32/amd64 evicted some
problems regarding functions ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new modified to involved functions, a
correct ordering is not semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re
2007-09-11 22:54:09 +00:00
jhb
160998c890 Tweak the low-level MI SMP code some:
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
  CPUs will always see an up-to-date value of smp_rv_setup_func.

Reviewed by:	attilio
Approved by:	re (kensmith)
Tested on:	alpha, amd64, i386, sparc64 SMP (for several years)
2007-07-03 18:37:06 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
julian
80d6cde009 Instead of doing comparisons using the pcpu area to see if
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permenent and never chenges for the life of the
system so it doesn't need locking.
2007-03-08 06:44:34 +00:00
jhb
679189182b Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
peter
17f62c51ca Second part of commit for moving KDB_STOP_NMI from opt_global.h to
opt_kdb.h.

Found by:     kris
Approved by:  re
2005-06-30 03:38:10 +00:00
dwhite
c8fa809967 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
julian
373bbfc184 Move 4bsd specific experimental IP code into the 4bsd file.
Move the sysctls into kern.sched
2004-09-03 07:42:31 +00:00
julian
7f91bb5d9a *Blush* forgot to test non SMP builds.. oddly enough some UP code (particularly
in the acpi code) seems to want this in a UP build. (I guess so you can have
a sigle kernel module that works for both)
2004-09-01 18:05:43 +00:00
julian
8354ba9e3a Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after:	5 days
2004-09-01 06:42:02 +00:00
obrien
0fe47008f6 s/smp_rv_mtx/smp_ipi_mtx/g
Requested by:	jhb
2004-08-28 00:49:55 +00:00