freebsd-nq

Author	SHA1	Message	Date
Justin T. Gibbs	255424ddb7	Fix ia64 and mips kernel builds due to XENHVM=>GENERIC integration in revision 255744. sys/kern/subr_smp.c: IPI_SUSPEND is only available on amd64 and i386. Protect new uses of this constant with #ifdefs to avoid impacting other platforms. Approved by: re (blanket Xen)	2013-09-22 02:46:13 +00:00
Justin T. Gibbs	566a5f5020	Merge Xen PVHVM support into the GENERIC kernel config for both amd64 and i386. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: - Introduce two new CPU hooks for initialization and resume purposes. This allows us to get rid of the XENHVM ifdefs in mp_machdep, and also sets some hooks into common code that can be used by other hypervisor implementations. sys/amd64/conf/XENHVM: sys/i386/conf/XENHVM: - Remove these configs now that GENERIC has builtin support for Xen HVM. sys/kern/subr_smp.c: - Make sure there are no pending IPIs when suspending a system. sys/x86/xen/hvm.c: - Add cpu init and resume vectors that are called from mp_machdep using the new hooks. - Only clear the vcpu_info mapping data on resume. It is already clear for the BSP on a cold boot and is set correctly as APs are started. - Gate xen_hvm_init_cpu only to systems running under Xen. sys/x86/xen/xen_intr.c: - Gate the setup of event channels only to systems running under Xen.	2013-09-20 22:59:22 +00:00
Justin T. Gibbs	428b7ca290	Add support for suspend/resume/migration operations when running as a Xen PVHVM guest. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information. sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers. sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume. sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU. sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume. sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus. sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. - Clear vcpu_info so it can be reset by APs when resuming from suspension. sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations. sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.	2013-09-20 05:06:03 +00:00
Jeff Roberson	5b39d5c739	- Correctly handle EWOULDBLOCK in quiesce_cpus Discussed with: mav	2012-12-19 20:08:06 +00:00
Jeff Roberson	28d91af30f	- Implement run-time expansion of the KTR buffer via sysctl. - Implement a function to ensure that all preempted threads have switched back out at least once. Use this to make sure there are no stale references to the old ktr_buf or the lock profiling buffers before updating them. Reviewed by: marius (sparc64 parts), attilio (earlier patch) Sponsored by: EMC / Isilon Storage Division	2012-11-15 00:51:57 +00:00
Mitsuru IWASAKI	c1b0dc80b5	Another fixe for r236772. - Adjust correct cpuset (stopped_cpus/suspended_cpus) for cpu_spinwait() in generic_stop_cpus().	2012-06-11 18:47:26 +00:00
Mitsuru IWASAKI	fb864578af	Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them. common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx(). amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx(). i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code. Reviewed by: attilio@, acpi@	2012-06-09 00:37:26 +00:00
Mitsuru IWASAKI	e3fd0bc1b2	Add SMP/i386 suspend/resume support. Most part is merged from amd64. - i386/acpica/acpi_wakecode.S Replaced with amd64 code (from realmode to paging enabling code). - i386/acpica/acpi_wakeup.c Replaced with amd64 code (except for wakeup_pagetables stuff). - i386/include/pcb.h - i386/i386/genassym.c Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT and TR) needed for suspend/resume, not for context switch. - i386/i386/swtch.s Added suspendctx() and resumectx(). Note that savectx() was not changed and used for suspending (while amd64 code uses it). BSP and AP execute the same sequence, suspendctx(), acpi_wakecode() and resumectx() for suspend/resume (in case of UP system also). - i386/i386/apic_vector.s Added cpususpend(). - i386/i386/mp_machdep.c - i386/include/smp.h Added cpususpend_handler(). - i386/include/apicvar.h - kern/subr_smp.c - sys/smp.h Added IPI_SUSPEND and suspend_cpus(). - i386/i386/initcpu.c - i386/i386/machdep.c - i386/include/md_var.h - pc98/pc98/machdep.c Moved initializecpu() declarations to md_var.h. MFC after: 3 days	2012-05-18 18:55:58 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Attilio Rao	2b10b1f872	Disable interrupt and preemption for smp_rendezvous() also in the UP/!SMP case. The callbacks may be relying on this feature and having 2 different ways to deal with them is not correct. Reported by: rstone Reviewed by: jhb MFC after: 2 weeks	2011-11-03 14:36:56 +00:00
Andriy Gapon	35edc49853	smp_rendezvous: master cpu should wait until all slaves are fully done This is a followup to r222032 and a reimplementation of it. While that revision fixed the race for the smp_rv_waiters[2] exit sentinel, it still left a possibility for a target CPU to access stale or wrong smp_rv_func_arg in smp_rv_teardown_func. To fix this race the slave CPUs signal when they are really fully done with the rendezvous and the master CPU waits until all slaves are done. Diagnosed by: kib Reviewed by: jhb, mlaier, neel Approved by: re (kib) MFC after: 2 weeks	2011-07-30 20:29:39 +00:00
Robert Watson	ff66f6a404	Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these flags are available in Capsicum's capability mode; other sysctl nodes are not. Flag several useful sysctls as available in capability mode, such as memory layout sysctls required by the run-time linker and malloc(3). Also expose access to randomness and available kernel features. A few sysctls are enabled to support name->MIB conversion; these may leak information to capability mode by virtue of providing resolution on names not flagged for access in capability mode. This is, generally, not a huge problem, but might be something to resolve in the future. Flag these cases with XXX comments. Submitted by: jonathan Sponsored by: Google, Inc.	2011-07-17 23:05:24 +00:00
Attilio Rao	cfdfd32d34	MFC	2011-06-26 17:30:46 +00:00
Andriy Gapon	1aac6ac94a	generic_stop_cpus: pull timeout logic from under DIAGNOSTIC ... and also increase the timeout. It's better to try to proceed somehow despite stuck CPUs than to hang indefinitely. Especially so during shutdown and when entering kdb or panic. Timeout value is still an aribitrary value. Timeout diagnostic is just a printf; the work on something more debuggable is planned by attilio. Need to be careful here as stop_cpus_hard is called very early while enetering kdb and soon(-ish) it may become called very early when entering panic. Reviewed by: attilio MFC after: 2 months	2011-06-25 10:01:43 +00:00
Attilio Rao	a38f1f263b	Remove pc_cpumask and pc_other_cpus usage from MI code. Tested by: pluknet	2011-06-13 13:28:31 +00:00
Attilio Rao	7fcdc9a26f	MFC	2011-05-26 17:38:00 +00:00
John Baldwin	5b41f90fd1	Silly spelling typos. Submitted by: "b. f."	2011-05-24 19:55:57 +00:00
John Baldwin	47ad691f87	Fix an issue with critical sections and SMP rendezvous handlers. Specifically, a critical_exit() call that drops the nesting level to zero has a brief window where the pending preemption flag is set and the nesting level is set to zero. This is done purposefully to avoid races where a preemption scheduled by an interrupt could be lost otherwise (see revision 144777). However, this does mean that if an interrupt fires during this window and enters and exits a critical section, it may preempt from the interrupt context. This is generally fine as the interrupt code is careful to arrange critical sections so that they are not exited until it is safe to preempt (e.g. interrupts EOI'd and masked if necessary). However, the SMP rendezvous IPI handler does not quite follow this rule, and in general a rendezvous can never be preempted. Rendezvous handlers are also not permitted to schedule threads to execute, so they will not typically trigger preemptions. SMP rendezvous handlers may use spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks, but using a spinlock also enters and exits a critical section. If the interrupted top-half code is in the brief window of critical_exit() where the nesting level is zero but a preemption is pending, then releasing the spinlock can trigger a preemption. Because we know that SMP rendezvous handlers can never schedule a thread, we know that a critical_exit() in an SMP rendezvous handler will only preempt in this edge case. We also know that the top-half thread will happily handle the deferred preemption once the SMP rendezvous has completed, so the preemption will not be lost. This makes it safe to employ a workaround where we use a nested critical section in the SMP rendezvous code itself around rendezvous action routines to prevent any preemptions during an SMP rendezvous. The workaround intentionally avoids checking for a deferred preemption when leaving the critical section on the assumption that if there is a pending preemption it will be handled by the interrupted top-half code. Submitted by: mlaier (variation specific to rm_cleanIPI()) Obtained from: Isilon MFC after: 1 week	2011-05-24 13:36:41 +00:00
Attilio Rao	a8586beeb0	Fix mismerge. Reported by: pluknet	2011-05-18 15:50:12 +00:00
Attilio Rao	fea3a3fa94	MFC	2011-05-17 22:03:01 +00:00
John Baldwin	f83e8b25c1	Fix a race in the SMP rendezvous code. Specifically, the write by the last CPU to to finish the rendezvous action may become visible to different CPUs at different times. As a result, the CPU that initiated the rendezvous may exit the rendezvous and drop the lock allowing another rendezvous to be initiated on the same CPU or a different CPU. In that case the exit sentinel may be cleared before all CPUs have noticed causing those CPUs to hang forever. Workaround this by using a generation count to notice when this race occurs and to exit the rendezvous in that case. The problem was independently diagnosted by mlaier@ and avg@ as well. Submitted by: neel Reviewed by: avg, mlaier Obtained from: NetApp MFC after: 1 week	2011-05-17 16:39:08 +00:00
Attilio Rao	d59dd76c22	Merge r221278 from largeSMP project: idle_cpus_mask is just used in sched_4bsd, thus make it private for it. Tested by: several	2011-05-16 23:20:12 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architecture. cpumask_t on the other side is implemented as an array of longs, and easilly extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, needs to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primirally done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to came soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	3121f5347e	idle_cpus_mask is just used in the SMP case and within sched_4BSD. Declare appropriately.	2011-04-30 22:30:18 +00:00
Juli Mallett	37142d9e87	With smp_topo_none, set cg_mask to all_cpus rather than setting the mp_ncpus low bits. Submitted by: Bhanu Prakash Reviewed by: jeffr	2011-02-11 22:43:10 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Andriy Gapon	9ddb6637b8	generic_stop_cpus: prevent parallel execution This is based on the same approach as used in panic(). In theory parallel execution of generic_stop_cpus() could lead to two CPUs stopping each other and everyone else, and thus a total system halt. Also, in theory, we should have some smarter locking here, because two (or more CPUs) could be stopping unrelated sets of CPUs. But in practice, it seems, this function is only used to stop "all other" CPUs. Additionally, I took this opportunity to make amd64-specific suspend_cpus() function use generic_stop_cpus() instead of rolling out essentially duplicate code. This code is based on code by Sandvine Incorporated. Suggested by: mdf Reviewed by: jhb, jkim (earlier version) MFC after: 2 weeks	2010-10-12 17:40:45 +00:00
Attilio Rao	2d8b420b9f	The r208165 fixed a bug related to unsigned integer overflowing for the number of CPUs detection. However, that was not mention at all, the problem was not reported, the patch has not been MFCed and the fix is mostly improper. Fix the original overflow (caused when 32 CPUs must be detected) by just using a different mathematical computation (it also makes more explicit the size of operands involved, which is good in the moment waiting for a more complete support for a large number of CPUs). PR: kern/148698 Submitted by: Joe Landers <jlanders at vmware dot com> Tested by: gianni MFC after: 10 days	2010-08-09 00:23:57 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Randall Stewart	4542827d4d	This pushes all of JC's patches that I have in place. I am now able to run 32 cores ok.. but I still will hang on buildworld with a NFS problem. I suspect I am missing a patch for the netlogic rge driver. JC check and see if I am missing anything except your core-mask changes Obtained from: JC	2010-05-16 19:43:48 +00:00
Attilio Rao	de6648745c	Fix a hang introduced in r206878 for kernel compiled with SMP support but being not actual SMP and similar situations by always initializing the smp ipi mutex. Reported by: marius MFC after: 3 days X-MFC: r206878	2010-05-11 15:36:16 +00:00
Konstantin Belousov	51a6ef34fb	Remove forward_roundrobin(), it is unused for quite some time. Reviewed by: jhb MFC after: 1 week	2009-09-21 13:09:56 +00:00
Attilio Rao	dc6fbf6545	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Jeff Roberson	7b55ab0534	- Remove the bogus idle thread state code. This may have a race in it and it only optimized out an ipi or mwait in very few cases. - Skip the adaptive idle code when running on SMT or HTT cores. This just wastes cpu time that could be used on a busy thread on the same core. - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use CG_FLAG_THREAD to mean SMT or HTT. Sponsored by: Nokia	2009-04-29 03:15:43 +00:00
Jung-uk Kim	c66d2b38c8	Initial suspend/resume support for amd64. This code is heavily inspired by Takanori Watanabe's experimental SMP patch for i386 and large portion was shamelessly cut and pasted from Peter Wemm's AP boot code.	2009-03-17 00:48:11 +00:00
Dmitry Chagin	b2421c29f6	as suggested by jhb@, panic in case the ncpus == 0. it helps to catch bugs in the callers. Approved by: kib (mentor) MFC after: 5 days	2009-03-03 17:34:09 +00:00
Dmitry Chagin	6485a22ccb	Fix range-check error introduced in r182292. Also do not do anything if all processors in the map are not available, simply return. Approved by: kib (mentor) MFC after: 1 week	2009-03-01 14:26:24 +00:00
John Baldwin	4e30a2db51	Whitespace tweak.	2009-01-26 15:32:39 +00:00
John Baldwin	4482f952b1	Adjust the license statement to more closely match a standard 3-clause BSD license. MFC after: 3 days	2008-11-03 21:17:02 +00:00
John Baldwin	9c2bf0cce2	- Only count the number of CPUs in the rendezvous map once rather than doing it on every CPU. - Use CPU_ABSENT() rather than pcpu_find() to determine if a CPU is not present. - Count up to mp_maxid rather than MAXCPU when iterating over CPUs to match the rest of the code in the kernel. MFC after: 1 week	2008-08-27 18:23:55 +00:00
John Birrell	833b4a131a	Allow a rendezvous with just a specified CPU too. Make the API work in the non-smp case too so that a kernel module can work the same regardless of whether or not it is loaded on a SMP kernel or not.	2008-05-23 04:05:26 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Jeff Roberson	1bf6461e98	- Add the missing '2' case to the switch table for kern.smp.topology and assign it to create the flat 'none' topology where all cpus are scheduled as if they are equal and unrelated.	2008-03-10 01:38:53 +00:00
Jeff Roberson	81aa71755b	- Remove the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a seperate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
John Baldwin	c0cfd9d113	A few whitespace fixes.	2008-01-02 17:09:15 +00:00
Stephan Uphoff	f53d15fe1b	Initial checkin for rmlock (read mostly lock) a multi reader single writer lock optimized for almost exclusive reader access. (see also rmlock.9) TODO: Convert to per cpu variables linkerset as soon as it is available. Optimize UP (single processor) case.	2007-11-08 14:47:55 +00:00
Attilio Rao	0b2e598c14	This is a follow-up, cleaning-up commit about recent changes involving topology foo functions. Working at the patch for topology problems in ia32/amd64 evicted some problems regarding functions ordering in the SI_SUB_CPU family of SYSINIT'ed subsystems. In order to avoid problems with new modified to involved functions, a correct ordering is not semantically specified for SI_SUB_CPU functions (for a larger view of the issue please visit: http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html ) Discussed with: peter Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org> Approved by: jeff Approved by: re	2007-09-11 22:54:09 +00:00
John Baldwin	fb1faf2082	Tweak the low-level MI SMP code some: - Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and smp_rendezvous_action(). - Add an additional synch point in smp_rendezvous() to ensure that all the CPUs will always see an up-to-date value of smp_rv_setup_func. Reviewed by: attilio Approved by: re (kensmith) Tested on: alpha, amd64, i386, sparc64 SMP (for several years)	2007-07-03 18:37:06 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00

1 2 3 4 5

247 Commits