freebsd-dev

Author	SHA1	Message	Date
Nathan Whitehorn	a3e9e259b3	Make sure to call vm_page_dirty() before the pmap lock is released to prevent a race where another process could conclude the page was clean. Submitted by: alc	2012-03-27 01:26:00 +00:00
Nathan Whitehorn	5afcb4c91e	More PMAP concurrency improvements: replace the table lock and (almost) all uses of the page queues mutex with a new rwlock that protects the page table and the PV lists. This reduces system time during a parallel buildworld by 35%. Reviewed by: alc	2012-03-27 01:24:18 +00:00
Nathan Whitehorn	e71dfa7b84	More PMAP performance improvements: on powerpc64, when TLBIE can be run with exceptions enabled, leave them enabled and use a regular mutex to guard TLB invalidations instead of a spinlock.	2012-03-25 06:01:34 +00:00
Nathan Whitehorn	d456d3e31f	Only call vm_page_dirty() on pages that are writable in order not to confuse the VM.	2012-03-24 22:32:19 +00:00
Nathan Whitehorn	8e7c7ea2ea	Following suggestions from alc, skip wired mappings in pmap_remove_pages() and remove moea64_attr_*() in favor of direct calls to vm_page_dirty() and friends.	2012-03-24 19:59:14 +00:00
Nathan Whitehorn	07b638a98e	Remove acquisition of VM page queues lock from pmap_protect(). Any actual manipulation of the pvo_vlink and pvo_olink entries is already protected by the table lock, so most remaining instances of the acquisition of the page queues lock can likely be replaced with the table lock, or removed if the table lock is already held. Reviewed by: alc	2012-03-18 13:22:42 +00:00
Nathan Whitehorn	cd907a68aa	Implement pmap_remove_pages(). This will be added later to the 32-bit MMU module. Suggested by: alc	2012-03-15 22:50:48 +00:00
Nathan Whitehorn	246e44956e	Improve algorithm for deciding whether to loop through all process pages or look them up individually in pmap_remove() and apply the same logic in the other ranged operation (pmap_protect). This speeds up make installworld by a factor of 2 on powerpc64. MFC after: 1 week	2012-03-15 19:36:52 +00:00
Nathan Whitehorn	cbfa304088	Use LIST_FOREACH_SAFE() instead of LIST_FOREACH() in pmap_remove(), since the point of this loop is to remove elements. This worked by accident before. MFC after: 2 days	2012-03-14 20:19:49 +00:00
Andreas Tobler	179e996c9f	Revert the _NOPROF entries on cpu_throw, cpu_switch and savectx. They can be profiled too now. MFC after: 2 weeks	2012-02-05 15:59:18 +00:00
Konstantin Belousov	75ce221fa1	Fix build for the case of powerpc64 kernel without COMPAT_FREEBSD32. MFC after: 2 months	2012-01-30 19:31:17 +00:00
Konstantin Belousov	62c625fdd2	Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported, and some variants of powerpc32. MFC after: 2 months (if ever)	2012-01-30 07:56:00 +00:00
Andreas Tobler	9eab2f146a	This commit adds profiling support for powerpc64. Now we can do application profiling and kernel profiling. To enable kernel profiling one has to build kgmon(8). I will enable the build once I managed to build and test powerpc (32-bit) kernels with profiling support. - add a powerpc64 PROF_PROLOGUE for _mcount. - add macros to avoid adding the PROF_PROLOGUE in certain assembly entries. - apply these macros where needed. - add size information to the MCOUNT function. MFC after: 3 weeks, together with r230291	2012-01-20 22:34:19 +00:00
Nathan Whitehorn	ae09ab8f63	Rework SLB trap handling so that double-faults into an SLB trap handler are possible, and double faults within an SLB trap handler are not. The result is that it is possible to take an SLB fault at any time, on any address, for any reason, at any point in the kernel. This lets us do two important things. First, it removes the (soft) 16 GB RAM ceiling on PPC64 as well as any architectural limitations on KVA space. Second, it lets the kernel tolerate poorly designed hypervisors that have a tendency to fail to restore the SLB properly after a hypervisor context switch. MFC after: 6 weeks	2012-01-15 00:08:14 +00:00
Justin Hibbits	7b25dcca76	Implement hwpmc counting PMC support for PowerPC G4+ (MPC745x/MPC744x). Sampling is in progress. Approved by: nwhitehorn (mentor) MFC after: 9.0-RELEASE	2011-12-24 19:34:52 +00:00
Nathan Whitehorn	e347e23bfe	Allow this to work on embedded systems without Open Firmware by making lack of a /chosen non-fatal, and manually removing memory in use by the kernel from the physical memory map. Submitted by: rpaulo	2011-12-16 23:46:05 +00:00
Nathan Whitehorn	b059c637fb	Zero BSS on start, in case the ELF loader that started the kernel did not do this for us. This can happen on some embedded systems. Submitted by: rpaulo	2011-12-16 23:40:56 +00:00
Alan Cox	3b03ca3bbe	Eliminate vestiges of page coloring.	2011-12-15 05:07:16 +00:00
Nathan Whitehorn	598d99ddee	Keep track of PVO entries in each pmap, which allows much faster pmap_remove() for large sparse requests. This can prevent pmap_remove() operations on 64-bit process destruction or swapout that would take several hundred times the lifetime of the universe to complete. This behavior is largely indistinguishable from a hang.	2011-12-11 17:19:48 +00:00
Marius Strobl	4b7ec27007	- There's no need to overwrite the default device method with the default one. Interestingly, these are actually the default for quite some time (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9) since r52045) but even recently added device drivers do this unnecessarily. Discussed with: jhb, marcel - While at it, use DEVMETHOD_END. Discussed with: jhb - Also while at it, use __FBSDID.	2011-11-22 21:28:20 +00:00
Nathan Whitehorn	a897298940	Use a global __pure2 function instead of a global register variable for curthread, like on x86 and sparc64. This makes the kernel somewhat more clang friendly, which doesn't support global register variables.	2011-11-17 15:49:42 +00:00
Nathan Whitehorn	46e93cbbc5	Add an extra invariant here which was useful on 64-bit CPUs.	2011-11-17 15:48:12 +00:00
Alan Cox	fbd80bd047	Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb	2011-11-16 16:46:09 +00:00
Nathan Whitehorn	8a4cf006f4	Fix a bug where the pmap_cpu_bootstrap() ap argument could be clobbered. Luckily, it mostly wasn't important, so this didn't cause major problems. Also improve register reuse when setting up trap frames very slightly. Submitted by: Justin Hibbits <chmeeedalf at gmail dot com> MFC after: 5 days	2011-11-09 13:48:23 +00:00
Konstantin Belousov	26ccf4f10f	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:05:09 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Andreas Tobler	42f2270475	This a follow up commit from r224216 for powerpc 32-bit. Increase the storage size for sintrcnt/sintrnames to .long. Reviewed by: nwhitehorn Approved by: re (kib)	2011-07-25 20:10:01 +00:00
Attilio Rao	732772c701	On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes storage. Fix the sintrcnt/sintrnames specification. No MFC is previewed for this patch. Reported, reviewed and tested by: marcel Approved by: re (kib)	2011-07-19 12:41:57 +00:00
Attilio Rao	521ea19d1c	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding an unified approach among them and move the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is previewed for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)	2011-07-18 15:19:40 +00:00
Attilio Rao	cfdfd32d34	MFC	2011-06-26 17:30:46 +00:00
Nathan Whitehorn	1b17fa33dc	Revert r223479. It is unnecessary and served only to slightly ameliorate some manifestations of the bug actually fixed in r223485.	2011-06-26 15:08:14 +00:00
Attilio Rao	de138ec703	MFC	2011-06-24 16:35:40 +00:00
Nathan Whitehorn	e69dff491d	Use the ABI-mandated thread pointer register (r2 for ppc32, r13 for ppc64) instead of a PCPU field for curthread. This averts a race on SMP systems with a high interrupt rate where the thread looking up the value of curthread could be preempted and migrated between obtaining the PCPU pointer and reading the value of pc_curthread, resulting in curthread being observed to be the current thread on the thread's original CPU. This played merry havoc with the system, in particular with mutexes. Many thanks to jhb for helping me work this one out. Note that Book-E is in principle susceptible to the same problem, but has not been modified yet due to lack of Book-E hardware. MFC after: 2 weeks	2011-06-23 22:21:28 +00:00
Nathan Whitehorn	045aee08f3	Clear any outstanding atomic reservations when traps are taken. This fixes some interesting bugs (mostly on SMP systems) with atomic operations silently failing in interrupt heavy situations, especially when using overflow pages.	2011-06-23 16:34:41 +00:00
Andreas Tobler	dcf496e844	Fix merge typo.	2011-06-23 09:46:12 +00:00
Attilio Rao	c7c2767e33	Remove pc_other_cpus and pc_cpumask usage from powerpc support. Tested and reviewed by: andreast	2011-06-16 07:27:13 +00:00
Attilio Rao	3bce356ea4	MFC	2011-06-04 22:05:20 +00:00
Nathan Whitehorn	4770f5380e	Fix a typo derived from a mismerge from mmu_oea that would cause pmap_sync_icache() to sync random (possibly uncached or nonexisting!) memory, causing kernel page faults or machine checks, most easily triggered by using GDB. While here, add an additional safeguard to only sync cacheable memory. MFC after: 2 days	2011-06-04 03:22:16 +00:00
Attilio Rao	d7073a2b3b	MFC	2011-06-03 17:09:15 +00:00
Nathan Whitehorn	cd507188bc	Quantities stored on the stack on ppc64 tend to be twice as large as on ppc32, so make the early stack correspondingly twice as big.	2011-06-03 00:11:13 +00:00
Nathan Whitehorn	17763042e4	The POWER7 has only 32 SLB slots instead of 64, like other supported 64-bit PowerPC CPUs. Add infrastructure to support variable numbers of SLB slots and move the user slot from 63 to 0, so that it is always available.	2011-06-02 14:25:52 +00:00
Nathan Whitehorn	1dff98d9bb	If running under a hypervisor, don't yell at the user about starting unknown CPU types, instead relying on the hypervisor to have given us a reasonable environment.	2011-06-02 14:23:36 +00:00
Nathan Whitehorn	20ae1015b9	Explicitly initialize the first thread's MSR to PSL_KERNSET.	2011-06-02 14:21:20 +00:00
Nathan Whitehorn	6dd24ab3f1	Include the modules area in the mapped kernel code. This fixes the kernel's access to modules and loader metadata when started from real mode, but without a direct map.	2011-06-02 14:19:18 +00:00
Nathan Whitehorn	97f7cde42c	Remove some dead code: unnecessary isyncs and memory sorting, which are handled in mtmsr() and mem_regions(), respectively.	2011-06-02 14:15:44 +00:00
Nathan Whitehorn	1787909001	MFpseries: Renovate and improve the AIM Open Firmware support: - Add RTAS (Run-Time Abstraction Services) support, found on all IBM systems and some Apple ones - Improve support for 32-bit real mode Open Firmware systems - Pull some more OF bits over from the AIM directory - Fix memory detection on IBM LPARs and systems with more than one /memory node (by andreast@)	2011-06-02 14:12:37 +00:00
Attilio Rao	7fcdc9a26f	MFC	2011-05-26 17:38:00 +00:00
Nathan Whitehorn	2ec6a5984c	Add a missing isync.	2011-05-26 14:34:22 +00:00
Attilio Rao	7e7a34e520	MFC	2011-05-16 16:34:03 +00:00
Nathan Whitehorn	43db7b0eab	Remove a useless check that served only to make 64-bit PPC systems unbootable after r221855. Submitted by: andreast MFC after: 1 week	2011-05-16 03:32:40 +00:00
Attilio Rao	c47dd3db8c	Add the powerpc support. Note that there is a dirty hack for calling openpic_write(), but nwhitehorn approved it. Discussed with: nwhitehorn	2011-05-09 16:16:15 +00:00
Andreas Tobler	c819dfaeba	Add leading zeros when printing the physical memory chunks on __powerpc64__. Approved by: nwhitehorn (mentor)	2011-04-19 07:49:58 +00:00
Andreas Tobler	415a54c8c5	Adjust debugging string to match the actual function. Approved by: nwhitehorn (mentor)	2011-04-14 19:37:31 +00:00
Andreas Tobler	8fd7d65779	The macro MOEA_PVO_CHECK is empty and not used. It is a left over from the NetBSD import. Remove the definition and all its occurrences. Approved by: nwhitehorn (mentor)	2011-04-14 18:26:50 +00:00
Matthew D Fleming	c77715ef6c	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.	2011-03-11 18:56:55 +00:00
Matthew D Fleming	cd67ac41ae	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistently use strlcpy(). Suggested by: Warner Losh	2011-03-10 22:56:00 +00:00
Nathan Whitehorn	79c77d726e	Turn off default generation of userland dot symbols on powerpc64 now that we have a binutils that supports it. Kernel dot symbols remain on to assist DDB.	2011-02-18 21:44:53 +00:00
Dmitry Chagin	a5c1afadeb	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Konstantin Belousov	55aabb7fd1	For architectures not using direct map, and requiring real KVA page for sf buf allocation, use wakeup() instead of wakeup_one() to notify sf buffer waiters about free buffer. sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given, and for simultaneous wakeup and signal delivery, msleep() returns EINTR/ERESTART even though the thread was selected for wakeup_one(). As a result, we lose a wakeup, and some other waiter will not be woken up. Reported and tested by: az Reviewed by: alc, jhb MFC after: 1 week	2011-01-18 21:57:02 +00:00
Andreas Tobler	49ffb2cf8c	Remove unused variables. Spotted by a cppcheck (devel/cppcheck, http://sourceforge.net/projects/cppcheck) run. Approved by: nwhitehorn (mentor)	2011-01-15 19:16:05 +00:00
Nathan Whitehorn	ff30eecffe	Fix handling of NX pages on capable CPUs. Thanks to kib for prodding me in the right direction.	2011-01-13 04:37:48 +00:00
Andreas Tobler	7dbe66c157	Remove unused variables. Spotted by a cppcheck (devel/cppcheck, http://sourceforge.net/projects/cppcheck) run. Approved by: nwhitehorn (mentor)	2011-01-06 20:19:01 +00:00
Nathan Whitehorn	52a190480f	Only keep track of PTE validity statistics for pages not locked in the table. The 'locked' attribute is used to circumvent the regular page table locking for some special pages, with the result that including locked pages here causes races when updating the stats.	2010-12-28 17:02:15 +00:00
Nathan Whitehorn	ed1e1e2a9e	Garbage-collect unused variable.	2010-12-19 16:07:53 +00:00
Nathan Whitehorn	41f15bbbd9	Add some isync()s related to the 64-bit MMU scratch page to avoid race conditions on its invalidation.	2010-12-11 20:29:52 +00:00
Nathan Whitehorn	bef5da7f98	Add an abstraction layer to the 64-bit AIM MMU's page table manipulation logic to support modifying the page table through a hypervisor. This uses KOBJ inheritance to provide subclasses of the base 64-bit AIM MMU class with additional methods for page table manipulation. Many thanks to Peter Grehan for suggesting this design and implementing the MMU KOBJ inheritance mechanism.	2010-12-04 02:42:52 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Nathan Whitehorn	cebdaa5881	Partially revert r215182. There appears to be a silicon bug on the 970 that causes AP bringup to fail if some of the Cell HID-register code is anywhere in the instruction stream. Pending a better solution, cache performance on SMP Cell systems running without a hypervisor will be suboptimal.	2010-11-12 20:26:34 +00:00
Nathan Whitehorn	2971d3bb6e	Add CPU support code for the IBM Cell Broadband Engine.	2010-11-12 15:20:10 +00:00
Nathan Whitehorn	fe3b4685c7	Remove use of a separate ofw_pmap on 32-bit CPUs. Many Open Firmware mappings need to end up in the kernel anyway since the kernel begins executing in OF context. Separating them adds needless complexity, especially since the powerpc64 and mmu_oea64 code gave up on it a long time ago. As a side effect, the PPC ofw_machdep code is no longer AIM-specific, so move it to powerpc/ofw.	2010-11-12 05:12:38 +00:00
Nathan Whitehorn	16bfd6f347	Remove or conditionalize some hypervisor-unfriendly instruction sequences.	2010-11-12 04:22:00 +00:00
Nathan Whitehorn	6413b05739	Add some platform KOBJ extensions and continue integrating PowerPC hypervisor infrastructure support: - Fix coexistence of multiple platform modules in the same kernel - Allow platform modules to provide an SMP topology - PowerPC hypervisors limit the amount of memory accessible in real mode. Allow the platform modules to specify the maximum real-mode address, and modify the bits of the kernel that need to allocate real-mode-accessible buffers to respect this limit.	2010-11-12 04:18:19 +00:00
Nathan Whitehorn	b13c7dec5f	Fix an error in r215067. An existing /chosen/mmu but missing translations property just means we shouldn't add any translations, not that we should panic.	2010-11-12 04:13:48 +00:00
Nathan Whitehorn	5b7ed13bc8	Centralize CPU idle routines into powerpc/cpu.c and use the same cpu_idle_hook mechanism that x86 uses for overriding the idle routine. This is required for supporting idling the CPU under PowerPC hypervisors.	2010-11-12 03:43:22 +00:00
Rafal Jaworowski	1f87b29431	Fix typo in the comment.	2010-11-11 13:46:28 +00:00
Nathan Whitehorn	5ebee02036	Add support for the IMISS, DLMISS, and DSMISS traps required to run FreeBSD on a G2 core. PR: powerpc/111296 Submitted by: Andrew Turner	2010-11-11 02:40:00 +00:00
Nathan Whitehorn	2824d5d2f4	Make AIM early-boot code function correctly without Open Firmware.	2010-11-09 23:53:47 +00:00
John Baldwin	0108cce0a4	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
Nathan Whitehorn	87acfc2a51	Fix two mistakes on 32-bit systems. The slbmte code in syscall() is 64-bit only, and should be protected with an ifdef, and the no-execute bit in 32-bit set_user_sr() should be set before the comparison, not after, or it will never match.	2010-11-03 16:21:47 +00:00
Nathan Whitehorn	e0f88469c7	Clean up the user segment handling code a little more. Now that set_user_sr() itself caches the user segment VSID, there is no need for cpu_switch() to do it again. This change also unifies the 32 and 64-bit code paths for kernel faults on user pages and remaps the user SLB slot on 64-bit systems when taking a syscall to avoid some unnecessary segment exception traps.	2010-11-03 15:15:48 +00:00
Alan Cox	e396eb604f	Implement pmap_is_prefaultable(). Reviewed by: nwhitehorn	2010-11-01 02:22:48 +00:00
Nathan Whitehorn	e36e3d8221	Add a security nit to recent copyin/out changes: map the user segment no-execute in case of exploitable kernel bugs. MFC after: 1 week	2010-10-31 23:04:15 +00:00
Nathan Whitehorn	ad6b3047a4	Next-to-leading-order perturbation of synchronization operations for switching the user segment register. All races should now be closed and a minimum of pipeline flushes be required to close them.	2010-10-31 22:55:51 +00:00
Nathan Whitehorn	c4bcebed17	Add some missing parentheses so that moea_bat_mapped() actually works. Submitted by: alc MFC after: 3 days	2010-10-31 15:07:09 +00:00
Nathan Whitehorn	54c562081f	Restructure the way the copyin/copyout segment is stored to prevent a concurrency bug. Since all SLB/SR entries were invalidated during an exception, a decrementer exception could cause the user segment to be invalidated during a copyin()/copyout() without a thread switch that would cause it to be restored from the PCB, potentially causing the operation to continue on invalid memory. This is now handled by explicit restoration of segment 12 from the PCB on 32-bit systems and a check in the Data Segment Exception handler on 64-bit. While here, cause copyin()/copyout() to check whether the requested user segment is already installed, saving some pipeline flushes, and fix the synchronization primitives around the mtsr and slbmte instructions to prevent accessing stale segments. MFC after: 2 weeks	2010-10-30 23:07:30 +00:00
Nathan Whitehorn	2639d62ec2	Handle vector assist traps without a kernel panic, by setting denormalized values to zero. A correct solution would involve emulating vector operations on denormalized values, but this has little effect on accuracy and is much less complicated for now. MFC after: 2 weeks	2010-10-05 18:08:07 +00:00
Nathan Whitehorn	94363f5311	Follow exactly the steps in architecture manual for correctly invalidating TLB entries instead of trying to cut corners.	2010-10-04 16:07:48 +00:00
Nathan Whitehorn	cd6a97f065	Fix pmap_page_set_memattr() behavior in the presence of fictitious pages by just caching the mode for later use by pmap_enter(), following amd64. While here, correct some mismerges from mmu_oea64 -> mmu_oea and clean up some dead code found while fixing the fictitious page behavior.	2010-10-01 18:59:30 +00:00
Nathan Whitehorn	c1f4123b05	Add support for memory attributes (pmap_mapdev_attr() and friends) on PowerPC/AIM. This is currently stubbed out on Book-E, since I have no idea how to implement it there.	2010-09-30 18:14:12 +00:00
Nathan Whitehorn	6416b9a85d	Split the SLB mirror cache into two kinds of object, one for kernel maps which are similar to the previous ones, and one for user maps, which are arrays of pointers into the SLB tree. This change makes user SLB updates atomic, closing a window for memory corruption. While here, rearrange the allocation functions to make context switches faster.	2010-09-16 03:46:17 +00:00
Nathan Whitehorn	95fa3335e1	Replace the SLB backing store splay tree used on 64-bit PowerPC AIM hardware with a lockless sparse tree design. This marginally improves the performance of PMAP and allows copyin()/copyout() to run without acquiring locks when used on wired mappings. Submitted by: mdf	2010-09-16 00:22:25 +00:00
Peter Grehan	33529b98d5	Introduce inheritance into the PowerPC MMU kobj interface. include/mmuvar.h - Change the MMU_DEF macro to also create the class definition as well as define the DATA_SET. Add a macro, MMU_DEF_INHERIT, which has an extra parameter specifying the MMU class to inherit methods from. Update the comments at the start of the header file to describe the new macros. booke/pmap.c aim/mmu_oea.c aim/mmu_oea64.c - Collapse mmu_def_t declaration into updated MMU_DEF macro The MMU_DEF_INHERIT macro will be used in the PS3 MMU implementation to allow it to inherit the stock powerpc64 MMU methods. Reviewed by: nwhitehorn	2010-09-15 00:17:52 +00:00
Peter Grehan	44633af3c7	Resurrect PSIM support by moving the cacheline size-detection warning printf outside of the MMU-disabled region. A call into OpenFirmware with the MMU off resulted in an internal PSIM assert.	2010-09-14 03:18:11 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When the CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per second per CPU now) needed to handle scheduled callouts is executed. This allows a significant increase in idle CPU sleep time, increasing the effect of static power-saving technologies. It should also reduce host CPU load on virtualized systems when the guest system is idle. There is a set of tunables, also available as writable sysctls, to control the wanted event timer subsystem behavior: kern.eventtimer.timer - chooses the event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether the chosen timer is per-CPU, the behavior of other options differs slightly. kern.eventtimer.periodic - chooses between periodic and one-shot operation mode. In periodic mode, the current timer hardware is taken as the only source of time for time events. This mode is quite similar to previous kernel behavior. One-shot mode instead uses the currently selected time counter hardware to schedule all needed events one by one and programs the timer to generate an interrupt exactly at the specified time. The default value depends on the chosen timer's capabilities, but one-shot mode is preferred unless another is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode, specifies how many times higher the timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt regardless of whether it is busy or not. By default this option is disabled. If the chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored the one on x86. It now makes use of MONITOR/MWAIT instructions (if supported) under a high sleep/wakeup rate, as a fast alternative to other methods. It allows the SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Alexander Motin	707c2fb950	Update PowerPC event timer code to use new event timers infrastructure. Reviewed by: nwhitehorn Tested by: andreast H/W donated by: Gheorghe Ardelean	2010-09-11 04:45:51 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Nathan Whitehorn	61473c5fd1	Reorder statistics tracking and table lock acquisitions already in place to avoid race conditions updating the PVO statistics.	2010-09-09 16:06:55 +00:00
Nathan Whitehorn	bcb478eb35	Fix a printf specifier on 64-bit systems.	2010-09-08 19:28:43 +00:00
Nathan Whitehorn	0dfddf6e65	Fix a typo in the original import of this code from NetBSD that caused the wrong element of the VSID bitmap array to be examined after a collision, leading to reallocation of in-use VSIDs under some circumstances, with attendant memory corruption. Also add an assert to check for this kind of problem in the future. MFC after: 4 days	2010-09-08 16:58:06 +00:00
Nathan Whitehorn	4982c539ae	Fix an error made in r209975 related to context ID allocation for 64-bit PowerPC CPUs running a 32-bit kernel. This bug could cause in-use VSIDs to be allocated again to another process, causing memory space overlaps and corruption. Reported by: linimon	2010-09-07 23:31:48 +00:00
Nathan Whitehorn	e9b5f21819	Fix the same race condition on 32-bit AIM CPUs that was fixed for 64-bit ones in r211967 involving VSID allocation.	2010-09-06 23:07:58 +00:00
Alexander Motin	a3b31d37df	Make nexus report name and compat fields as pnpinfo for devices on the first level of hierarchy, same as done on deeper levels.	2010-09-05 19:57:24 +00:00
Nathan Whitehorn	b2a237be5c	Restructure how reset and poweroff are handled on PowerPC systems, since the existing code was very platform specific, and broken for SMP systems trying to reboot from KDB. - Add a new PLATFORM_RESET() method to the platform KOBJ interface, and migrate existing reset functions into platform modules. - Modify the OF_reboot() routine to submit the request by hand to avoid the IPIs involved in the regular openfirmware() routine. This fixes reboot from KDB on SMP machines. - Move non-KDB reset and poweroff functions on the Powermac platform into the relevant power control drivers (cuda, pmu, smu), instead of using them through the Open Firmware backdoor. - Rename platform_chrp to platform_powermac since it has become increasingly Powermac specific. When we gain support for IBM systems, we will grow a new platform_chrp.	2010-08-31 15:27:46 +00:00
Nathan Whitehorn	68181d0091	Remove some code made obsolete by the powerpc64 import.	2010-08-31 15:22:09 +00:00
Nathan Whitehorn	1264a5f4c5	Missed one place the SLB lock should be held in r211967.	2010-08-31 02:07:13 +00:00
Nathan Whitehorn	7eeda62ca9	Avoid a race in the allocation of new segment IDs that could result in memory corruption on heavily loaded SMP systems. MFC after: 2 weeks	2010-08-29 18:17:38 +00:00
Nathan Whitehorn	50e64c14a2	pmap_mapdev() does not appear to actually need GIANT to be held here, and asserting that is held breaks drm. MFC after: 2 weeks	2010-08-27 05:29:59 +00:00
John Baldwin	8c7a92bd4a	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
Nathan Whitehorn	3b4b38304e	Improve hash coverage for kernel page table entries by modifying the kernel ESID -> VSID map function. This makes ZFS run stably on PowerPC under heavy loads (repeated simultaneous SVN checkouts and updates).	2010-07-31 21:35:15 +00:00
Nathan Whitehorn	c3e289e1ce	MFppc64: Kernel sources for 64-bit PowerPC, along with build-system changes to keep 32-bit kernels compiling (build system changes for 64-bit kernels are coming later). Existing 32-bit PowerPC kernel configurations must be updated after this change to specify their architecture.	2010-07-13 05:32:19 +00:00
Nathan Whitehorn	cc81c44dd8	Unify ABI-related bits of the Book-E and AIM machdep routines (exec_setregs, etc.) in order to simplify the addition of 64-bit support, and possible future extension of the Book-E code to handle hard floating point and Altivec. MFC after: 1 month	2010-07-12 16:08:07 +00:00
Nathan Whitehorn	932773c882	The number after 2 is 3, not 4. MFC after: 3 days	2010-07-09 14:04:16 +00:00
Nathan Whitehorn	ce0df83f13	Remove an unnecessary include of opt_psim.h, which is not present on powerpc64.	2010-07-09 14:02:57 +00:00
Nathan Whitehorn	945c08644e	MFppc64: Minor 64-bit-cleanliness upgrades and support for platform detection on subtly-broken OF implementations like in the Mambo simulator.	2010-07-09 14:02:24 +00:00
Nathan Whitehorn	f6421f31e3	Replace the existing PowerPC busdma implementation with the one from amd64 (with slight modifications). This provides support for bounce buffers, which are required on systems with RAM above 4 GB.	2010-07-08 15:38:55 +00:00
Marcel Moolenaar	8a35d194f2	Remove the unneeded header <machine/intr.h>.	2010-07-02 02:17:39 +00:00
John Baldwin	fc0de8f0b6	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
Nathan Whitehorn	08393b3efa	Configure interrupts on SMP systems to be distributed among all online CPUs by default, and provide a functional version of BUS_BIND_INTR(). While here, fix some potential concurrency problems in the interrupt handling code.	2010-06-23 22:33:03 +00:00
Nathan Whitehorn	976cc6975b	Temporarily disable instruction relocation while setting up the kernel's IBAT entry in early boot in order to prevent possible faults from races between the instruction cache and the MMU. PR: powerpc/148003 MFC after: 3 days	2010-06-20 16:56:48 +00:00
Nathan Whitehorn	eaef5f0af8	Provide for multiple, cascaded PICs on PowerPC systems, and extend the OFW interrupt map interface to also return the device's interrupt parent. MFC after: 8.1-RELEASE	2010-06-18 14:06:27 +00:00
Nathan Whitehorn	ed6e65a2fe	Make SMP work on MPC7400-based Apple desktops like the PowerMac3,3.	2010-06-12 21:14:22 +00:00
Alan Cox	9124d0d6a3	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
Nathan Whitehorn	c668b5b488	Correct a harmless typo introduced when copying code from mmu_oea64. Submitted by: alc MFC after: 8.1-RELEASE	2010-06-05 18:24:41 +00:00
Alan Cox	2368a37125	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.	2010-06-05 06:56:06 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Nathan Whitehorn	50b8f14f71	Now that single-threaded access to firmware is enforced by IPI_RENDEZVOUS, the ofw mutex is irrelevant.	2010-05-21 20:46:01 +00:00
Nathan Whitehorn	96a985c51d	Fix a long-standing bug in the PowerPC OFW call function on SMP machines where running ofwdump could cause hangs by forcing all secondary CPUs into a busy wait with interrupts off during the call. Following section 8.4 of the Open Firmware PowerPC processor binding, the firmware is free to overwrite the system interrupt handlers during OF calls, restoring the OS handlers on exit. On single CPU systems, this process is invisible to the operating system. On multiple CPU systems, taking any exception on a secondary CPU while an OF call is in progress ends with that exception vectored into OF, resulting in a slow movement of the entire system into firmware context and a machine hang. MFC after: 3 days	2010-05-20 21:07:58 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Nathan Whitehorn	4a26780b9a	Pull OF_quiesce() out of the MI Open Firmware layer and entirely into PPC ofw_machdep.c, in recognition of its state as a machine specific hack. Requested by: marius	2010-05-16 22:01:43 +00:00
Nathan Whitehorn	79bf3fcd18	On PowerMac11,2 and (presumably) PowerMac12,1, we need to quiesce the firmware in order to take over control of the SMU. Without doing this, the firmware background process doing fan control will run amok as we take over the system and crash the management chip. This is limited to these two machines because our kernel is heavily dependent on firmware accesses, and so quiescing firmware can cause nasty problems.	2010-05-16 15:56:59 +00:00
Alan Cox	3c4a24406b	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	7b85f59183	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
Nathan Whitehorn	a107d8aac9	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb	2010-03-25 14:24:00 +00:00
Nathan Whitehorn	3df9e0375a	Get nexus(4) out of the RTC business. The interface used by nexus(4) in Open Firmware was Apple-specific, and we have complete coverage of Apple system controllers, so move RTC responsibilities into the system controller drivers. This avoids interesting problems from manipulating these devices through Open Firmware behind the backs of their drivers. Obtained from: NetBSD MFC after: 2 weeks	2010-03-23 03:14:44 +00:00
Nathan Whitehorn	46c3bbc0ea	Open Firmware on powerpc is generally non-reentrant, so serialize all OF calls with a mutex.	2010-03-23 01:11:10 +00:00
Nathan Whitehorn	cb8617b275	Revisit locking in the 64-bit AIM PMAP. The PVO head for a page is generally protected by the VM page queue mutex. Instead of extending the table lock to cover the PVO heads, add some asserts that the page queue mutex is in fact held. This fixes several LORs and possible deadlocks. This also adds an optimization to moea64_kextract() useful for direct-mapped quantities, like UMA buffers. Being able to use this from inside UMA removes an additional LOR.	2010-03-20 14:35:24 +00:00
Nathan Whitehorn	5cf13d9573	Fix two small bugs. The PowerPC 970 does not support non-coherent memory access, and reflects this by autonomously writing LPTE_M into PTE entries. As such, we should not panic if LPTE_M changes by itself. While here, fix a harmless typo in moea64_sync_icache().	2010-03-15 00:27:40 +00:00
Nathan Whitehorn	ec3c90f3c8	Place interrupt handling in a critical section and remove double counting in incrementing the interrupt nesting level. This fixes a number of bugs in which the interrupt thread could be preempted by an IPI, indefinitely delaying acknowledgement of the interrupt to the PIC, causing interrupt starvation and hangs. Reported by: linimon Reviewed by: marcel, jhb MFC after: 1 week	2010-03-09 02:00:53 +00:00
Nathan Whitehorn	5d7fdd31c8	Fix an obvious lock escape and fix a typo in a comment.	2010-03-04 17:24:31 +00:00
Nathan Whitehorn	9fcd9ccb86	Patch some more concurrency issues here. This expands the page table lock to cover the PVOs, and removes the scratchpage PTEs from the PVOs entirely to avoid the system trying to be helpful and rewriting them.	2010-03-04 06:39:58 +00:00
Joel Dahl	2598954edc	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-03 17:07:02 +00:00
Nathan Whitehorn	44f06ae57d	Move the OEA64 scratchpage to the end of KVA from the beginning, and set its PVO to map physical address 0 instead of kernelstart. This fixes a situation in which a user process could attempt to return this address via KVM, have it fault while being modified, and then panic the kernel because (a) it is supposed to map a valid address and (b) it lies in the no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail. While here, move msgbuf and dpcpu make into regular KVA space for consistency with other implementations.	2010-02-25 03:53:21 +00:00
Nathan Whitehorn	07d5198098	Provide an implementation of pmap_dev_direct_mapped() on OEA64. This is required in order to be able to mmap the running kernel, which is in turn required to avoid fstat returning gibberish.	2010-02-25 03:49:17 +00:00
Nathan Whitehorn	9b3829abc7	Use dcbz instead of word stores for page zeroing, providing a factor of 3-4 speedup.	2010-02-24 00:55:55 +00:00
Nathan Whitehorn	83c01b8cf6	Close a race involving the OEA64 scratchpage. When the scratch page's physical address is changed, there is a brief window during which its PTE is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold the page table lock, it was possible for another CPU to insert a new PTE into the scratch page's PTEG slot during this interval, corrupting both mappings. Solve this by creating a new flag, LPTE_LOCKED, such that moea64_pte_insert will avoid claiming locked PTEG slots even if they are invalid. This change also incorporates some additional paranoia added to solve things I thought might be this bug. Reported by: linimon	2010-02-24 00:54:37 +00:00
Nathan Whitehorn	a6b62947cf	Allow user programs to execute mfpvr instructions. Linux allows this, and some math-related software like GMP expects to be able to use it to pick a target appropriately. MFC after: 1 week	2010-02-22 14:17:23 +00:00
Nathan Whitehorn	ab73970649	Reduce KVA pressure on OEA64 systems running in bridge mode by mapping UMA segments at their physical addresses instead of into KVA. This emulates the direct mapping behavior of OEA32 in an ad-hoc way. To make this work properly required sharing the entire kernel PMAP with Open Firmware, so ofw_pmap is transformed into a stub on 64-bit CPUs. Also implement some more tweaks to get more mileage out of our limited amount of KVA, principally by extending KVA into segment 16 until the beginning of the first OFW mapping. Reported by: linimon	2010-02-20 16:23:29 +00:00
Nathan Whitehorn	062c8f4c86	Fix a bug where pages being removed from memory entirely no longer have PVOs, and so the modified state of the page can no longer be communicated to the VM layer, causing pages not to be flushed to swap when needed, in turn causing memory corruption. Also make several correctness adjustments to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs. Obtained from: projects/ppc64 Discussed with: grehan MFC after: 2 weeks	2010-02-18 15:00:43 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
Nathan Whitehorn	9fbbaac0bb	The first argument of dcbz interprets r0 as a literal zero, not the second. This worked before by accident. MFC after: 1 week	2009-12-03 20:55:09 +00:00
Nathan Whitehorn	227f66048e	Add a CPU features framework on PowerPC and simplify CPU setup a little more. This provides three new sysctls to user space: hw.cpu_features - A bitmask of available CPU features hw.floatingpoint - Whether or not there is hardware FP support hw.altivec - Whether or not Altivec is available PR: powerpc/139154 MFC after: 10 days	2009-11-28 17:33:19 +00:00
Alan Cox	e2997fea72	Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib	2009-11-27 20:24:11 +00:00
Nathan Whitehorn	961e3d1410	Garbage collect some code that was never compiled in to handle Altivec during traps. It predates actual Altivec support and was never used.	2009-11-22 20:45:15 +00:00
Nathan Whitehorn	4603558264	Provide a real fix to the too-many-translations problem when booting from CD on 64-bit hardware to replace existing band-aids. This occurred when the preloaded mdroot required too many mappings for the static buffer. Since we only use the translations buffer once, allocate a dynamic buffer on the stack. This early in the boot process, the call chain is quite short and we can be assured of having sufficient stack space. Reviewed by: grehan	2009-11-12 15:19:09 +00:00
Konstantin Belousov	a7b890448c	Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval(). Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, also sunv reviewed MIPS tested by: gonzo MFC after: 1 month	2009-11-10 11:43:07 +00:00
Nathan Whitehorn	c10d3c2cd8	Spell sz correctly. Pointed out by: jmallett	2009-11-09 21:12:28 +00:00
Nathan Whitehorn	f90550c2d2	Increase the size of the OFW translations buffer to handle G5 systems that use many translation regions in firmware, and add bounds checking to prevent buffer overflows in case even the new value is exceeded. Reported by: Jacob Lambert MFC after: 3 days	2009-11-09 14:26:23 +00:00
Nathan Whitehorn	156ef7611e	Unbreak cpu_switch(). The register allocator in my brain is clearly broken. Also, Altivec context switching worked before only by accident, but should work now by design.	2009-10-31 20:59:13 +00:00
Nathan Whitehorn	011db0bc81	Remove an unnecessary sync that crept in the last commit.	2009-10-31 18:04:34 +00:00
Nathan Whitehorn	e2cd4c2a65	Fix a race in casuword() exposed by csup. casuword() non-atomically read the current value of its argument before atomically replacing it, which could occasionally return the wrong value on an SMP system. This resulted in user mutex operations hanging when using threaded applications.	2009-10-31 17:59:24 +00:00
Nathan Whitehorn	ad26477892	Loop on blocked threads when using ULE scheduler, removing an XXX MP comment.	2009-10-31 17:55:48 +00:00
Nathan Whitehorn	2bbd567acb	Garbage collect set_user_sr(), which is declared static inline and never called.	2009-10-31 17:46:50 +00:00
Nathan Whitehorn	d8cd25d022	Turn off Altivec data-stream prefetching before going into power-save mode on those CPUs that need it.	2009-10-29 14:22:09 +00:00
Konstantin Belousov	d6e029adbe	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
Nathan Whitehorn	4c1826ad5d	Remove debugging printf that snuck in here. Pointy hat to: me	2009-10-23 21:44:46 +00:00
Nathan Whitehorn	dab90f68ed	Add some more paranoia to setting HID registers, and update the AIM clock routines to work better with SMP. This makes SMP work fully and stably on an Xserve G5. Obtained from: Book-E (clock bits)	2009-10-23 21:36:33 +00:00
Nathan Whitehorn	e2ee8728ee	Do not map the trap vectors into the kernel's address space. They are only used in real mode and keeping them mapped only serves to make NULL a valid address, which results in silent NULL pointer dereferences. Suggested by: Patrick Kerharo Obtained from: projects/ppc64	2009-10-23 14:27:40 +00:00
Nathan Whitehorn	999987e51a	Add SMP support on U3-based G5 systems. This does not yet work perfectly: at least on my Xserve, getting the decrementer and timebase on APs to tick requires setting up a clock chip over I2C, which is not yet done. While here, correct the 64-bit tlbie function to set the CPU to 64-bit mode correctly. Hardware donated by: grehan	2009-10-23 03:17:02 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Nathan Whitehorn	c7e1669396	Don't assume that physical addresses are identity mapped. This allows the second processor on G5 systems to start. Note that SMP is still non-functional on these systems because of IPI delivery problems.	2009-10-18 17:22:08 +00:00
Nathan Whitehorn	a8211b27b9	Correct another typo. Actually save the condition register instead of overwriting r12 by mistake.	2009-10-11 16:44:58 +00:00
Nathan Whitehorn	b70b3ebfdd	Correct a typo here and actually save DSISR instead of overwriting it.	2009-10-11 16:41:39 +00:00
Nathan Whitehorn	f136543e41	Increase the size of the page table on 64-bit PowerPC machines as a bandaid to prevent exhaustion of the primary and secondary hash groups in the event of extreme stress on the PMAP layer (e.g. a forkbomb). This wastes memory, and should be revised to properly handle PTEG spills instead. Suggested by: grehan Approved by: re (kensmith)	2009-07-12 04:07:52 +00:00
Jeff Roberson	50c202c592	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
Peter Grehan	f61afb4498	Get the gdb/psim emulator functioning again. aim/machdep.c: - the RI status register bit needs to be set when doing the mtmsrd 64-bit instruction test - psim doesn't implement the dcbz instruction so the run-time cacheline test fails. Set the cacheline size to 32 to avoid infinite loops in future calls to __syncicache() aim/platform_chrp.c: - if after iterating through / and a name property of "cpus" still isn't found, just search directly for '/cpus'. - psim doesn't put a "reg" property on its cpu nodes, so assume 0 since it is uniprocessor-only at this point powerpc/openpic.c - the number of CPUs reported is 1 too many on psim's openpic Reviewed by: nwhitehorn MFC after: 1 week (openpic part)	2009-06-10 12:47:54 +00:00
Nathan Whitehorn	9eb9db93da	Introduce support for cpufreq on PowerPC with the dynamic frequency switching capabilities of the MPC7447A and MPC7448.	2009-05-31 09:01:23 +00:00
Marcel Moolenaar	dbb95048da	Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet. Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.	2009-05-18 18:37:18 +00:00
Rafal Jaworowski	7ad9c533ef	PowerPC common SMP startup and time base rework. - make mftb() shared, rewrite in C, provide complementary mttb() - adjust SMP startup per the above, additional comments, minor naming changes - eliminate redundant TB defines, other minor cosmetics Reviewed by: marcel, nwhitehorn Obtained from: Freescale, Semihalf	2009-05-14 16:48:25 +00:00
Nathan Whitehorn	b40ce02a2f	Factor out platform dependent things unrelated to device drivers into a new platform module. These are probed in early boot, and have the responsibility of determining the layout of physical memory, determining the CPU timebase frequency, and handling the zoo of SMP mechanisms found on PowerPC. Reviewed by: marcel, raj Book-E parts by: raj	2009-05-14 00:34:26 +00:00
Rafal Jaworowski	6a5f0fd39d	Zero PCB during early AIM PowerPC init. When memory is not zero'ed by firmware, uninitialized PCB can have bogus contents, which appear as a saved onfault condition, Altivec context to restore etc. and lead to corruption/crashes. This commit fixes such issues. Submitted by: Michal Mazur arg ! semihalf dot com Tested by: Andreas Tobler andreast-list ! fgznet dot ch	2009-04-24 08:57:54 +00:00
Nathan Whitehorn	55fba05bf5	Fix a typo in the SRR1 comparison for program exceptions. While here, replace magic numbers with constants to keep this from happening again. Without this fix, some programs would occasionally get SIGTRAP instead of SIGILL on an illegal instruction. This affected Altivec detection in pixman, and possibly other software. Reported by: Andreas Tobler MFC after: 1 week	2009-04-19 06:30:00 +00:00
Nathan Whitehorn	e3bcab29e6	Changing the overflow trap to use bla to branch to dbtrap in r190946 was bogus. Revert to a branch that does not set LR. It's been a long week...	2009-04-14 04:15:56 +00:00
Nathan Whitehorn	8cf9d6cd7e	Rework the way we get the cacheline size. Instead of having a table of CPUs known to use 128 byte cache lines and defaulting to 32, use the dcbz instruction to measure it. Also make dcbz behave the way you would expect on PPC 970.	2009-04-12 03:03:55 +00:00
Nathan Whitehorn	1e89943aa3	Fix recognition of kernel-mode traps that pass through the KDB trap handler but do not actually invoke KDB. This includes recoverable machine checks encountered in kernel mode. This patch causes machines with Grackle host-PCI bridges to be able to correctly enumerate them again. MFC after: 3 days	2009-04-11 20:43:41 +00:00
Nathan Whitehorn	029c6e958c	Fix the build when KDB is disabled. The second instance of rfi in trap_subr.S that is patched at runtime to rfid on 64-bit systems is inside KDB-specific code, so don't patch it without KDB.	2009-04-05 21:52:13 +00:00
Marcel Moolenaar	e9b3f3045d	Perform a dummy stwcx. when we switch contexts. The context being switched out may hold a reservation. The stwcx. will clear the reservation. This is architecturally recommended. The scenario this addresses is as follows: 1. Thread 1 performs a lwarx and as such holds a reservation. 2. Thread 1 gets switched out (before doing the matching stwcx.) and thread 2 is switched in. 3. Thread 2 performs a stwcx. to the same reservation granule. This will succeed because the processor has the reservation even though thread 2 didn't do the lwarx. Note that on some processors the address given to the stwcx. is not checked. On these processors the mere condition of having a reservation would cause the stwcx. to succeed, irrespective of whether the addresses are the same. The dummy stwcx. is especially important for those processors.	2009-04-04 22:23:03 +00:00
Nathan Whitehorn	1c96bdd146	Add support for 64-bit PowerPC CPUs operating in the 64-bit bridge mode provided, for example, on the PowerPC 970 (G5), as well as on related CPUs like the POWER3 and POWER4. This also adds support for various built-in hardware found on Apple G5 hardware (e.g. the IBM CPC925 northbridge). Reviewed by: grehan	2009-04-04 00:22:44 +00:00
Nathan Whitehorn	a130b35f13	Change the PVO zone for fictitious pages to the unmanaged PVO zone, to match the unmanaged flag set in the PVO attributes. Without doing this, pmap_remove() could try to remove fictitious pages (like those created by mmap of physical memory) from the wrong UMA zone, causing a panic. Reported by: Justin Hibbits MFC after: 1 week	2009-03-11 03:19:19 +00:00
Nathan Whitehorn	539fe40650	Fix comment: we write the trap vector to SPRG3, not SPRG0.	2009-02-23 19:31:48 +00:00
Nathan Whitehorn	1ac37bcb77	Add Altivec support for supported CPUs. This is derived from the FPU support code, and also reduces the size of trapcode to fit inside a 32 byte handler slot. Reviewed by: grehan MFC after: 2 weeks	2009-02-20 17:48:40 +00:00
Nathan Whitehorn	91416fb268	Modularize the Open Firmware client interface to allow run-time switching of OFW access semantics, in order to allow future support for real-mode OF access and flattened device trees. OF client interface modules are implemented using KOBJ, in a similar way to the PPC PMAP modules. Because we need Open Firmware to be available before mutexes can be used on sparc64, changes are also included to allow KOBJ to be used very early in the boot process by only using the mutex once we know it has been initialized. Reviewed by: marius, grehan	2008-12-20 00:33:10 +00:00
Marcel Moolenaar	dc9d16844c	Add support for kernel profiling for both AIM and BookE. Obtained from: Juniper Networks, Inc (BookE support).	2008-10-27 02:36:03 +00:00
Nathan Whitehorn	51d163d3e9	Convert PowerPC AIM PCI and nexus busses to standard OFW bus interface. This simplifies certain device attachments (Kauai ATA, for instance), and makes possible others on new hardware. On G5 systems, there are several otherwise standard PCI devices (Serverworks SATA) that will not allow their interrupt properties to be written, so this information must be supplied directly from Open Firmware. Obtained from: sparc64	2008-10-14 14:54:14 +00:00
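
The three-step reservation scenario in commit e9b3f3045d can be sketched as a small portable C model. This is only an illustration, not FreeBSD code: struct cpu, model_lwarx(), model_stwcx(), model_switch() and thread2_store_succeeds() are hypothetical names, the reservation granule is simplified to an exact address, and the checks_addr flag stands in for the difference between processors that compare the stwcx. address and those that do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one CPU's reservation state (hypothetical, not kernel code). */
struct cpu {
	bool      reserved;     /* a reservation is currently held */
	uintptr_t granule;      /* simplified: exact address that was reserved */
	bool      checks_addr;  /* false models CPUs that ignore the stwcx. address */
};

/* Model of lwarx: load the word and take a reservation on it. */
static uint32_t
model_lwarx(struct cpu *c, const uint32_t *p)
{
	c->reserved = true;
	c->granule = (uintptr_t)p;
	return (*p);
}

/* Model of stwcx.: store only if a matching reservation is held. */
static bool
model_stwcx(struct cpu *c, uint32_t *p, uint32_t v)
{
	bool ok;

	assert(c != NULL && p != NULL);
	ok = c->reserved && (!c->checks_addr || c->granule == (uintptr_t)p);
	c->reserved = false;	/* the reservation is gone after any stwcx. */
	if (ok)
		*p = v;
	return (ok);
}

/* Model of a context switch, optionally issuing the dummy stwcx. */
static void
model_switch(struct cpu *c, bool dummy_stwcx)
{
	static uint32_t scratch;	/* harmless target for the dummy store */

	if (dummy_stwcx)
		(void)model_stwcx(c, &scratch, 0);
}

/* Returns true if thread 2's stwcx. succeeds without its own lwarx. */
static bool
thread2_store_succeeds(bool dummy_stwcx, bool checks_addr)
{
	struct cpu c = { .reserved = false, .granule = 0, .checks_addr = checks_addr };
	uint32_t word = 1, other = 2;

	(void)model_lwarx(&c, &word);	/* 1. thread 1 takes a reservation */
	model_switch(&c, dummy_stwcx);	/* 2. thread 1 is switched out */
	/* 3. thread 2 stores; on non-checking CPUs any address will do */
	return (model_stwcx(&c, checks_addr ? &word : &other, 99));
}
```

In this model, thread2_store_succeeds(false, ...) reports the spurious success for both kinds of processor, and passing true for dummy_stwcx makes thread 2's store fail, which is the behavior the commit message asks for.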

842 Commits
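
Commit 8cf9d6cd7e measures the cache line size with dcbz instead of looking it up in a table. One way such a measurement could work is sketched below in portable C. This is an assumption about the general idea, not the code from that commit: model_dcbz() is a hypothetical stand-in for the real instruction, measure_line_size() and PROBE_BYTES are illustrative names, and the hw_line_size parameter plays the role of the hardware's actual line size. The probe fills a buffer with nonzero bytes, zeroes one line, and counts the bytes that became zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROBE_BYTES 512	/* assumed to exceed any line size being probed */

/*
 * Stand-in for dcbz: zero the aligned block of line_size bytes that
 * contains the given offset. Offsets are relative to the buffer start,
 * which this model treats as line-aligned.
 */
static void
model_dcbz(unsigned char *buf, size_t offset, size_t line_size)
{
	size_t start = offset - (offset % line_size);

	memset(buf + start, 0, line_size);
}

/* Infer the line size by counting how many bytes one dcbz cleared. */
static size_t
measure_line_size(size_t hw_line_size)
{
	unsigned char buf[PROBE_BYTES];
	size_t i, zeroed = 0;

	assert(hw_line_size > 0 && hw_line_size <= PROBE_BYTES);
	memset(buf, 0xff, sizeof(buf));		/* no byte starts out as zero */
	model_dcbz(buf, 0, hw_line_size);	/* clear exactly one line */
	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] == 0)
			zeroed++;
	return (zeroed);
}
```

The appeal of this approach is that a CPU model never seen before still reports its own line size, rather than falling back to a default of 32 bytes.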