freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Justin Hibbits	80a5635c8b	Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code, much of which is not necessary for PowerPC. The FBT module can likely be factored into 3 separate files: common, intel, and powerpc, rather than duplicating most of the code between the x86 and PowerPC flavors. All DTrace modules for PowerPC will be MFC'd together once Fasttrap is completed.	2013-03-18 05:30:18 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Attilio Rao	dc1558d1cd	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-27 18:12:13 +00:00
Attilio Rao	590f9303e5	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc	2013-02-26 01:00:11 +00:00
Adrian Chadd	aef8ef5168	Setup BAT0 and BAT1 on the Wii. This is the missing piece for FreeBSD/Wii, but there's still a lot of work ahead. We have to reset the MMU in locore before continuing the boot process because we don't know how the boot loaders might have setup the BATs. We also disable the PCI BAT because there's no PCI bus on the Wii. Thanks to Nathan Whitehorn and Peter Grenhan for their help. Submitted by: Margarida Gouveia	2012-11-21 08:04:21 +00:00
Konstantin Belousov	b32ecf44bc	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks	2012-11-14 20:01:40 +00:00
Justin Hibbits	c757049235	Implement DTrace for PowerPC. This includes both 32-bit and 64-bit. There is one known issue: Some probes will display an error message along the lines of: "Invalid address (0)" I tested this with both a simple dtrace probe and dtruss on a few different binaries on 32-bit. I only compiled 64-bit, did not run it, but I don't expect problems without the modules loaded. Volunteers are welcome. MFC after: 1 month	2012-11-07 23:45:09 +00:00
Attilio Rao	cfedf924d3	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris	2012-11-03 23:03:14 +00:00
Alan Cox	e4b8a2fc5a	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.	2012-09-28 05:30:59 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Rui Paulo	800bd92190	Unbreak tinderbox.	2012-08-25 17:15:33 +00:00
Rui Paulo	794bad6548	Set mdp only under #ifdef WII.	2012-08-25 00:47:55 +00:00
Adrian Chadd	2467c62fc6	On Nintendo Wii CPUs, the mdp value will be garbage. Set it to NULL so as to not confuse things. Submitted by: Margarida Gouveia	2012-08-21 06:34:21 +00:00
Alan Cox	8d9e6d9f93	Avoid recursion on the pvh global lock in the aim oea pmap. Correct the return type of the pmap_ts_referenced() implementations. Reported by: jhibbits [1] Tested by: andreast	2012-07-10 22:10:21 +00:00
Alan Cox	3653f5cbcb	Replace all uses of the vm page queues lock by a r/w lock that is private to this pmap. Tested by: andreast, jhibbits	2012-07-06 02:18:49 +00:00
Rui Paulo	0381f4905f	The `end' symbol doesn't match the end of the kernel image because it's relative to the start address (unless the start address is 0, which is not the case). This is currently not a problem because all powerpc architectures are using loader(8) which passes metadata to the kernel including the correct `endkernel' address. If we don't use loader(8), register 4 and 5 will have the size of the kernel ELF file, not its end address. We fix that simply by adding `kernel_text' to `end' to compute `endkernel'. Discussed with: nathanw	2012-06-29 01:55:20 +00:00
Rafal Jaworowski	d7c8c7fdfb	Fix physical address type to vm_paddr_t also for powerpc64.	2012-05-25 18:17:26 +00:00
Rafal Jaworowski	20b7961267	Fix physical address type to vm_paddr_t.	2012-05-24 21:13:24 +00:00
Nathan Whitehorn	ccc4a5c761	Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.	2012-05-20 14:33:28 +00:00
Nathan Whitehorn	bc96dccc69	Fix final bugs in memory barriers on PowerPC: - Use isync/lwsync unconditionally for acquire/release. Use of isync guarantees a complete memory barrier, which is important for serialization of bus space accesses with mutexes on multi-processor systems. - Go back to using sync as the I/O memory barrier, which solves the same problem as above with respect to mutex release using lwsync, while not penalizing non-I/O operations like a return to sync on the atomic release operations would. - Place an acquisition barrier around thread lock acquisition in cpu_switchin().	2012-05-04 16:00:22 +00:00
Nathan Whitehorn	284ea61312	Fix build on 32-bit systems.	2012-04-28 14:42:49 +00:00
Nathan Whitehorn	50e13823c8	After switching mutexes to use lwsync, they no longer provide sufficient guarantees on acquire for the tlbie mutex. Conversely, the TLB invalidation sequence provides guarantees that do not need to be redundantly applied on release. Roll a small custom lock that is just right. Simultaneously, convert the SLB tree changes back to lwsync, as changing them to sync was a misdiagnosis of the tlbie barrier problem this commit actually fixes.	2012-04-28 00:12:23 +00:00
Nathan Whitehorn	8387bb0c78	Revert r234581 for this file. The lockless SLB tree code does in fact need a heavyweight sync instead of a lightweight sync to function properly. Thanks to mdf for the clarification.	2012-04-24 13:36:41 +00:00
Nathan Whitehorn	6f26a88999	Use lwsync to provide memory barriers on systems that support it instead of sync (lwsync is an alternate encoding of sync on systems that do not support it, providing graceful fallback). This provides more than an order of magnitude reduction in the time required to acquire or release a mutex. MFC after: 2 months	2012-04-22 19:00:51 +00:00
Nathan Whitehorn	0b852c03eb	Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release. Suggested by: kib MFC after: 4 days	2012-04-22 17:58:30 +00:00
Nathan Whitehorn	c13aac3896	Make sure all pending operations have completed on the existing thread before (potentially) migrating it to a different CPU. MFC after: 5 days	2012-04-20 23:01:36 +00:00
Nathan Whitehorn	e3c2930d36	We don't need kcopy() in any of the remaining places it is used, so remove it. MFC after: 2 weeks	2012-04-11 22:23:50 +00:00
Nathan Whitehorn	b6aeb1ab97	Only manipulate the PGA_EXECUTABLE flag on managed pages. This is a proxy for whether the page is physical. On dense phys mem systems (32-bit), VM_PHYS_TO_PAGE will not return NULL for device memory pages if device memory is above physical memory even if there is no allocated vm_page. Attempting to use the returned page could then cause either memory corruption or a page fault.	2012-04-11 21:56:55 +00:00
Nathan Whitehorn	805bee55eb	Fix error in r233949. Synchronizing icaches on uncacheable pages turns out not to be a good idea, and of course the PV entry list for a page is never empty after the page has been mapped.	2012-04-11 20:28:05 +00:00
Nathan Whitehorn	b7d0d1fabf	Execute an initial ptesync if and only if the PTE is actually being invalidated, as opposed to a ref/changed bit update.	2012-04-06 22:33:13 +00:00
Nathan Whitehorn	348bc07000	Substantially reduce the scope of the locks held in pmap_enter(), which improves concurrency slightly.	2012-04-06 18:18:48 +00:00
Nathan Whitehorn	57bd5cce62	Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction caches, by invalidating kernel icaches only when needed and not flushing user caches for shared pages. Suggested by: kib MFC after: 2 weeks	2012-04-06 16:03:38 +00:00
Nathan Whitehorn	7e55df27cb	More PMAP performance improvements: skip 256 MB segments entirely if they are are not mapped during ranged operations and reduce the scope of the tlbie lock only to the actual tlbie instruction instead of the entire sequence. There are a few more optimization possibilities here as well.	2012-03-28 17:25:29 +00:00
Nathan Whitehorn	a3e9e259b3	Make sure to call vm_page_dirty() before the pmap lock is released to prevent a race where another process could conclude the page was clean. Submitted by: alc	2012-03-27 01:26:00 +00:00
Nathan Whitehorn	5afcb4c91e	More PMAP concurrency improvements: replace the table lock and (almost) all uses of the page queues mutex with a new rwlock that protects the page table and the PV lists. This reduces system time during a parallel buildworld by 35%. Reviewed by: alc	2012-03-27 01:24:18 +00:00
Nathan Whitehorn	e71dfa7b84	More PMAP performance improvements: on powerpc64, when TLBIE can be run with exceptions enabled, leave them enabled and use a regular mutex to guard TLB invalidations instead of a spinlock.	2012-03-25 06:01:34 +00:00
Nathan Whitehorn	d456d3e31f	Only call vm_page_dirty() on pages that are writable in order not to confuse the VM.	2012-03-24 22:32:19 +00:00
Nathan Whitehorn	8e7c7ea2ea	Following suggestions from alc, skip wired mappings in pmap_remove_pages() and remove moea64_attr_*() in favor of direct calls to vm_page_dirty() and friends.	2012-03-24 19:59:14 +00:00
Nathan Whitehorn	07b638a98e	Remove acquisition of VM page queues lock from pmap_protect(). Any actual manipulation of the pvo_vlink and pvo_olink entries is already protected by the table lock, so most remaining instances of the acquisition of the page queues lock can likely be replaced with the table lock, or removed if the table lock is already held. Reviewed by: alc	2012-03-18 13:22:42 +00:00
Nathan Whitehorn	cd907a68aa	Implement pmap_remove_pages(). This will be added later to the 32-bit MMU module. Suggested by: alc	2012-03-15 22:50:48 +00:00
Nathan Whitehorn	246e44956e	Improve algorithm for deciding whether to loop through all process pages or look them up individually in pmap_remove() and apply the same logic in the other ranged operation (pmap_protect). This speeds up make installworld by a factor of 2 on powerpc64. MFC after: 1 week	2012-03-15 19:36:52 +00:00
Nathan Whitehorn	cbfa304088	Use LIST_FOREACH_SAFE() instead of LIST_FOREACH() in pmap_remove(), since the point of this loop is to remove elements. This worked by accident before. MFC after: 2 days	2012-03-14 20:19:49 +00:00
Andreas Tobler	179e996c9f	Revert the _NOPROF entries on cpu_throw, cpu_switch and savectx. They can be profiled too now. MFC after: 2 weeks	2012-02-05 15:59:18 +00:00
Konstantin Belousov	75ce221fa1	Fix build for the case of powerpc64 kernel without COMPAT_FREEBSD32. MFC after: 2 months	2012-01-30 19:31:17 +00:00
Konstantin Belousov	62c625fdd2	Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported, and some variants of powerpc32. MFC after: 2 months (if ever)	2012-01-30 07:56:00 +00:00
Andreas Tobler	9eab2f146a	This commit adds profiling support for powerpc64. Now we can do application profiling and kernel profiling. To enable kernel profiling one has to build kgmon(8). I will enable the build once I managed to build and test powerpc (32-bit) kernels with profiling support. - add a powerpc64 PROF_PROLOGUE for _mcount. - add macros to avoid adding the PROF_PROLOGUE in certain assembly entries. - apply these macros where needed. - add size information to the MCOUNT function. MFC after: 3 weeks, together with r230291	2012-01-20 22:34:19 +00:00
Nathan Whitehorn	ae09ab8f63	Rework SLB trap handling so that double-faults into an SLB trap handler are possible, and double faults within an SLB trap handler are not. The result is that it possible to take an SLB fault at any time, on any address, for any reason, at any point in the kernel. This lets us do two important things. First, it removes the (soft) 16 GB RAM ceiling on PPC64 as well as any architectural limitations on KVA space. Second, it lets the kernel tolerate poorly designed hypervisors that have a tendency to fail to restore the SLB properly after a hypervisor context switch. MFC after: 6 weeks	2012-01-15 00:08:14 +00:00

1 2 3 4 5 ...

728 Commits