freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	a546448b8d	Rewrite amd64 PCID implementation to follow an algorithm described in Vahalia's "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has a PCID algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the number of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintenance of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 19:11:01 +00:00
Roger Pau Monné	927dc0e02a	amd64: make uiomove_fromphys functional for pages not mapped by the DMAP Place the code introduced in r268660 into a separate function that can be called from uiomove_fromphys. Instead of pre-allocating two KVA pages use vmem_alloc to allocate them on demand when needed. This prevents blocking if a page fault is taken while physical addresses from outside the DMAP are used, since the lock is now removed. Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D947 amd64/amd64/pmap.c: - Factor out the code to deal with non DMAP addresses from pmap_copy_pages and place it in pmap_map_io_transient. - Change the code to use vmem_alloc instead of a set of pre-allocated pages. - Use pmap_qenter and don't pin the thread if there can be page faults. amd64/amd64/uio_machdep.c: - Use pmap_map_io_transient in order to correctly deal with physical addresses not covered by the DMAP. amd64/include/pmap.h: - Add the prototypes for the new functions. amd64/include/vmparam.h: - Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only used with addresses covered by the DMAP.	2014-10-24 09:48:58 +00:00
Konstantin Belousov	07a92f34d6	Add an argument to the x86 pmap_invalidate_cache_range() to request forced invalidation of the cache range regardless of the presence of self-snoop feature. Some recent Intel GPUs in some modes are not coherent, and dirty lines in CPU cache must be flushed before the pages are transferred to GPU domain. Reviewed by: alc (previous version) Tested by: pho (amd64) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-08 16:48:03 +00:00
Konstantin Belousov	1fb1da0366	Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 17:17:19 +00:00
Neel Natu	5515bb73e6	Re-arrange bits in the amd64/pmap 'pm_flags' field. The least significant 8 bits of 'pm_flags' are now used for the IPI vector to use for nested page table TLB shootdown. Previously we used IPI_AST to interrupt the host cpu which is functionally correct but could lead to misleading interrupt counts for AST handler. The AST handler was also doing a lot more than what is required for the nested page table TLB shootdown (EOI and IRET).	2013-12-20 05:50:22 +00:00
Neel Natu	318224bbe6	Merge projects/bhyve_npt_pmap into head. Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries are mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For example, the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem.
This is achieved by using the following mapping for EPT entries that need emulation of A/D bits:
  Bit     Position    Interpreted By
  PG_V    52          software (accessed bit emulation handler)
  PG_RW   53          software (dirty bit emulation handler)
  PG_A    0           hardware (aka EPT_PG_RD)
  PG_M    1           hardware (aka EPT_PG_WR)
The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guests with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guests that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable.
Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks to John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho	2013-10-05 21:22:35 +00:00
Neel Natu	74d1d2b7cc	Merge the following changes from projects/bhyve_npt_pmap: - add fields to 'struct pmap' that are required to manage nested page tables. - add a parameter to 'vmspace_alloc()' that can be used to override the default pmap initialization routine 'pmap_pinit()'. These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0. Reviewed by: kib@, alc@ Approved by: re (glebius)	2013-09-20 17:06:49 +00:00
Konstantin Belousov	6aceaa3e17	Tidy up some loose ends in the PCID code: - Restore the pre-PCID TLB shootdown handlers for whole address space and single page invalidation asm code, and assign the IPI handler to them when PCID is not supported or disabled. Old handlers have linear control flow. But, still use the common return sequence. - Stop using pcpu for INVPCID descriptors in the invlrng handler. It is enough to allocate descriptors on the stack. As a result, two SWAPGS instructions are shaved off from the code for Haswell+. - Fix the inverted condition in invlrng for checking of the PCID support [1], also in invlrng check that pmap is kernel pmap before performing other tests. For the kernel pmap, which provides global mappings, the INVLPG must be used for invalidation always. - Save the pre-computed pmap's %CR3 register in the struct pmap. This allows removing several checks for pm_pcid validity when %CR3 is reloaded [2]. Noted by: gibbs [1] Discussed with: alc [2] Tested by: pho, flo Sponsored by: The FreeBSD Foundation	2013-09-04 23:31:29 +00:00
Konstantin Belousov	37eed8419c	Implement support for the process-context identifiers ('PCID') on Intel CPUs. The feature tags TLB entries with the Id of the address space and allows to avoid TLB invalidation on the context switch, it is available only in the long mode. In the microbenchmarks, using the PCID decreased latency of the context switches by ~30% on SandyBridge class desktop CPUs, measured with the lat_ctx program from lmbench. If available, use INVPCID instruction when a TLB entry in non-current address space needs to be invalidated. The instruction is typically available on the Haswell. If needed, the use of PCID can be turned off with the vm.pmap.pcid_enabled loader tunable set to 0. The state of the feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl vm.pmap.pcid_save_cnt reports the number of context switches which avoided invalidating the TLB; compare with the total number of context switches, available as sysctl vm.stats.sys.v_swtch. Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf	2013-08-30 07:59:49 +00:00
Jung-uk Kim	1533b9f714	Reimplement atomic operations on PDEs and PTEs in pmap.h. This change significantly reduces duplicate code and make it easier to read. Reviewed by: alc, bde	2013-08-21 22:40:29 +00:00
Neel Natu	0ef2ab3ab8	Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB. Bump up the KVA size proportionally from 512GB to 2TB. The number of page table pages used by the direct map is now calculated at run time based on 'Maxmem'. This means the small memory systems will not see any additional tax in terms of page table pages for the direct map. However all amd64 systems, regardless of the memory size, will use 3 more pages to accommodate the bump in the KVA size. More details available here: http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html Tested with the following configurations: - Sandybridge server with 64GB of memory. - bhyve VM with 64MB of memory. - bhyve VM with 8GB of memory with the memory segment above 4GB cuddling right up against the 4TB maximum memory limit. Discussed on: hackers@, current@ Submitted by: Chris Torek (torek@torek.net)	2013-08-17 19:49:08 +00:00
Konstantin Belousov	872d995f76	Change the pmap_ts_referenced() method of amd64 pmap to use shared pvh_global_lock. This allows the method to be executed in parallel, avoiding undue contention on the pvh_global_lock for the multithreaded pagedaemon. The pmap_ts_referenced() function has to inspect the page mappings for several pmaps, which need to be locked while pv list lock is owned. This contradicts to the lock order, where pmap lock is before pv list lock. Introduce the generation count for the pv list of the page or superpage, which indicate any change in the pv list, and, as usual, perform restart of the iteration if generation changed while pv lock was dropped for blocking acquire of a pmap lock. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-07 16:33:15 +00:00
Neel Natu	8f1664b724	Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT. Reviewed by: alc	2013-06-14 00:03:43 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. 
Further technical notes, broken into pieces, can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Attilio Rao	b38d37f7b5	Merge from vmc-playground branch: Rename the pv_entry_t iterator from pv_list to pv_next. Besides being more correct technically (as the name seems to suggest this is a list while it is an iterator), it will also be needed by vm_radix work to avoid a nameclash on macro expansions. Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: flo, pho, jhb, davide	2013-03-02 14:19:08 +00:00
Neel Natu	6d62a48f47	Compute the number of initial kernel page table pages (NKPT) dynamically. This eliminates the need to recompile the kernel when the default value of NKPT is not big enough - for e.g. when loading large kernel modules or memory disk images from the loader. If NKPT is defined in the kernel configuration file then it overrides the dynamic calculation. Reviewed by: alc, kib	2013-02-06 04:53:00 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Alan Cox	246d751711	Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no longer uses the active and inactive paging queues. Instead, the pmap now maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses this list to select pv entries for reclamation. Note: The old pmap_collect() tried to avoid reclaiming mappings for pages that have either a hold_count or a busy field that is non-zero. However, this isn't necessary for correctness, and the locking in pmap_collect() was insufficient to guarantee that such mappings weren't reclaimed. The new pmap_pv_reclaim() doesn't even try. Reviewed by: kib MFC after: 6 weeks	2012-05-18 05:36:04 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and easily extensible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Konstantin Belousov	3136faa59d	Make pmap_invalidate_cache_range() available for consumption on amd64. Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU cache for the set of pages, which are not necessarily mapped. Since its supposed use is to prepare the move of the pages ownership to a device that does not snoop all CPU accesses to the main memory (read GPU in GMCH), do not rely on CPU self-snoop feature. amd64 implementation takes advantage of the direct map. On i386, extract the helper pmap_flush_page() from pmap_page_set_memattr(), and use it to make a temporary mapping of the flushed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2011-04-18 21:24:42 +00:00
Alan Cox	1587dfd730	Move an external declaration to the appropriate header file.	2011-03-26 06:21:05 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Alan Cox	686b00d691	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earlier version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
Alan Cox	92ababa777	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
Konstantin Belousov	2680dac9e1	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Alan Cox	0f6766f3da	Eliminate dead code. These definitions should have been deleted with the introduction of i686_mem.c in r45405. Merge adjacent #ifdef _KERNEL/#endif blocks.	2009-06-22 04:21:02 +00:00
Alan Cox	b4862e19af	Update stale comments. The alternate address space mapping was eliminated when PAE support was added to i386. The direct mapping exists on amd64.	2009-03-22 18:56:26 +00:00
Alan Cox	0c645b7267	In general, the kernel virtual address of the pml4 page table page that is stored in the pmap is from the direct map region. The two exceptions have been the kernel pmap and the swapper's pmap. These pmaps have used a kernel virtual address established by pmap_bootstrap() for their shared pml4 page table page. However, there is no reason not to use the direct map for these pmaps as well.	2009-03-22 04:32:05 +00:00
Alan Cox	494c177e81	Make pmap_kenter_attr() static.	2008-08-04 08:04:09 +00:00
Alan Cox	ba65f767c0	Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache mode changes, and cache and TLB invalidation when some or all of the specified range is already mapped with the specified cache mode. Submitted by: Magesh Dhasayyan	2008-07-31 22:45:28 +00:00
Alan Cox	8136b7265f	Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one- time initialization of kernel_vm_end, not in its main body. (I could not, however, resist the urge to optimize the main body.) Reduce the number of preallocated page directory pages to just those needed to support NKPT page table pages. (In fact, this allows me to revert a couple of my earlier changes to create_pagetables().)	2008-07-08 22:59:17 +00:00
Alan Cox	4a7c66163b	Change create_pagetables() and pmap_init() so that many fewer page table pages have to be preallocated by create_pagetables().	2008-07-06 22:36:28 +00:00
Alan Cox	13e0058451	Increase the kernel map's size to 7GB, making room for a kmem map of size greater than 4GB. (Auto-sizing will set the ceiling on the kmem map size to 4.2GB.)	2008-07-05 20:44:55 +00:00
Alan Cox	67ce249ac9	Compute NKPDPE from NKPT. This reduces the number of knobs that must be turned in order to change the size of the kernel virtual address space.	2008-06-30 02:35:55 +00:00
Alan Cox	ce3cb38836	Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However, in practice, the error (currently) makes no difference because the computation performed by KVADDR() hides the error. This revision fixes the error. Also, eliminate a (now) unused definition.	2008-06-29 19:13:27 +00:00
Alan Cox	f4f491d095	Increase the size of the kernel virtual address space to 6GB. Until the maximum size of the kmem map can be greater than 4GB, there is little point in making the kernel virtual address space larger than 6GB. Tested by: kris@	2008-06-29 18:35:00 +00:00
Alan Cox	0116b8b321	Add support for automatic promotion of 4KB page mappings to 2MB page mappings. Automatic promotion can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled. (Expect this to change.) Reviewed by: ups Tested by: kris, Peter Holm	2008-03-04 18:50:15 +00:00
Alan Cox	5cccf58676	Shrink the size of struct vm_page on amd64 and i386 by eliminating pv_list_count from struct md_page. Ever since Peter rewrote the pv entry allocator for amd64 and i386 pv_list_count has been correctly maintained but otherwise unused.	2008-01-06 18:51:04 +00:00
Ruslan Ermilov	3cbc967ef7	Use a different bitmask for superpages' base address so that it doesn't conflict with the PG_PDE_PAT bit. (We still don't mask off all the reserved bits but that's okay for now.) Reviewed by: alc	2006-12-05 11:31:33 +00:00
Alan Cox	da44960498	The global variable avail_end is redundant and only used once. Eliminate it. Make avail_start static to the pmap on amd64. (It no longer exists on other architectures.)	2006-11-19 20:54:58 +00:00
Ruslan Ermilov	d77f5882e7	Fix NKPT comments to match reality. Note that the current value of NKPT is no longer enough to run amd64 with 16G of RAM, as it doesn't have space for mapping a kernel (16M kernel would require additionally 8 page tables).	2006-11-13 20:33:54 +00:00
Ruslan Ermilov	26af9ac7d0	Fix a comment.	2006-11-13 06:26:57 +00:00
John Baldwin	7e9f73f3ed	First pass at allowing memory to be mapped using cache modes other than WB (write-back) on x86 via control bits in PTEs and PDEs (including making use of the PAT MSR). Changes include: - A new pmap_mapdev_attr() function for amd64 and i386 which takes an additional parameter (relative to pmap_mapdev()) specifying the cache mode for this mapping. Note that on amd64 only WB mappings are done with the direct map, all other modes result in a private mapping. - pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached) mappings rather than WB. Previously we relied on the BIOS setting up MTRR's to enforce memio regions being treated as UC. This might make hw.cbb_start_memory unnecessary in some cases now for example. - A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places that used pmap_mapdev() to map non-device memory (such as ACPI tables) to do so using WB as before. - A new pmap_change_attr() function for amd64 and i386 that changes the caching mode for a range of KVA. Reviewed by: alc	2006-08-11 19:22:57 +00:00
Alan Cox	f8883c0160	Define the additional page fault error codes that are implemented by amd64.	2006-08-02 16:24:23 +00:00
John Baldwin	2b8a339c7e	Add various constants for the PAT MSR and the PAT PTE and PDE flags. Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining (WC) instead of Uncached (UC-). MFC after: 1 month	2006-05-01 22:07:00 +00:00
John Baldwin	4ac60df584	Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the wbinvd() instruction. This includes a new IPI so that all CPU caches on all CPUs are flushed for the SMP case. MFC after: 1 month	2006-05-01 21:36:47 +00:00
Peter Wemm	68ac481184	Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine with large mmap files mapped into many processes, this saves hundreds of megabytes of ram. pv entries were individually allocated and had two tailq entries and two pointers (or addresses). Each pv entry was linked to a vm_page_t and a process's address space (pmap). It had the virtual address and a pointer to the pmap. This change replaces the individual allocation with a per-process allocation system. A page ("pv chunk") is allocated and this provides 168 pv entries for that process. We can now eliminate one of the 16 byte tailq entries because we can simply iterate through the pv chunks to find all the pv entries for a process. We can eliminate one of the 8 byte pointers because the location of the pv entry implies the containing pv chunk, which has the pointer. After overheads from the pv chunk bitmap and tailq linkage, this works out that each pv entry has an effective size of 24.38 bytes. Future work still required, and other problems: * when running low on pv entries or system ram, we may need to defrag the chunk pages and free any spares. The stats (vm.pmap.) show that this doesn't seem to be that much of a problem, but it can be done if needed. running low on pv entries is now a much bigger problem. The old get_pv_entry() routine just needed to reclaim one other pv entry. Now, since they are per-process, we can only use pv entries that are assigned to our current process, or by stealing an entire page worth from another process. Under normal circumstances, the pmap_collect() code should be able to dislodge some pv entries from the current process. But if needed, it can still reclaim entire pv chunk pages from other processes. * This should port to i386 really easily, except there it would reduce pv entries from 24 bytes to about 12 bytes. (I have integrated Alan's recent changes.)	2006-04-03 21:36:01 +00:00
Peter Wemm	8d0593f54e	Merge/sync with i386: various cosmetic tweaks	2006-03-14 00:01:56 +00:00
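
Note on 68ac481184 (pv chunks): the figures quoted in that commit message, 168 pv entries per one-page pv chunk and an effective size of 24.38 bytes per entry, can be checked with a little arithmetic. The Python sketch below is only an illustration; the 4096-byte page size, the 24-byte raw entry size, and all helper names are assumptions made for this note and are not taken from the source tree.

```python
# Arithmetic check for the pv chunk figures quoted in 68ac481184.
# Assumptions (not from the commit text): a 4096-byte page and a raw
# pv entry of 24 bytes (an 8-byte address plus one 16-byte tailq entry).

PAGE_SIZE = 4096          # assumed amd64 base page size, in bytes
ENTRIES_PER_CHUNK = 168   # stated in the commit message
PV_ENTRY_SIZE = 24        # assumed raw entry size, in bytes


def effective_entry_size(page_size=PAGE_SIZE, entries=ENTRIES_PER_CHUNK):
    """Bytes of page consumed per pv entry once chunk overhead is shared."""
    return page_size / entries


def chunk_overhead(page_size=PAGE_SIZE, entries=ENTRIES_PER_CHUNK,
                   entry_size=PV_ENTRY_SIZE):
    """Bytes left in the page for the pmap pointer, bitmap and linkage."""
    return page_size - entries * entry_size


def bitmap_words(entries=ENTRIES_PER_CHUNK, word_bits=64):
    """64-bit words needed for a one-bit-per-entry allocation bitmap."""
    return -(-entries // word_bits)  # ceiling division


if __name__ == "__main__":
    print(f"effective size: {effective_entry_size():.2f} bytes")
    print(f"chunk overhead: {chunk_overhead()} bytes")
    print(f"bitmap words:   {bitmap_words()}")
```

Under these assumptions the per-chunk overhead is small compared with the page, which is why the effective size stays close to the raw entry size.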

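Note on a546448b8d (per-cpu PCID allocation): the generation-count scheme described in that commit message can be modelled in a few lines. The Python sketch below is a toy model written for this note, not the kernel code; the class and function names, the choice of PCID 0 as reserved, and the 12-bit PCID limit used as a default are all assumptions. It only shows the three cases from the message: first use on a cpu (allocate, flush), reuse (set bit 63 of %cr3, no flush), and invalidation (zero the saved generation so the next switch allocates again).

```python
# Toy model of per-cpu PCID allocation with generation counts.
# All names here are hypothetical; this is not the FreeBSD implementation.

CR3_NO_FLUSH = 1 << 63   # bit 63 of %cr3: keep TLB entries for this PCID
PCID_MIN = 1             # assumption: PCID 0 is reserved
PCID_MAX = 4095          # assumption: 12-bit PCID space


class Cpu:
    def __init__(self):
        self.gen = 1                # per-cpu PCID generation count
        self.next_pcid = PCID_MIN   # next PCID to hand out on this cpu


class Pmap:
    def __init__(self, root):
        self.root = root            # page-aligned page table root address
        self.per_cpu = {}           # cpu id -> (pcid, saved generation)


def activate(cpus, cpu_id, pmap, pcid_max=PCID_MAX):
    """Return the %cr3 value to load when switching to pmap on cpu_id."""
    cpu = cpus[cpu_id]
    pcid, gen = pmap.per_cpu.get(cpu_id, (0, 0))
    if gen != 0 and gen == cpu.gen:
        # PCID is still valid on this cpu: reuse it and skip the flush.
        return pmap.root | pcid | CR3_NO_FLUSH
    if cpu.next_pcid > pcid_max:
        # PCID space exhausted: start a new generation, which makes every
        # PCID previously handed out on this cpu stale.
        cpu.gen += 1
        cpu.next_pcid = PCID_MIN
    pcid = cpu.next_pcid
    cpu.next_pcid += 1
    pmap.per_cpu[cpu_id] = (pcid, cpu.gen)
    return pmap.root | pcid         # bit 63 clear: TLB flushed for this PCID


def invalidate(pmap, cpu_id):
    """Zero the saved generation so the next switch allocates a new PCID."""
    if cpu_id in pmap.per_cpu:
        pcid, _ = pmap.per_cpu[cpu_id]
        pmap.per_cpu[cpu_id] = (pcid, 0)
```

In this model no cross-cpu mask is kept at all: invalidation only touches the pmap's own per-cpu slot, which mirrors the point in the commit message that pm_save no longer has to be tracked.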
178 Commits