freebsd-skq

Author	SHA1	Message	Date
markj	1db0a22947	Speed up vm_page_array initialization. We currently initialize the vm_page array in three passes: one to zero the array, one to initialize the "order" field of each page (necessary when inserting them into the vm_phys buddy allocator one-by-one), and one to initialize the remaining non-zero fields and individually insert each page into the allocator. Merge the three passes into one following a suggestion from alc: initialize vm_page fields in a single pass, and use vm_phys_free_contig() to efficiently insert physical memory segments into the buddy allocator. This reduces the initialization time to a third or a quarter of what it was before on most systems that I tested. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D12248	2017-09-07 21:43:39 +00:00
trasz	09059e926a	Ifdef out the unused vm_rr_selectdomain(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-02 17:44:55 +00:00
markj	fb5804c98d	Remove support for idle page zeroing. Idle page zeroing has been disabled by default on all architectures since r170816 and has some bugs that make it seemingly unusable. Specifically, the idle-priority pagezero thread exacerbates contention for the free page lock, and yields the CPU without releasing it in non-preemptive kernels. The pagezero thread also does not behave correctly when superpage reservations are enabled: its target is a function of v_free_count, which includes reserved-but-free pages, but it is only able to zero pages belonging to the physical memory allocator. Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D7714	2016-09-03 20:38:13 +00:00
markj	d5a954945f	Initialize page busy lock state in vm_phys_add_page(). MFC after: 1 week	2016-08-13 19:48:43 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
jhb	6beb82443a	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
alc	2c013a8511	A fix to r292469: Iterate over the physical segments in descending rather than ascending order in vm_phys_alloc_contig() so that, for example, a sequence of contigmalloc(low=0, high=4GB) calls doesn't exhaust the supply of low physical memory resulting in a later contigmalloc(low=0, high=1MB) failure. Reported by: cy Tested by: cy Sponsored by: EMC / Isilon Storage Division	2016-01-16 04:41:40 +00:00
alc	8343c406db	Introduce a new mechanism for relocating virtual pages to a new physical address and use this mechanism when: 1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical memory allocator's free page lists. This replaces the long-standing approach of scanning the inactive and inactive queues, converting clean pages into PG_CACHED pages and laundering dirty pages. In contrast, the new mechanism does not use PG_CACHED pages nor does it trigger a large number of I/O operations. 2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find free pages in the physical memory allocator's free page lists that are covered by the direct map. Tested by: adrian 3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable free pages in the physical memory allocator's free page lists. In the coming months, I expect that this new mechanism will be applied in other places. For example, balloon drivers should use relocation to minimize fragmentation of the guest physical address space. Make vm_phys_alloc_contig() a little smarter (and more efficient in some cases). Specifically, use vm_phys_segs[] earlier to avoid scanning free page lists that can't possibly contain suitable pages. Reviewed by: kib, markj Glanced at: jhb Discussed with: jeff Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4444	2015-12-19 18:42:50 +00:00
adrian	41db4b88e0	Add an initial NUMA affinity/policy configuration for threads and processes. This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11. * Introduce a simple set of VM policy and iterator types. * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled. * add syscalls to control changing thread and process defaults. * add a global NUMA VM domain policy. * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy. * processes inherit policies from their parent processes, threads inherit policies from their parent threads. * add a simple tool (numactl) to query and modify default thread/process policities. * add documentation for the new syscalls, for numa and for numactl. * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods. This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you. Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development! Tested: * MIPS (regression testing; non-NUMA) * i386 (regression testing; non-NUMA GENERIC) * amd64 (regression testing; non-NUMA GENERIC) * westmere, 2 socket (thankyou norse!) * sandy bridge, 2 socket (thankyou dell!) * ivy bridge, 2 socket (thankyou norse!) * westmere-EX, 4 socket / 1TB RAM (thankyou norse!) * haswell, 2 socket (thankyou norse!) * haswell v3, 2 socket (thankyou dell) * haswell v3, 2x18 core (thankyou scott long / netflix!) * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.) * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly. Verified: * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test. Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated. Notes: * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools. * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate. Sponsored by: Norse Corp, Inc.; Dell	2015-07-11 15:21:37 +00:00
adrian	2cad336903	Add initial memory locality cost awareness to the VM, and include a basic ACPI SLIT table parser. For now this just exports the map via sysctl; it'll eventually be useful to userland when there's more useful NUMA support in -HEAD. * Add an optional mem_locality map; * add a mapping function taking from/to domain and returning the relative cost, or -1 if it's not available; * Add a very basic SLIT parser to x86 ACPI. Differential Revision: https://reviews.freebsd.org/D2460 Reviewed by: rpaulo, stas, jhb Sponsored by: Norse Corp, Inc (hardware, coding); Dell (hardware)	2015-05-08 00:56:56 +00:00
ian	eae109babf	Revert r279932; this is going to be fixed in the sbuf code instead. PR: 195668	2015-03-14 13:00:37 +00:00
ian	037188bda9	Nullterminate strings returned via sysctl. PR: 195668	2015-03-12 18:06:30 +00:00
alc	369e66acd7	The physical memory allocator supports the use of distinct free lists for managing pages from different address ranges. Generally speaking, this feature is used to increase the likelihood that physical pages are available that can meet special DMA requirements or can be accessed through a limited-coverage direct mapping (e.g., MIPS). However, prior to this change, the configuration of the free lists was static, i.e., it was determined at compile time. Consequentally, free lists could be created for address ranges that held no actual pages, for example, on 32-bit MIPS- based systems with 512 MB or less of physical memory. This change makes the creation of the free lists dynamic, i.e., it is based on the available physical memory at boot time. On 64-bit x86-based systems with 64 GB or more of physical memory, create free lists for managing pages with physical addresses below 4 GB. This change is to address reported problems with initializing devices that require the allocation of physical pages below 4 GB on some systems with 128 GB or more of physical memory. PR: 185727 Differential Revision: https://reviews.freebsd.org/D1274 Reviewed by: jhb, kib MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-12-31 00:54:38 +00:00
alc	aeebd38e4b	Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default on i386 PAE. Previously, VM_PHYSSEG_SPARSE could not be used on amd64 and i386 because vm_page_startup() would not create vm_page structures for the kernel page table pages allocated during pmap_bootstrap() but those vm_page structures are needed when the kernel attempts to promote the corresponding kernel virtual addresses to superpage mappings. To address this problem, a new public function, vm_phys_add_seg(), is introduced and vm_phys_init() is updated to reflect the creation of vm_phys_seg structures by calls to vm_phys_add_seg(). Discussed with: Svatopluk Kraus MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-11-15 23:40:44 +00:00
royger	b86acea372	vm_phys: improve robustness of fictitious ranges With the current implementation of managed fictitious ranges when also using VM_PHYSSEG_DENSE, a user could try to register a fictitious range that starts inside of vm_page_array, but then overrruns it (because the end of the fictitious range is greater than vm_page_array_size + first_page). This would result in PHYS_TO_VM_PAGE returning unallocated pages from past the end of vm_page_array. The same could happen if a user tried to register a segment that starts outside of vm_page_array but ends inside of it. In order to fix this, allow vm_phys_fictitious_{reg/unreg}_range to use a set of pages from vm_page_array, and allocate the rest. Sponsored by: Citrix Systems R&D Reviewed by: kib, alc vm/vm_phys.c: - Allow registering/unregistering fictitious ranges that overrun vm_page_array.	2014-08-05 10:29:01 +00:00
royger	c9ce58e137	vm_phys: remove limitation on number of fictitious regions The number of vm fictitious regions was limited to 8 by default, but Xen will make heavy usage of those kind of regions in order to map memory from foreign domains, so instead of increasing the default number, change the implementation to use a red-black tree to track vm fictitious ranges. The public interface remains the same. Sponsored by: Citrix Systems R&D Reviewed by: kib, alc Approved by: gibbs vm/vm_phys.c: - Replace the vm fictitious static array with a red-black tree. - Use a rwlock instead of a mutex, since now we also need to take the lock in vm_phys_fictitious_to_vm_page, and it can be shared.	2014-07-09 08:12:58 +00:00
kib	0c45ba8eb0	For the VM_PHYSSEG_DENSE case, checking the requested range to fall into the area backed by vm_page_array wrongly compared end with vm_page_array_size. It should be adjusted by first_page index to be correct. Also, the corner and incorrect case of the requested range extending after the end of the vm_page_array was incorrectly handled by allocating the segment. Fix the comparision for the end of range and return EINVAL if the end extends beyond vm_page_array. Discussed with: royger Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-04-29 18:42:37 +00:00
bdrewery	6fcf6199a4	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
alc	d0ccfff2c5	Since the introduction of the popmap to reservations in r259999, there is no longer any need for the page's PG_CACHED and PG_FREE flags to be set and cleared while the free page queues lock is held. Thus, vm_page_alloc(), vm_page_alloc_contig(), and vm_page_alloc_freelist() can wait until after the free page queues lock is released to clear the page's flags. Moreover, the PG_FREE flag can be retired. Now that the reservation system no longer uses it, its only uses are in a few assertions. Eliminating these assertions is no real loss. Other assertions catch the same types of misbehavior, like doubly freeing a page (see r260032) or dirtying a free page (free pages are invalid and only valid pages can be dirtied). Eliminate an unneeded variable from vm_page_alloc_contig(). Sponsored by: EMC / Isilon Storage Division	2013-12-31 18:25:15 +00:00
alc	83e71fe4f7	Tidy up the output of "sysctl vm.phys_free". Approved by: re (glebius) Sponsored by: EMC / Isilon Storage Division	2013-10-10 16:11:45 +00:00
kib	4675fcfce0	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
attilio	16c7563cf4	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
kib	8de1718b60	Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
markj	44ee260831	Fill in the description fields for M_FICT_PAGES. Reviewed by: kib MFC after: 3 days	2013-08-07 00:20:30 +00:00
neel	44e6cbd7c8	vm_phys_fictitious_reg_range() was losing the 'memattr' because it would be reset by pmap_page_init() right after being initialized in vm_page_initfake(). The statement above is with reference to the amd64 implementation of pmap_page_init(). Fix this by calling 'pmap_page_init()' in 'vm_page_initfake()' before changing the 'memattr'. Reviewed by: kib MFC after: 2 weeks	2013-07-03 23:38:37 +00:00
attilio	291f413ed8	o Add accessor functions to add and remove pages from a specific freelist. o Split the pool of free pages queues really by domain and not rely on definition of VM_RAW_NFREELIST. o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such function evolves in a real policy decision referee, based on specific informations retrieved by per-thread and per-vm_object attributes. o Add the concept of "probed domains" under the form of vm_ndomains. It is responsibility for every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would have much more sense. Probabilly this should be cleaned up in the future. o Apply RR domain selection also to vm_phys_zero_pages_idle(). Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff	2013-05-13 15:40:51 +00:00
attilio	c549d43cb1	Fix-up r250338 by completing the removal of VM_NDOMAIN in favor of MAXMEMDOM. This unbreak builds. Sponsored by: EMC / Isilon storage division Reported by: adrian, jeli	2013-05-08 10:55:39 +00:00
attilio	b24a52ec9e	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc	2013-05-07 22:46:24 +00:00
jhb	383aea5677	Fix two bugs in the current NUMA-aware allocation code: - vm_phys_alloc_freelist_pages() can be called by vm_page_alloc_freelist() to allocate a page from a specific freelist. In the NUMA case it did not properly map the public VM_FREELIST_* constants to the correct backing freelists, nor did it try all NUMA domains for allocations from VM_FREELIST_DEFAULT. - vm_phys_alloc_pages() did not pin the thread and each call to vm_phys_alloc_freelist_pages() fetched the current domain to choose which freelist to use. If a thread migrated domains during the loop in vm_phys_alloc_pages() it could skip one of the freelists. If the other freelists were out of memory then it is possible that vm_phys_alloc_pages() would fail to allocate a page even though pages were available resulting in a panic in vm_page_alloc(). Reviewed by: alc MFC after: 1 week	2013-05-03 18:58:37 +00:00
jhb	ca6ecf3ea5	Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel config file. Requested by: phk (ages ago) MFC after: 1 month	2013-02-14 19:38:04 +00:00
kib	9ff1ec42a4	Add a facility to register a range of physical addresses to be used for allocation of fictitious pages, for which PHYS_TO_VM_PAGE() returns proper fictitious vm_page_t. The range should be de-registered after consumer stopped using it. De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate over registered ranges. A hash container might be developed instead of range registration interface, and fake pages could be put automatically into the hash, were PHYS_TO_VM_PAGE() could look them up later. This should be considered before the MFC of the commit is done. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:42:56 +00:00
jhb	102d1a0777	Bah, just revert my earlier change entirely. (Missed alc's request to do this earlier.) Requested by: alc	2012-03-19 19:06:40 +00:00
jhb	2e0db42a5f	Alter the previous commit to use vm_size_t instead of vm_pindex_t. vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t, but a page index instead of a byte offset.	2012-03-19 18:43:44 +00:00
jhb	1e515523f3	Pedantic nit: use vm_pindex_t instead of long for a count of pages.	2012-03-14 20:57:48 +00:00
alc	3692d01659	Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb	2011-11-16 16:46:09 +00:00
alc	57e8705396	Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at eliminating duplicated code in the various pmap implementations. Micro-optimize vm_phys_free_pages(). Introduce vm_phys_free_contig(). It is fast routine for freeing an arbitrary number of physically contiguous pages. In particular, it doesn't require the number of pages to be a power of two. Use "u_long" instead of "unsigned long". Bruce Evans (bde@) has convinced me that the "boundary" parameters to kmem_alloc_contig(), vm_phys_alloc_contig(), and vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not "u_long". Make this change.	2011-10-30 05:06:14 +00:00
attilio	846436b8c8	VN_NRESERVLEVEL is used in this file but opt_vm is not included thus the stub switch won't be correctly handled. Include opt_vm.h. Submitted by: jeff MFC after: 3 days	2011-10-22 22:00:35 +00:00
mdf	7fc649fc41	Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue. Inspired by: rwatson	2011-01-27 00:34:12 +00:00
alc	d326a38b4b	Explicitly initialize the page's queue field to PQ_NONE instead of relying on PQ_NONE being zero. Redefine PQ_NONE and PQ_COUNT so that a page queue isn't allocated for PQ_NONE. Reviewed by: kib@	2011-01-17 19:17:26 +00:00
alc	80380396eb	Correct some format strings used by sysctls. MFC after: 1 week	2010-10-30 18:00:53 +00:00
mdf	5695ef4698	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)	2010-09-16 16:13:12 +00:00
mdf	3ed6eac561	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn	2010-09-13 18:48:23 +00:00
mdf	bc54684253	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk	2010-09-09 18:33:46 +00:00
jhb	f27c8b35e2	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
jchandra	10dfd55de4	Redo the page table page allocation on MIPS, as suggested by alc@. The UMA zone based allocation is replaced by a scheme that creates a new free page list for the KSEG0 region, and a new function in sys/vm that allocates pages from a specific free page list. This also fixes a race condition introduced by the UMA based page table page allocation code. Dropping the page queue and pmap locks before the call to uma_zfree, and re-acquiring them afterwards will introduce a race condtion(noted by alc@). The changes are : - Revert the earlier changes in MIPS pmap.c that added UMA zone for page table pages. - Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that is not directly mapped (in 32bit kernel). Normal page allocations will first try the HIGHMEM freelist and then the default(direct mapped) freelist. - Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int order, int req)' to vm/vm_page.c to allocate a page from a specified freelist. The MIPS page table pages will be allocated using this function from the freelist containing direct mapped pages. - Move the page initialization code from vm_phys_alloc_contig() to a new function vm_page_alloc_init(), and use this function to initialize pages in vm_page_alloc_freelist() too. - Split the function vm_phys_alloc_pages(int pool, int order) to create vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages(). Reviewed by: alc	2010-07-21 09:27:00 +00:00
alc	ea60573817	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
alc	91cafd48b1	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
alc	d092e8e13d	Implement a mechanism within vm_phys_alloc_contig() to defer all necessary calls to vdrop() until after the free page queues lock is released. This eliminates repeatedly releasing and reacquiring the free page queues lock each time the last cached page is reclaimed from a vnode-backed object.	2009-06-21 20:29:14 +00:00
alc	7b05ffed76	Strive for greater consistency among the places that implement real, fictious, and contiguous page allocation. Eliminate unnecessary reinitialization of a page's fields.	2009-06-21 00:21:33 +00:00
thompsa	f3a1b951fc	Track the kernel mapping of a physical page by a new entry in vm_page structure. When the page is shared, the kernel mapping becomes a special type of managed page to force the cache off the page mappings. This is needed to avoid stale entries on all ARM VIVT caches, and VIPT caches with cache color issue. Submitted by: Mark Tinguely Reviewed by: alc Tested by: Grzegorz Bernacki, thompsa	2009-06-18 20:42:37 +00:00

1 2

60 Commits