Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*-
|
|
|
|
* Copyright (c) 2002-2006 Rice University
|
|
|
|
* Copyright (c) 2007 Alan L. Cox <alc@cs.rice.edu>
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* This software was developed for the FreeBSD Project by Alan L. Cox,
|
|
|
|
* Olivier Crameri, Peter Druschel, Sitaram Iyer, and Juan Navarro.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
|
|
|
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
|
|
|
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
|
|
|
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
|
|
|
* HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
|
|
|
|
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
|
|
|
|
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
|
|
|
|
* OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
|
|
|
|
* AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
|
|
|
|
* WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
|
|
|
* POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
*/
|
|
|
|
|
Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
2011-11-16 16:46:09 +00:00
|
|
|
/*
|
|
|
|
* Physical memory system implementation
|
|
|
|
*
|
|
|
|
* Any external functions defined by this module are only to be used by the
|
|
|
|
* virtual memory system.
|
|
|
|
*/
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
|
|
|
#include "opt_ddb.h"
|
2013-02-14 19:38:04 +00:00
|
|
|
#include "opt_vm.h"
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
|
|
|
#include <sys/param.h>
|
|
|
|
#include <sys/systm.h>
|
|
|
|
#include <sys/lock.h>
|
|
|
|
#include <sys/kernel.h>
|
|
|
|
#include <sys/malloc.h>
|
|
|
|
#include <sys/mutex.h>
|
2013-05-13 15:40:51 +00:00
|
|
|
#if MAXMEMDOM > 1
|
|
|
|
#include <sys/proc.h>
|
|
|
|
#endif
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#include <sys/queue.h>
|
2014-07-09 08:12:58 +00:00
|
|
|
#include <sys/rwlock.h>
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#include <sys/sbuf.h>
|
|
|
|
#include <sys/sysctl.h>
|
2014-07-09 08:12:58 +00:00
|
|
|
#include <sys/tree.h>
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#include <sys/vmmeter.h>
|
|
|
|
|
|
|
|
#include <ddb/ddb.h>
|
|
|
|
|
|
|
|
#include <vm/vm.h>
|
|
|
|
#include <vm/vm_param.h>
|
|
|
|
#include <vm/vm_kern.h>
|
|
|
|
#include <vm/vm_object.h>
|
|
|
|
#include <vm/vm_page.h>
|
|
|
|
#include <vm/vm_phys.h>
|
|
|
|
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
_Static_assert(sizeof(long) * NBBY >= VM_PHYSSEG_MAX,
|
|
|
|
"Too many physsegs.");
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
2010-07-27 20:33:50 +00:00
|
|
|
struct mem_affinity *mem_affinity;
|
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
int vm_ndomains = 1;
|
|
|
|
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
struct vm_phys_seg vm_phys_segs[VM_PHYSSEG_MAX];
|
|
|
|
int vm_phys_nsegs;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
2014-07-09 08:12:58 +00:00
|
|
|
struct vm_phys_fictitious_seg;
|
|
|
|
static int vm_phys_fictitious_cmp(struct vm_phys_fictitious_seg *,
|
|
|
|
struct vm_phys_fictitious_seg *);
|
|
|
|
|
|
|
|
RB_HEAD(fict_tree, vm_phys_fictitious_seg) vm_phys_fictitious_tree =
|
|
|
|
RB_INITIALIZER(_vm_phys_fictitious_tree);
|
|
|
|
|
|
|
|
struct vm_phys_fictitious_seg {
|
|
|
|
RB_ENTRY(vm_phys_fictitious_seg) node;
|
|
|
|
/* Memory region data */
|
2012-05-12 20:42:56 +00:00
|
|
|
vm_paddr_t start;
|
|
|
|
vm_paddr_t end;
|
|
|
|
vm_page_t first_page;
|
2014-07-09 08:12:58 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
RB_GENERATE_STATIC(fict_tree, vm_phys_fictitious_seg, node,
|
|
|
|
vm_phys_fictitious_cmp);
|
|
|
|
|
|
|
|
static struct rwlock vm_phys_fictitious_reg_lock;
|
2013-08-07 00:20:30 +00:00
|
|
|
MALLOC_DEFINE(M_FICT_PAGES, "vm_fictitious", "Fictitious VM pages");
|
2012-05-12 20:42:56 +00:00
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
static struct vm_freelist
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_phys_free_queues[MAXMEMDOM][VM_NFREELIST][VM_NFREEPOOL][VM_NFREEORDER];
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
|
|
|
static int vm_nfreelists = VM_FREELIST_DEFAULT + 1;
|
|
|
|
|
|
|
|
static int cnt_prezero;
|
|
|
|
SYSCTL_INT(_vm_stats_misc, OID_AUTO, cnt_prezero, CTLFLAG_RD,
|
|
|
|
&cnt_prezero, 0, "The number of physical pages prezeroed at idle time");
|
|
|
|
|
|
|
|
static int sysctl_vm_phys_free(SYSCTL_HANDLER_ARGS);
|
|
|
|
SYSCTL_OID(_vm, OID_AUTO, phys_free, CTLTYPE_STRING | CTLFLAG_RD,
|
|
|
|
NULL, 0, sysctl_vm_phys_free, "A", "Phys Free Info");
|
|
|
|
|
|
|
|
static int sysctl_vm_phys_segs(SYSCTL_HANDLER_ARGS);
|
|
|
|
SYSCTL_OID(_vm, OID_AUTO, phys_segs, CTLTYPE_STRING | CTLFLAG_RD,
|
|
|
|
NULL, 0, sysctl_vm_phys_segs, "A", "Phys Seg Info");
|
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
SYSCTL_INT(_vm, OID_AUTO, ndomains, CTLFLAG_RD,
|
|
|
|
&vm_ndomains, 0, "Number of physical memory domains available.");
|
2010-07-27 20:33:50 +00:00
|
|
|
|
2013-05-03 18:58:37 +00:00
|
|
|
static vm_page_t vm_phys_alloc_domain_pages(int domain, int flind, int pool,
|
|
|
|
int order);
|
2010-07-27 20:33:50 +00:00
|
|
|
static void _vm_phys_create_seg(vm_paddr_t start, vm_paddr_t end, int flind,
|
|
|
|
int domain);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
static void vm_phys_create_seg(vm_paddr_t start, vm_paddr_t end, int flind);
|
|
|
|
static int vm_phys_paddr_to_segind(vm_paddr_t pa);
|
|
|
|
static void vm_phys_split_pages(vm_page_t m, int oind, struct vm_freelist *fl,
|
|
|
|
int order);
|
|
|
|
|
2014-07-09 08:12:58 +00:00
|
|
|
/*
|
|
|
|
* Red-black tree helpers for vm fictitious range management.
|
|
|
|
*/
|
|
|
|
static inline int
|
|
|
|
vm_phys_fictitious_in_range(struct vm_phys_fictitious_seg *p,
|
|
|
|
struct vm_phys_fictitious_seg *range)
|
|
|
|
{
|
|
|
|
|
|
|
|
KASSERT(range->start != 0 && range->end != 0,
|
|
|
|
("Invalid range passed on search for vm_fictitious page"));
|
|
|
|
if (p->start >= range->end)
|
|
|
|
return (1);
|
|
|
|
if (p->start < range->start)
|
|
|
|
return (-1);
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
vm_phys_fictitious_cmp(struct vm_phys_fictitious_seg *p1,
|
|
|
|
struct vm_phys_fictitious_seg *p2)
|
|
|
|
{
|
|
|
|
|
|
|
|
/* Check if this is a search for a page */
|
|
|
|
if (p1->end == 0)
|
|
|
|
return (vm_phys_fictitious_in_range(p1, p2));
|
|
|
|
|
|
|
|
KASSERT(p2->end != 0,
|
|
|
|
("Invalid range passed as second parameter to vm fictitious comparison"));
|
|
|
|
|
|
|
|
/* Searching to add a new range */
|
|
|
|
if (p1->end <= p2->start)
|
|
|
|
return (-1);
|
|
|
|
if (p1->start >= p2->end)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
panic("Trying to add overlapping vm fictitious ranges:\n"
|
|
|
|
"[%#jx:%#jx] and [%#jx:%#jx]", (uintmax_t)p1->start,
|
|
|
|
(uintmax_t)p1->end, (uintmax_t)p2->start, (uintmax_t)p2->end);
|
|
|
|
}
|
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
static __inline int
|
|
|
|
vm_rr_selectdomain(void)
|
|
|
|
{
|
|
|
|
#if MAXMEMDOM > 1
|
|
|
|
struct thread *td;
|
|
|
|
|
|
|
|
td = curthread;
|
|
|
|
|
|
|
|
td->td_dom_rr_idx++;
|
|
|
|
td->td_dom_rr_idx %= vm_ndomains;
|
|
|
|
return (td->td_dom_rr_idx);
|
|
|
|
#else
|
|
|
|
return (0);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
boolean_t
|
|
|
|
vm_phys_domain_intersects(long mask, vm_paddr_t low, vm_paddr_t high)
|
|
|
|
{
|
|
|
|
struct vm_phys_seg *s;
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
while ((idx = ffsl(mask)) != 0) {
|
|
|
|
idx--; /* ffsl counts from 1 */
|
|
|
|
mask &= ~(1UL << idx);
|
|
|
|
s = &vm_phys_segs[idx];
|
|
|
|
if (low < s->end && high > s->start)
|
|
|
|
return (TRUE);
|
|
|
|
}
|
|
|
|
return (FALSE);
|
|
|
|
}
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
|
|
|
* Outputs the state of the physical memory allocator, specifically,
|
|
|
|
* the amount of physical memory in each free list.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
sysctl_vm_phys_free(SYSCTL_HANDLER_ARGS)
|
|
|
|
{
|
|
|
|
struct sbuf sbuf;
|
|
|
|
struct vm_freelist *fl;
|
2013-05-13 15:40:51 +00:00
|
|
|
int dom, error, flind, oind, pind;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
2011-01-27 00:34:12 +00:00
|
|
|
error = sysctl_wire_old_buffer(req, 0);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2013-05-13 15:40:51 +00:00
|
|
|
sbuf_new_for_sysctl(&sbuf, NULL, 128 * vm_ndomains, req);
|
|
|
|
for (dom = 0; dom < vm_ndomains; dom++) {
|
2013-10-10 16:11:45 +00:00
|
|
|
sbuf_printf(&sbuf,"\nDOMAIN %d:\n", dom);
|
2013-05-13 15:40:51 +00:00
|
|
|
for (flind = 0; flind < vm_nfreelists; flind++) {
|
2013-10-10 16:11:45 +00:00
|
|
|
sbuf_printf(&sbuf, "\nFREE LIST %d:\n"
|
2013-05-13 15:40:51 +00:00
|
|
|
"\n ORDER (SIZE) | NUMBER"
|
|
|
|
"\n ", flind);
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++)
|
|
|
|
sbuf_printf(&sbuf, " | POOL %d", pind);
|
|
|
|
sbuf_printf(&sbuf, "\n-- ");
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++)
|
|
|
|
sbuf_printf(&sbuf, "-- -- ");
|
|
|
|
sbuf_printf(&sbuf, "--\n");
|
|
|
|
for (oind = VM_NFREEORDER - 1; oind >= 0; oind--) {
|
|
|
|
sbuf_printf(&sbuf, " %2d (%6dK)", oind,
|
|
|
|
1 << (PAGE_SHIFT - 10 + oind));
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++) {
|
|
|
|
fl = vm_phys_free_queues[dom][flind][pind];
|
2013-10-10 16:11:45 +00:00
|
|
|
sbuf_printf(&sbuf, " | %6d",
|
2013-05-13 15:40:51 +00:00
|
|
|
fl[oind].lcnt);
|
|
|
|
}
|
|
|
|
sbuf_printf(&sbuf, "\n");
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2010-09-16 16:13:12 +00:00
|
|
|
error = sbuf_finish(&sbuf);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
sbuf_delete(&sbuf);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Outputs the set of physical memory segments.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
sysctl_vm_phys_segs(SYSCTL_HANDLER_ARGS)
|
|
|
|
{
|
|
|
|
struct sbuf sbuf;
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
int error, segind;
|
|
|
|
|
2011-01-27 00:34:12 +00:00
|
|
|
error = sysctl_wire_old_buffer(req, 0);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2010-09-16 16:13:12 +00:00
|
|
|
sbuf_new_for_sysctl(&sbuf, NULL, 128, req);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
for (segind = 0; segind < vm_phys_nsegs; segind++) {
|
|
|
|
sbuf_printf(&sbuf, "\nSEGMENT %d:\n\n", segind);
|
|
|
|
seg = &vm_phys_segs[segind];
|
|
|
|
sbuf_printf(&sbuf, "start: %#jx\n",
|
|
|
|
(uintmax_t)seg->start);
|
|
|
|
sbuf_printf(&sbuf, "end: %#jx\n",
|
|
|
|
(uintmax_t)seg->end);
|
2010-07-27 20:33:50 +00:00
|
|
|
sbuf_printf(&sbuf, "domain: %d\n", seg->domain);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
sbuf_printf(&sbuf, "free list: %p\n", seg->free_queues);
|
|
|
|
}
|
2010-09-16 16:13:12 +00:00
|
|
|
error = sbuf_finish(&sbuf);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
sbuf_delete(&sbuf);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
static void
|
|
|
|
vm_freelist_add(struct vm_freelist *fl, vm_page_t m, int order, int tail)
|
2010-07-27 20:33:50 +00:00
|
|
|
{
|
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
m->order = order;
|
|
|
|
if (tail)
|
2013-08-10 17:36:42 +00:00
|
|
|
TAILQ_INSERT_TAIL(&fl[order].pl, m, plinks.q);
|
2013-05-13 15:40:51 +00:00
|
|
|
else
|
2013-08-10 17:36:42 +00:00
|
|
|
TAILQ_INSERT_HEAD(&fl[order].pl, m, plinks.q);
|
2013-05-13 15:40:51 +00:00
|
|
|
fl[order].lcnt++;
|
2010-07-27 20:33:50 +00:00
|
|
|
}
|
2013-05-13 15:40:51 +00:00
|
|
|
|
|
|
|
static void
|
|
|
|
vm_freelist_rem(struct vm_freelist *fl, vm_page_t m, int order)
|
|
|
|
{
|
|
|
|
|
2013-08-10 17:36:42 +00:00
|
|
|
TAILQ_REMOVE(&fl[order].pl, m, plinks.q);
|
2013-05-13 15:40:51 +00:00
|
|
|
fl[order].lcnt--;
|
|
|
|
m->order = VM_NFREEORDER;
|
|
|
|
}
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
|
|
|
* Create a physical memory segment.
|
|
|
|
*/
|
|
|
|
static void
|
2010-07-27 20:33:50 +00:00
|
|
|
_vm_phys_create_seg(vm_paddr_t start, vm_paddr_t end, int flind, int domain)
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
{
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
|
|
|
|
KASSERT(vm_phys_nsegs < VM_PHYSSEG_MAX,
|
|
|
|
("vm_phys_create_seg: increase VM_PHYSSEG_MAX"));
|
2013-05-13 15:40:51 +00:00
|
|
|
KASSERT(domain < vm_ndomains,
|
|
|
|
("vm_phys_create_seg: invalid domain provided"));
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
seg = &vm_phys_segs[vm_phys_nsegs++];
|
2014-11-15 23:40:44 +00:00
|
|
|
while (seg > vm_phys_segs && (seg - 1)->start >= end) {
|
|
|
|
*seg = *(seg - 1);
|
|
|
|
seg--;
|
|
|
|
}
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
seg->start = start;
|
|
|
|
seg->end = end;
|
2010-07-27 20:33:50 +00:00
|
|
|
seg->domain = domain;
|
2013-05-13 15:40:51 +00:00
|
|
|
seg->free_queues = &vm_phys_free_queues[domain][flind];
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
|
2010-07-27 20:33:50 +00:00
|
|
|
static void
|
|
|
|
vm_phys_create_seg(vm_paddr_t start, vm_paddr_t end, int flind)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (mem_affinity == NULL) {
|
|
|
|
_vm_phys_create_seg(start, end, flind, 0);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0;; i++) {
|
|
|
|
if (mem_affinity[i].end == 0)
|
|
|
|
panic("Reached end of affinity info");
|
|
|
|
if (mem_affinity[i].end <= start)
|
|
|
|
continue;
|
|
|
|
if (mem_affinity[i].start > start)
|
|
|
|
panic("No affinity info for start %jx",
|
|
|
|
(uintmax_t)start);
|
|
|
|
if (mem_affinity[i].end >= end) {
|
|
|
|
_vm_phys_create_seg(start, end, flind,
|
|
|
|
mem_affinity[i].domain);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
_vm_phys_create_seg(start, mem_affinity[i].end, flind,
|
|
|
|
mem_affinity[i].domain);
|
|
|
|
start = mem_affinity[i].end;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
2014-11-15 23:40:44 +00:00
|
|
|
* Add a physical memory segment.
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
*/
|
|
|
|
void
|
2014-11-15 23:40:44 +00:00
|
|
|
vm_phys_add_seg(vm_paddr_t start, vm_paddr_t end)
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
{
|
|
|
|
|
2014-11-15 23:40:44 +00:00
|
|
|
KASSERT((start & PAGE_MASK) == 0,
|
|
|
|
("vm_phys_define_seg: start is not page aligned"));
|
|
|
|
KASSERT((end & PAGE_MASK) == 0,
|
|
|
|
("vm_phys_define_seg: end is not page aligned"));
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#ifdef VM_FREELIST_ISADMA
|
2014-11-15 23:40:44 +00:00
|
|
|
if (start < 16777216) {
|
|
|
|
if (end > 16777216) {
|
|
|
|
vm_phys_create_seg(start, 16777216,
|
|
|
|
VM_FREELIST_ISADMA);
|
|
|
|
vm_phys_create_seg(16777216, end, VM_FREELIST_DEFAULT);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
} else
|
2014-11-15 23:40:44 +00:00
|
|
|
vm_phys_create_seg(start, end, VM_FREELIST_ISADMA);
|
|
|
|
if (VM_FREELIST_ISADMA >= vm_nfreelists)
|
|
|
|
vm_nfreelists = VM_FREELIST_ISADMA + 1;
|
|
|
|
} else
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#endif
|
|
|
|
#ifdef VM_FREELIST_HIGHMEM
|
2014-11-15 23:40:44 +00:00
|
|
|
if (end > VM_HIGHMEM_ADDRESS) {
|
|
|
|
if (start < VM_HIGHMEM_ADDRESS) {
|
|
|
|
vm_phys_create_seg(start, VM_HIGHMEM_ADDRESS,
|
|
|
|
VM_FREELIST_DEFAULT);
|
|
|
|
vm_phys_create_seg(VM_HIGHMEM_ADDRESS, end,
|
|
|
|
VM_FREELIST_HIGHMEM);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
} else
|
2014-11-15 23:40:44 +00:00
|
|
|
vm_phys_create_seg(start, end, VM_FREELIST_HIGHMEM);
|
|
|
|
if (VM_FREELIST_HIGHMEM >= vm_nfreelists)
|
|
|
|
vm_nfreelists = VM_FREELIST_HIGHMEM + 1;
|
|
|
|
} else
|
|
|
|
#endif
|
|
|
|
vm_phys_create_seg(start, end, VM_FREELIST_DEFAULT);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize the physical memory allocator.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
vm_phys_init(void)
|
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
#ifdef VM_PHYSSEG_SPARSE
|
|
|
|
long pages;
|
|
|
|
#endif
|
|
|
|
int dom, flind, oind, pind, segind;
|
|
|
|
|
|
|
|
#ifdef VM_PHYSSEG_SPARSE
|
|
|
|
pages = 0;
|
|
|
|
#endif
|
|
|
|
for (segind = 0; segind < vm_phys_nsegs; segind++) {
|
|
|
|
seg = &vm_phys_segs[segind];
|
|
|
|
#ifdef VM_PHYSSEG_SPARSE
|
|
|
|
seg->first_page = &vm_page_array[pages];
|
|
|
|
pages += atop(seg->end - seg->start);
|
|
|
|
#else
|
|
|
|
seg->first_page = PHYS_TO_VM_PAGE(seg->start);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
#endif
|
|
|
|
}
|
2013-05-13 15:40:51 +00:00
|
|
|
for (dom = 0; dom < vm_ndomains; dom++) {
|
|
|
|
for (flind = 0; flind < vm_nfreelists; flind++) {
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++) {
|
|
|
|
fl = vm_phys_free_queues[dom][flind][pind];
|
|
|
|
for (oind = 0; oind < VM_NFREEORDER; oind++)
|
|
|
|
TAILQ_INIT(&fl[oind].pl);
|
|
|
|
}
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
}
|
2014-07-09 08:12:58 +00:00
|
|
|
rw_init(&vm_phys_fictitious_reg_lock, "vmfctr");
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Split a contiguous, power of two-sized set of physical pages.
|
|
|
|
*/
|
|
|
|
static __inline void
|
|
|
|
vm_phys_split_pages(vm_page_t m, int oind, struct vm_freelist *fl, int order)
|
|
|
|
{
|
|
|
|
vm_page_t m_buddy;
|
|
|
|
|
|
|
|
while (oind > order) {
|
|
|
|
oind--;
|
|
|
|
m_buddy = &m[1 << oind];
|
|
|
|
KASSERT(m_buddy->order == VM_NFREEORDER,
|
|
|
|
("vm_phys_split_pages: page %p has unexpected order %d",
|
|
|
|
m_buddy, m_buddy->order));
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_add(fl, m_buddy, oind, 0);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize a physical page and add it to the free lists.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
vm_phys_add_page(vm_paddr_t pa)
|
|
|
|
{
|
|
|
|
vm_page_t m;
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
struct vm_domain *vmd;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
2014-03-22 10:26:09 +00:00
|
|
|
vm_cnt.v_page_count++;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
m = vm_phys_paddr_to_vm_page(pa);
|
|
|
|
m->phys_addr = pa;
|
2011-01-17 19:17:26 +00:00
|
|
|
m->queue = PQ_NONE;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
m->segind = vm_phys_paddr_to_segind(pa);
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
vmd = vm_phys_domain(m);
|
|
|
|
vmd->vmd_page_count++;
|
|
|
|
vmd->vmd_segs |= 1UL << m->segind;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
KASSERT(m->order == VM_NFREEORDER,
|
|
|
|
("vm_phys_add_page: page %p has unexpected order %d",
|
|
|
|
m, m->order));
|
|
|
|
m->pool = VM_FREEPOOL_DEFAULT;
|
|
|
|
pmap_page_init(m);
|
|
|
|
mtx_lock(&vm_page_queue_free_mtx);
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
vm_phys_freecnt_adj(m, 1);
|
2007-07-14 21:21:17 +00:00
|
|
|
vm_phys_free_pages(m, 0);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
mtx_unlock(&vm_page_queue_free_mtx);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate a contiguous, power of two-sized set of physical pages
|
|
|
|
* from the free lists.
|
2007-07-14 21:21:17 +00:00
|
|
|
*
|
|
|
|
* The free page queues must be locked.
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
*/
|
|
|
|
vm_page_t
|
2007-07-14 21:21:17 +00:00
|
|
|
vm_phys_alloc_pages(int pool, int order)
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
{
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
vm_page_t m;
|
2013-05-13 15:40:51 +00:00
|
|
|
int dom, domain, flind;
|
2013-05-03 18:58:37 +00:00
|
|
|
|
|
|
|
KASSERT(pool < VM_NFREEPOOL,
|
|
|
|
("vm_phys_alloc_pages: pool %d is out of range", pool));
|
|
|
|
KASSERT(order < VM_NFREEORDER,
|
|
|
|
("vm_phys_alloc_pages: order %d is out of range", order));
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
for (dom = 0; dom < vm_ndomains; dom++) {
|
|
|
|
domain = vm_rr_selectdomain();
|
|
|
|
for (flind = 0; flind < vm_nfreelists; flind++) {
|
|
|
|
m = vm_phys_alloc_domain_pages(domain, flind, pool,
|
|
|
|
order);
|
|
|
|
if (m != NULL)
|
|
|
|
return (m);
|
|
|
|
}
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
}
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find and dequeue a free page on the given free list, with the
|
|
|
|
* specified pool and order
|
|
|
|
*/
|
|
|
|
vm_page_t
|
|
|
|
vm_phys_alloc_freelist_pages(int flind, int pool, int order)
|
2013-05-03 18:58:37 +00:00
|
|
|
{
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
vm_page_t m;
|
2013-05-13 15:40:51 +00:00
|
|
|
int dom, domain;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
KASSERT(flind < VM_NFREELIST,
|
|
|
|
("vm_phys_alloc_freelist_pages: freelist %d is out of range", flind));
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
KASSERT(pool < VM_NFREEPOOL,
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
("vm_phys_alloc_freelist_pages: pool %d is out of range", pool));
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
KASSERT(order < VM_NFREEORDER,
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
("vm_phys_alloc_freelist_pages: order %d is out of range", order));
|
2010-07-27 20:33:50 +00:00
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
for (dom = 0; dom < vm_ndomains; dom++) {
|
|
|
|
domain = vm_rr_selectdomain();
|
|
|
|
m = vm_phys_alloc_domain_pages(domain, flind, pool, order);
|
|
|
|
if (m != NULL)
|
|
|
|
return (m);
|
|
|
|
}
|
|
|
|
return (NULL);
|
2013-05-03 18:58:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static vm_page_t
|
|
|
|
vm_phys_alloc_domain_pages(int domain, int flind, int pool, int order)
|
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
|
|
|
struct vm_freelist *alt;
|
|
|
|
int oind, pind;
|
|
|
|
vm_page_t m;
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
2013-05-13 15:40:51 +00:00
|
|
|
fl = &vm_phys_free_queues[domain][flind][pool][0];
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
for (oind = order; oind < VM_NFREEORDER; oind++) {
|
|
|
|
m = TAILQ_FIRST(&fl[oind].pl);
|
|
|
|
if (m != NULL) {
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_rem(fl, m, oind);
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
vm_phys_split_pages(m, oind, fl, order);
|
|
|
|
return (m);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The given pool was empty. Find the largest
|
|
|
|
* contiguous, power-of-two-sized set of pages in any
|
|
|
|
* pool. Transfer these pages to the given pool, and
|
|
|
|
* use them to satisfy the allocation.
|
|
|
|
*/
|
|
|
|
for (oind = VM_NFREEORDER - 1; oind >= order; oind--) {
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++) {
|
2013-05-13 15:40:51 +00:00
|
|
|
alt = &vm_phys_free_queues[domain][flind][pind][0];
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
m = TAILQ_FIRST(&alt[oind].pl);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
if (m != NULL) {
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_rem(alt, m, oind);
|
Redo the page table page allocation on MIPS, as suggested by
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
2010-07-21 09:27:00 +00:00
|
|
|
vm_phys_set_pool(pool, m, oind);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
vm_phys_split_pages(m, oind, fl, order);
|
|
|
|
return (m);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the vm_page corresponding to the given physical address.
|
|
|
|
*/
|
|
|
|
vm_page_t
|
|
|
|
vm_phys_paddr_to_vm_page(vm_paddr_t pa)
|
|
|
|
{
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
int segind;
|
|
|
|
|
|
|
|
for (segind = 0; segind < vm_phys_nsegs; segind++) {
|
|
|
|
seg = &vm_phys_segs[segind];
|
|
|
|
if (pa >= seg->start && pa < seg->end)
|
|
|
|
return (&seg->first_page[atop(pa - seg->start)]);
|
|
|
|
}
|
2009-06-18 20:42:37 +00:00
|
|
|
return (NULL);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
|
2012-05-12 20:42:56 +00:00
|
|
|
vm_page_t
|
|
|
|
vm_phys_fictitious_to_vm_page(vm_paddr_t pa)
|
|
|
|
{
|
2014-07-09 08:12:58 +00:00
|
|
|
struct vm_phys_fictitious_seg tmp, *seg;
|
2012-05-12 20:42:56 +00:00
|
|
|
vm_page_t m;
|
|
|
|
|
|
|
|
m = NULL;
|
2014-07-09 08:12:58 +00:00
|
|
|
tmp.start = pa;
|
|
|
|
tmp.end = 0;
|
|
|
|
|
|
|
|
rw_rlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
seg = RB_FIND(fict_tree, &vm_phys_fictitious_tree, &tmp);
|
|
|
|
rw_runlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
if (seg == NULL)
|
|
|
|
return (NULL);
|
|
|
|
|
|
|
|
m = &seg->first_page[atop(pa - seg->start)];
|
|
|
|
KASSERT((m->flags & PG_FICTITIOUS) != 0, ("%p not fictitious", m));
|
|
|
|
|
2012-05-12 20:42:56 +00:00
|
|
|
return (m);
|
|
|
|
}
|
|
|
|
|
2014-08-05 10:29:01 +00:00
|
|
|
static inline void
|
|
|
|
vm_phys_fictitious_init_range(vm_page_t range, vm_paddr_t start,
|
|
|
|
long page_count, vm_memattr_t memattr)
|
|
|
|
{
|
|
|
|
long i;
|
|
|
|
|
|
|
|
for (i = 0; i < page_count; i++) {
|
|
|
|
vm_page_initfake(&range[i], start + PAGE_SIZE * i, memattr);
|
|
|
|
range[i].oflags &= ~VPO_UNMANAGED;
|
|
|
|
range[i].busy_lock = VPB_UNBUSIED;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-05-12 20:42:56 +00:00
|
|
|
int
|
|
|
|
vm_phys_fictitious_reg_range(vm_paddr_t start, vm_paddr_t end,
|
|
|
|
vm_memattr_t memattr)
|
|
|
|
{
|
|
|
|
struct vm_phys_fictitious_seg *seg;
|
|
|
|
vm_page_t fp;
|
2014-08-05 10:29:01 +00:00
|
|
|
long page_count;
|
2012-05-12 20:42:56 +00:00
|
|
|
#ifdef VM_PHYSSEG_DENSE
|
2014-08-05 10:29:01 +00:00
|
|
|
long pi, pe;
|
|
|
|
long dpage_count;
|
2012-05-12 20:42:56 +00:00
|
|
|
#endif
|
|
|
|
|
2014-08-05 10:29:01 +00:00
|
|
|
KASSERT(start < end,
|
|
|
|
("Start of segment isn't less than end (start: %jx end: %jx)",
|
|
|
|
(uintmax_t)start, (uintmax_t)end));
|
|
|
|
|
2012-05-12 20:42:56 +00:00
|
|
|
page_count = (end - start) / PAGE_SIZE;
|
|
|
|
|
|
|
|
#ifdef VM_PHYSSEG_DENSE
|
|
|
|
pi = atop(start);
|
2014-08-05 10:29:01 +00:00
|
|
|
pe = atop(end);
|
|
|
|
if (pi >= first_page && (pi - first_page) < vm_page_array_size) {
|
2012-05-12 20:42:56 +00:00
|
|
|
fp = &vm_page_array[pi - first_page];
|
2014-08-05 10:29:01 +00:00
|
|
|
if ((pe - first_page) > vm_page_array_size) {
|
|
|
|
/*
|
|
|
|
* We have a segment that starts inside
|
|
|
|
* of vm_page_array, but ends outside of it.
|
|
|
|
*
|
|
|
|
* Use vm_page_array pages for those that are
|
|
|
|
* inside of the vm_page_array range, and
|
|
|
|
* allocate the remaining ones.
|
|
|
|
*/
|
|
|
|
dpage_count = vm_page_array_size - (pi - first_page);
|
|
|
|
vm_phys_fictitious_init_range(fp, start, dpage_count,
|
|
|
|
memattr);
|
|
|
|
page_count -= dpage_count;
|
|
|
|
start += ptoa(dpage_count);
|
|
|
|
goto alloc;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* We can allocate the full range from vm_page_array,
|
|
|
|
* so there's no need to register the range in the tree.
|
|
|
|
*/
|
|
|
|
vm_phys_fictitious_init_range(fp, start, page_count, memattr);
|
|
|
|
return (0);
|
|
|
|
} else if (pe > first_page && (pe - first_page) < vm_page_array_size) {
|
|
|
|
/*
|
|
|
|
* We have a segment that ends inside of vm_page_array,
|
|
|
|
* but starts outside of it.
|
|
|
|
*/
|
|
|
|
fp = &vm_page_array[0];
|
|
|
|
dpage_count = pe - first_page;
|
|
|
|
vm_phys_fictitious_init_range(fp, ptoa(first_page), dpage_count,
|
|
|
|
memattr);
|
|
|
|
end -= ptoa(dpage_count);
|
|
|
|
page_count -= dpage_count;
|
|
|
|
goto alloc;
|
|
|
|
} else if (pi < first_page && pe > (first_page + vm_page_array_size)) {
|
|
|
|
/*
|
|
|
|
* Trying to register a fictitious range that expands before
|
|
|
|
* and after vm_page_array.
|
|
|
|
*/
|
|
|
|
return (EINVAL);
|
|
|
|
} else {
|
|
|
|
alloc:
|
2012-05-12 20:42:56 +00:00
|
|
|
#endif
|
|
|
|
fp = malloc(page_count * sizeof(struct vm_page), M_FICT_PAGES,
|
|
|
|
M_WAITOK | M_ZERO);
|
2014-08-05 10:29:01 +00:00
|
|
|
#ifdef VM_PHYSSEG_DENSE
|
2012-05-12 20:42:56 +00:00
|
|
|
}
|
2014-08-05 10:29:01 +00:00
|
|
|
#endif
|
|
|
|
vm_phys_fictitious_init_range(fp, start, page_count, memattr);
|
2014-07-09 08:12:58 +00:00
|
|
|
|
|
|
|
seg = malloc(sizeof(*seg), M_FICT_PAGES, M_WAITOK | M_ZERO);
|
|
|
|
seg->start = start;
|
|
|
|
seg->end = end;
|
|
|
|
seg->first_page = fp;
|
|
|
|
|
|
|
|
rw_wlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
RB_INSERT(fict_tree, &vm_phys_fictitious_tree, seg);
|
|
|
|
rw_wunlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
|
|
|
|
return (0);
|
2012-05-12 20:42:56 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
vm_phys_fictitious_unreg_range(vm_paddr_t start, vm_paddr_t end)
|
|
|
|
{
|
2014-07-09 08:12:58 +00:00
|
|
|
struct vm_phys_fictitious_seg *seg, tmp;
|
2012-05-12 20:42:56 +00:00
|
|
|
#ifdef VM_PHYSSEG_DENSE
|
2014-08-05 10:29:01 +00:00
|
|
|
long pi, pe;
|
2012-05-12 20:42:56 +00:00
|
|
|
#endif
|
|
|
|
|
2014-08-05 10:29:01 +00:00
|
|
|
KASSERT(start < end,
|
|
|
|
("Start of segment isn't less than end (start: %jx end: %jx)",
|
|
|
|
(uintmax_t)start, (uintmax_t)end));
|
|
|
|
|
2012-05-12 20:42:56 +00:00
|
|
|
#ifdef VM_PHYSSEG_DENSE
|
|
|
|
pi = atop(start);
|
2014-08-05 10:29:01 +00:00
|
|
|
pe = atop(end);
|
|
|
|
if (pi >= first_page && (pi - first_page) < vm_page_array_size) {
|
|
|
|
if ((pe - first_page) <= vm_page_array_size) {
|
|
|
|
/*
|
|
|
|
* This segment was allocated using vm_page_array
|
|
|
|
* only, there's nothing to do since those pages
|
|
|
|
* were never added to the tree.
|
|
|
|
*/
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* We have a segment that starts inside
|
|
|
|
* of vm_page_array, but ends outside of it.
|
|
|
|
*
|
|
|
|
* Calculate how many pages were added to the
|
|
|
|
* tree and free them.
|
|
|
|
*/
|
|
|
|
start = ptoa(first_page + vm_page_array_size);
|
|
|
|
} else if (pe > first_page && (pe - first_page) < vm_page_array_size) {
|
|
|
|
/*
|
|
|
|
* We have a segment that ends inside of vm_page_array,
|
|
|
|
* but starts outside of it.
|
|
|
|
*/
|
|
|
|
end = ptoa(first_page);
|
|
|
|
} else if (pi < first_page && pe > (first_page + vm_page_array_size)) {
|
|
|
|
/* Since it's not possible to register such a range, panic. */
|
|
|
|
panic(
|
|
|
|
"Unregistering not registered fictitious range [%#jx:%#jx]",
|
|
|
|
(uintmax_t)start, (uintmax_t)end);
|
|
|
|
}
|
2012-05-12 20:42:56 +00:00
|
|
|
#endif
|
2014-07-09 08:12:58 +00:00
|
|
|
tmp.start = start;
|
|
|
|
tmp.end = 0;
|
|
|
|
|
|
|
|
rw_wlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
seg = RB_FIND(fict_tree, &vm_phys_fictitious_tree, &tmp);
|
|
|
|
if (seg->start != start || seg->end != end) {
|
|
|
|
rw_wunlock(&vm_phys_fictitious_reg_lock);
|
|
|
|
panic(
|
|
|
|
"Unregistering not registered fictitious range [%#jx:%#jx]",
|
|
|
|
(uintmax_t)start, (uintmax_t)end);
|
|
|
|
}
|
|
|
|
RB_REMOVE(fict_tree, &vm_phys_fictitious_tree, seg);
|
|
|
|
rw_wunlock(&vm_phys_fictitious_reg_lock);
|
2014-08-05 10:29:01 +00:00
|
|
|
free(seg->first_page, M_FICT_PAGES);
|
2014-07-09 08:12:58 +00:00
|
|
|
free(seg, M_FICT_PAGES);
|
2012-05-12 20:42:56 +00:00
|
|
|
}
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
|
|
|
* Find the segment containing the given physical address.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
vm_phys_paddr_to_segind(vm_paddr_t pa)
|
|
|
|
{
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
int segind;
|
|
|
|
|
|
|
|
for (segind = 0; segind < vm_phys_nsegs; segind++) {
|
|
|
|
seg = &vm_phys_segs[segind];
|
|
|
|
if (pa >= seg->start && pa < seg->end)
|
|
|
|
return (segind);
|
|
|
|
}
|
|
|
|
panic("vm_phys_paddr_to_segind: paddr %#jx is not in any segment" ,
|
|
|
|
(uintmax_t)pa);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Free a contiguous, power of two-sized set of physical pages.
|
2007-07-14 21:21:17 +00:00
|
|
|
*
|
|
|
|
* The free page queues must be locked.
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
vm_phys_free_pages(vm_page_t m, int order)
|
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
|
|
|
struct vm_phys_seg *seg;
|
2011-10-30 05:06:14 +00:00
|
|
|
vm_paddr_t pa;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
vm_page_t m_buddy;
|
|
|
|
|
|
|
|
KASSERT(m->order == VM_NFREEORDER,
|
2007-07-14 21:21:17 +00:00
|
|
|
("vm_phys_free_pages: page %p has unexpected order %d",
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
m, m->order));
|
|
|
|
KASSERT(m->pool < VM_NFREEPOOL,
|
2007-07-14 21:21:17 +00:00
|
|
|
("vm_phys_free_pages: page %p has unexpected pool %d",
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
m, m->pool));
|
|
|
|
KASSERT(order < VM_NFREEORDER,
|
2007-07-14 21:21:17 +00:00
|
|
|
("vm_phys_free_pages: order %d is out of range", order));
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
|
|
|
seg = &vm_phys_segs[m->segind];
|
2011-10-30 05:06:14 +00:00
|
|
|
if (order < VM_NFREEORDER - 1) {
|
|
|
|
pa = VM_PAGE_TO_PHYS(m);
|
|
|
|
do {
|
|
|
|
pa ^= ((vm_paddr_t)1 << (PAGE_SHIFT + order));
|
|
|
|
if (pa < seg->start || pa >= seg->end)
|
|
|
|
break;
|
|
|
|
m_buddy = &seg->first_page[atop(pa - seg->start)];
|
|
|
|
if (m_buddy->order != order)
|
|
|
|
break;
|
|
|
|
fl = (*seg->free_queues)[m_buddy->pool];
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_rem(fl, m_buddy, order);
|
2011-10-30 05:06:14 +00:00
|
|
|
if (m_buddy->pool != m->pool)
|
|
|
|
vm_phys_set_pool(m->pool, m_buddy, order);
|
|
|
|
order++;
|
|
|
|
pa &= ~(((vm_paddr_t)1 << (PAGE_SHIFT + order)) - 1);
|
|
|
|
m = &seg->first_page[atop(pa - seg->start)];
|
|
|
|
} while (order < VM_NFREEORDER - 1);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
fl = (*seg->free_queues)[m->pool];
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_add(fl, m, order, 1);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
|
2011-10-30 05:06:14 +00:00
|
|
|
/*
|
|
|
|
* Free a contiguous, arbitrarily sized set of physical pages.
|
|
|
|
*
|
|
|
|
* The free page queues must be locked.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
vm_phys_free_contig(vm_page_t m, u_long npages)
|
|
|
|
{
|
|
|
|
u_int n;
|
|
|
|
int order;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Avoid unnecessary coalescing by freeing the pages in the largest
|
|
|
|
* possible power-of-two-sized subsets.
|
|
|
|
*/
|
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
|
|
|
for (;; npages -= n) {
|
|
|
|
/*
|
|
|
|
* Unsigned "min" is used here so that "order" is assigned
|
|
|
|
* "VM_NFREEORDER - 1" when "m"'s physical address is zero
|
|
|
|
* or the low-order bits of its physical address are zero
|
|
|
|
* because the size of a physical address exceeds the size of
|
|
|
|
* a long.
|
|
|
|
*/
|
|
|
|
order = min(ffsl(VM_PAGE_TO_PHYS(m) >> PAGE_SHIFT) - 1,
|
|
|
|
VM_NFREEORDER - 1);
|
|
|
|
n = 1 << order;
|
|
|
|
if (npages < n)
|
|
|
|
break;
|
|
|
|
vm_phys_free_pages(m, order);
|
|
|
|
m += n;
|
|
|
|
}
|
|
|
|
/* The residual "npages" is less than "1 << (VM_NFREEORDER - 1)". */
|
|
|
|
for (; npages > 0; npages -= n) {
|
|
|
|
order = flsl(npages) - 1;
|
|
|
|
n = 1 << order;
|
|
|
|
vm_phys_free_pages(m, order);
|
|
|
|
m += n;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
|
|
|
* Set the pool for a contiguous, power of two-sized set of physical pages.
|
|
|
|
*/
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
void
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
vm_phys_set_pool(int pool, vm_page_t m, int order)
|
|
|
|
{
|
|
|
|
vm_page_t m_tmp;
|
|
|
|
|
|
|
|
for (m_tmp = m; m_tmp < &m[1 << order]; m_tmp++)
|
|
|
|
m_tmp->pool = pool;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2007-12-21 02:44:31 +00:00
|
|
|
* Search for the given physical page "m" in the free lists. If the search
|
|
|
|
* succeeds, remove "m" from the free lists and return TRUE. Otherwise, return
|
|
|
|
* FALSE, indicating that "m" is not in the free lists.
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
*
|
|
|
|
* The free page queues must be locked.
|
|
|
|
*/
|
2007-12-20 22:45:54 +00:00
|
|
|
boolean_t
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
vm_phys_unfree_page(vm_page_t m)
|
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
vm_paddr_t pa, pa_half;
|
|
|
|
vm_page_t m_set, m_tmp;
|
|
|
|
int order;
|
|
|
|
|
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* First, find the contiguous, power of two-sized set of free
|
|
|
|
* physical pages containing the given physical page "m" and
|
|
|
|
* assign it to "m_set".
|
|
|
|
*/
|
|
|
|
seg = &vm_phys_segs[m->segind];
|
|
|
|
for (m_set = m, order = 0; m_set->order == VM_NFREEORDER &&
|
2007-12-19 23:09:45 +00:00
|
|
|
order < VM_NFREEORDER - 1; ) {
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
order++;
|
|
|
|
pa = m->phys_addr & (~(vm_paddr_t)0 << (PAGE_SHIFT + order));
|
2008-04-05 05:02:53 +00:00
|
|
|
if (pa >= seg->start)
|
2007-12-20 22:45:54 +00:00
|
|
|
m_set = &seg->first_page[atop(pa - seg->start)];
|
|
|
|
else
|
|
|
|
return (FALSE);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
}
|
2007-12-20 22:45:54 +00:00
|
|
|
if (m_set->order < order)
|
|
|
|
return (FALSE);
|
|
|
|
if (m_set->order == VM_NFREEORDER)
|
|
|
|
return (FALSE);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
KASSERT(m_set->order < VM_NFREEORDER,
|
|
|
|
("vm_phys_unfree_page: page %p has unexpected order %d",
|
|
|
|
m_set, m_set->order));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Next, remove "m_set" from the free lists. Finally, extract
|
|
|
|
* "m" from "m_set" using an iterative algorithm: While "m_set"
|
|
|
|
* is larger than a page, shrink "m_set" by returning the half
|
|
|
|
* of "m_set" that does not contain "m" to the free lists.
|
|
|
|
*/
|
|
|
|
fl = (*seg->free_queues)[m_set->pool];
|
|
|
|
order = m_set->order;
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_rem(fl, m_set, order);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
while (order > 0) {
|
|
|
|
order--;
|
|
|
|
pa_half = m_set->phys_addr ^ (1 << (PAGE_SHIFT + order));
|
|
|
|
if (m->phys_addr < pa_half)
|
|
|
|
m_tmp = &seg->first_page[atop(pa_half - seg->start)];
|
|
|
|
else {
|
|
|
|
m_tmp = m_set;
|
|
|
|
m_set = &seg->first_page[atop(pa_half - seg->start)];
|
|
|
|
}
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_add(fl, m_tmp, order, 0);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
}
|
|
|
|
KASSERT(m_set == m, ("vm_phys_unfree_page: fatal inconsistency"));
|
2007-12-20 22:45:54 +00:00
|
|
|
return (TRUE);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to zero one physical page. Used by an idle priority thread.
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
*/
|
|
|
|
boolean_t
|
|
|
|
vm_phys_zero_pages_idle(void)
|
|
|
|
{
|
2013-05-13 15:40:51 +00:00
|
|
|
static struct vm_freelist *fl;
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
static int flind, oind, pind;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
vm_page_t m, m_tmp;
|
2013-05-13 15:40:51 +00:00
|
|
|
int domain;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
2013-05-13 15:40:51 +00:00
|
|
|
domain = vm_rr_selectdomain();
|
|
|
|
fl = vm_phys_free_queues[domain][0][0];
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
for (;;) {
|
2013-08-10 17:36:42 +00:00
|
|
|
TAILQ_FOREACH_REVERSE(m, &fl[oind].pl, pglist, plinks.q) {
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
for (m_tmp = m; m_tmp < &m[1 << oind]; m_tmp++) {
|
|
|
|
if ((m_tmp->flags & (PG_CACHED | PG_ZERO)) == 0) {
|
|
|
|
vm_phys_unfree_page(m_tmp);
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
vm_phys_freecnt_adj(m, -1);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
mtx_unlock(&vm_page_queue_free_mtx);
|
|
|
|
pmap_zero_page_idle(m_tmp);
|
|
|
|
m_tmp->flags |= PG_ZERO;
|
|
|
|
mtx_lock(&vm_page_queue_free_mtx);
|
Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
|
|
|
vm_phys_freecnt_adj(m, 1);
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
vm_phys_free_pages(m_tmp, 0);
|
|
|
|
vm_page_zero_count++;
|
|
|
|
cnt_prezero++;
|
|
|
|
return (TRUE);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
oind++;
|
|
|
|
if (oind == VM_NFREEORDER) {
|
|
|
|
oind = 0;
|
|
|
|
pind++;
|
|
|
|
if (pind == VM_NFREEPOOL) {
|
|
|
|
pind = 0;
|
|
|
|
flind++;
|
|
|
|
if (flind == vm_nfreelists)
|
|
|
|
flind = 0;
|
|
|
|
}
|
2013-05-13 15:40:51 +00:00
|
|
|
fl = vm_phys_free_queues[domain][flind][pind];
|
Change the management of cached pages (PQ_CACHE) in two fundamental
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
|
|
|
}
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2007-06-16 05:25:53 +00:00
|
|
|
* Allocate a contiguous set of physical pages of the given size
|
|
|
|
* "npages" from the free lists. All of the physical pages must be at
|
|
|
|
* or above the given physical address "low" and below the given
|
|
|
|
* physical address "high". The given value "alignment" determines the
|
|
|
|
* alignment of the first physical page in the set. If the given value
|
|
|
|
* "boundary" is non-zero, then the set of physical pages cannot cross
|
|
|
|
* any physical address boundary that is a multiple of that value. Both
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
* "alignment" and "boundary" must be a power of two.
|
|
|
|
*/
|
|
|
|
vm_page_t
|
2011-10-30 05:06:14 +00:00
|
|
|
vm_phys_alloc_contig(u_long npages, vm_paddr_t low, vm_paddr_t high,
|
|
|
|
u_long alignment, vm_paddr_t boundary)
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
|
|
|
struct vm_phys_seg *seg;
|
|
|
|
vm_paddr_t pa, pa_last, size;
|
Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
2011-11-16 16:46:09 +00:00
|
|
|
vm_page_t m, m_ret;
|
2011-10-30 05:06:14 +00:00
|
|
|
u_long npages_end;
|
2013-05-13 15:40:51 +00:00
|
|
|
int dom, domain, flind, oind, order, pind;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
|
Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
2011-11-16 16:46:09 +00:00
|
|
|
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
size = npages << PAGE_SHIFT;
|
|
|
|
KASSERT(size != 0,
|
|
|
|
("vm_phys_alloc_contig: size must not be 0"));
|
|
|
|
KASSERT((alignment & (alignment - 1)) == 0,
|
|
|
|
("vm_phys_alloc_contig: alignment must be a power of 2"));
|
|
|
|
KASSERT((boundary & (boundary - 1)) == 0,
|
|
|
|
("vm_phys_alloc_contig: boundary must be a power of 2"));
|
|
|
|
/* Compute the queue that is the best fit for npages. */
|
|
|
|
for (order = 0; (1 << order) < npages; order++);
|
2013-05-13 15:40:51 +00:00
|
|
|
dom = 0;
|
|
|
|
restartdom:
|
|
|
|
domain = vm_rr_selectdomain();
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
for (flind = 0; flind < vm_nfreelists; flind++) {
|
|
|
|
for (oind = min(order, VM_NFREEORDER - 1); oind < VM_NFREEORDER; oind++) {
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++) {
|
2013-05-13 15:40:51 +00:00
|
|
|
fl = &vm_phys_free_queues[domain][flind][pind][0];
|
2013-08-10 17:36:42 +00:00
|
|
|
TAILQ_FOREACH(m_ret, &fl[oind].pl, plinks.q) {
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
/*
|
|
|
|
* A free list may contain physical pages
|
|
|
|
* from one or more segments.
|
|
|
|
*/
|
|
|
|
seg = &vm_phys_segs[m_ret->segind];
|
|
|
|
if (seg->start > high ||
|
|
|
|
low >= seg->end)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Is the size of this allocation request
|
|
|
|
* larger than the largest block size?
|
|
|
|
*/
|
|
|
|
if (order >= VM_NFREEORDER) {
|
|
|
|
/*
|
|
|
|
* Determine if a sufficient number
|
|
|
|
* of subsequent blocks to satisfy
|
|
|
|
* the allocation request are free.
|
|
|
|
*/
|
|
|
|
pa = VM_PAGE_TO_PHYS(m_ret);
|
|
|
|
pa_last = pa + size;
|
|
|
|
for (;;) {
|
|
|
|
pa += 1 << (PAGE_SHIFT + VM_NFREEORDER - 1);
|
|
|
|
if (pa >= pa_last)
|
|
|
|
break;
|
|
|
|
if (pa < seg->start ||
|
|
|
|
pa >= seg->end)
|
|
|
|
break;
|
|
|
|
m = &seg->first_page[atop(pa - seg->start)];
|
|
|
|
if (m->order != VM_NFREEORDER - 1)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
/* If not, continue to the next block. */
|
|
|
|
if (pa < pa_last)
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Determine if the blocks are within the given range,
|
|
|
|
* satisfy the given alignment, and do not cross the
|
|
|
|
* given boundary.
|
|
|
|
*/
|
|
|
|
pa = VM_PAGE_TO_PHYS(m_ret);
|
|
|
|
if (pa >= low &&
|
|
|
|
pa + size <= high &&
|
|
|
|
(pa & (alignment - 1)) == 0 &&
|
|
|
|
((pa ^ (pa + size - 1)) & ~(boundary - 1)) == 0)
|
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2013-05-13 15:40:51 +00:00
|
|
|
if (++dom < vm_ndomains)
|
|
|
|
goto restartdom;
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
return (NULL);
|
|
|
|
done:
|
|
|
|
for (m = m_ret; m < &m_ret[npages]; m = &m[1 << oind]) {
|
|
|
|
fl = (*seg->free_queues)[m->pool];
|
2013-05-13 15:40:51 +00:00
|
|
|
vm_freelist_rem(fl, m, m->order);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
if (m_ret->pool != VM_FREEPOOL_DEFAULT)
|
|
|
|
vm_phys_set_pool(VM_FREEPOOL_DEFAULT, m_ret, oind);
|
|
|
|
fl = (*seg->free_queues)[m_ret->pool];
|
|
|
|
vm_phys_split_pages(m_ret, oind, fl, order);
|
2011-10-30 05:06:14 +00:00
|
|
|
/* Return excess pages to the free lists. */
|
|
|
|
npages_end = roundup2(npages, 1 << imin(oind, order));
|
|
|
|
if (npages < npages_end)
|
|
|
|
vm_phys_free_contig(&m_ret[npages], npages_end - npages);
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
return (m_ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef DDB
|
|
|
|
/*
|
|
|
|
* Show the number of physical pages in each of the free lists.
|
|
|
|
*/
|
|
|
|
DB_SHOW_COMMAND(freepages, db_show_freepages)
|
|
|
|
{
|
|
|
|
struct vm_freelist *fl;
|
2013-05-13 15:40:51 +00:00
|
|
|
int flind, oind, pind, dom;
|
|
|
|
|
|
|
|
for (dom = 0; dom < vm_ndomains; dom++) {
|
|
|
|
db_printf("DOMAIN: %d\n", dom);
|
|
|
|
for (flind = 0; flind < vm_nfreelists; flind++) {
|
|
|
|
db_printf("FREE LIST %d:\n"
|
|
|
|
"\n ORDER (SIZE) | NUMBER"
|
|
|
|
"\n ", flind);
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++)
|
|
|
|
db_printf(" | POOL %d", pind);
|
|
|
|
db_printf("\n-- ");
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++)
|
|
|
|
db_printf("-- -- ");
|
|
|
|
db_printf("--\n");
|
|
|
|
for (oind = VM_NFREEORDER - 1; oind >= 0; oind--) {
|
|
|
|
db_printf(" %2.2d (%6.6dK)", oind,
|
|
|
|
1 << (PAGE_SHIFT - 10 + oind));
|
|
|
|
for (pind = 0; pind < VM_NFREEPOOL; pind++) {
|
|
|
|
fl = vm_phys_free_queues[dom][flind][pind];
|
|
|
|
db_printf(" | %6.6d", fl[oind].lcnt);
|
|
|
|
}
|
|
|
|
db_printf("\n");
|
Add a new physical memory allocator. However, do not yet connect it
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
2007-06-10 00:49:16 +00:00
|
|
|
}
|
|
|
|
db_printf("\n");
|
|
|
|
}
|
|
|
|
db_printf("\n");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|