2005-01-07 02:29:27 +00:00
|
|
|
/*-
|
2017-11-30 15:48:35 +00:00
|
|
|
* SPDX-License-Identifier: (BSD-3-Clause AND MIT-CMU)
|
2017-11-20 19:43:44 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Copyright (c) 1991, 1993
|
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* This code is derived from software contributed to Berkeley by
|
|
|
|
* The Mach Operating System project at Carnegie-Mellon University.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
2017-02-28 23:42:47 +00:00
|
|
|
* 3. Neither the name of the University nor the names of its contributors
|
1994-05-24 10:09:53 +00:00
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1994-08-02 07:55:43 +00:00
|
|
|
* from: @(#)vm_kern.c 8.3 (Berkeley) 1/12/94
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
*
|
|
|
|
* Copyright (c) 1987, 1990 Carnegie-Mellon University.
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* Authors: Avadis Tevanian, Jr., Michael Wayne Young
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size; now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Permission to use, copy, modify and distribute this software and
|
|
|
|
* its documentation is hereby granted, provided that both the copyright
|
|
|
|
* notice and this permission notice appear in all copies of the
|
|
|
|
* software, derivative works or modified versions, and any portions
|
|
|
|
* thereof, and that both notices appear in supporting documentation.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
|
|
|
* CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
|
|
* CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
1994-05-24 10:09:53 +00:00
|
|
|
* FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Carnegie Mellon requests users of this software to return to
|
|
|
|
*
|
|
|
|
* Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
|
|
* School of Computer Science
|
|
|
|
* Carnegie Mellon University
|
|
|
|
* Pittsburgh PA 15213-3890
|
|
|
|
*
|
|
|
|
* any improvements or extensions that they make and grant Carnegie the
|
|
|
|
* rights to redistribute these changes.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Kernel memory management.
|
|
|
|
*/
|
|
|
|
|
2003-06-11 23:50:51 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
#include "opt_vm.h"
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
|
|
|
#include <sys/systm.h>
|
2001-12-01 00:21:30 +00:00
|
|
|
#include <sys/kernel.h> /* for ticks and hz */
|
2018-01-12 23:13:55 +00:00
|
|
|
#include <sys/domainset.h>
|
2007-04-05 20:52:51 +00:00
|
|
|
#include <sys/eventhandler.h>
|
2001-05-01 08:13:21 +00:00
|
|
|
#include <sys/lock.h>
|
1994-08-18 22:36:09 +00:00
|
|
|
#include <sys/proc.h>
|
1995-02-02 09:09:15 +00:00
|
|
|
#include <sys/malloc.h>
|
2013-03-09 02:32:23 +00:00
|
|
|
#include <sys/rwlock.h>
|
2009-02-23 23:00:12 +00:00
|
|
|
#include <sys/sysctl.h>
|
2013-08-07 06:21:20 +00:00
|
|
|
#include <sys/vmem.h>
|
2018-01-12 23:13:55 +00:00
|
|
|
#include <sys/vmmeter.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <vm/vm.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/vm_param.h>
|
2018-01-12 23:13:55 +00:00
|
|
|
#include <vm/vm_domainset.h>
|
2013-08-07 06:21:20 +00:00
|
|
|
#include <vm/vm_kern.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/pmap.h>
|
|
|
|
#include <vm/vm_map.h>
|
|
|
|
#include <vm/vm_object.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <vm/vm_page.h>
|
|
|
|
#include <vm/vm_pageout.h>
|
2018-01-12 23:13:55 +00:00
|
|
|
#include <vm/vm_phys.h>
|
2018-02-06 22:10:07 +00:00
|
|
|
#include <vm/vm_pagequeue.h>
|
2017-08-15 16:39:49 +00:00
|
|
|
#include <vm/vm_radix.h>
|
1995-12-10 14:52:10 +00:00
|
|
|
#include <vm/vm_extern.h>
|
2007-04-05 20:52:51 +00:00
|
|
|
#include <vm/uma.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2013-03-14 19:50:09 +00:00
|
|
|
vm_map_t kernel_map;
|
|
|
|
vm_map_t exec_map;
|
2003-08-11 05:51:51 +00:00
|
|
|
vm_map_t pipe_map;
|
1994-08-18 22:36:09 +00:00
|
|
|
|
2011-05-13 18:48:00 +00:00
|
|
|
const void *zero_region;
|
|
|
|
CTASSERT((ZERO_REGION_SIZE & PAGE_MASK) == 0);
|
|
|
|
|
2015-11-12 22:00:59 +00:00
|
|
|
/* NB: Used by kernel debuggers. */
|
|
|
|
const u_long vm_maxuser_address = VM_MAXUSER_ADDRESS;
|
|
|
|
|
2017-01-05 01:44:12 +00:00
|
|
|
u_int exec_map_entry_size;
|
|
|
|
u_int exec_map_entries;
|
|
|
|
|
2013-02-04 09:35:48 +00:00
|
|
|
SYSCTL_ULONG(_vm, OID_AUTO, min_kernel_address, CTLFLAG_RD,
|
2014-10-21 07:31:21 +00:00
|
|
|
SYSCTL_NULL_ULONG_PTR, VM_MIN_KERNEL_ADDRESS, "Min kernel address");
|
2013-02-04 09:35:48 +00:00
|
|
|
|
|
|
|
SYSCTL_ULONG(_vm, OID_AUTO, max_kernel_address, CTLFLAG_RD,
|
2013-02-18 01:02:48 +00:00
|
|
|
#if defined(__arm__) || defined(__sparc64__)
|
2013-02-04 09:35:48 +00:00
|
|
|
&vm_max_kernel_address, 0,
|
|
|
|
#else
|
2014-10-21 07:31:21 +00:00
|
|
|
SYSCTL_NULL_ULONG_PTR, VM_MAX_KERNEL_ADDRESS,
|
2013-02-04 09:35:48 +00:00
|
|
|
#endif
|
|
|
|
"Max kernel address");
|
|
|
|
|
2018-09-19 19:13:43 +00:00
|
|
|
#if VM_NRESERVLEVEL > 0
|
2018-09-20 15:45:12 +00:00
|
|
|
#define KVA_QUANTUM_SHIFT (VM_LEVEL_0_ORDER + PAGE_SHIFT)
|
2018-09-19 19:13:43 +00:00
|
|
|
#else
|
2019-02-25 19:22:13 +00:00
|
|
|
/* On non-superpage architectures we want large import sizes. */
|
|
|
|
#define KVA_QUANTUM_SHIFT (8 + PAGE_SHIFT)
|
2018-09-19 19:13:43 +00:00
|
|
|
#endif
|
2018-09-20 15:45:12 +00:00
|
|
|
#define KVA_QUANTUM (1 << KVA_QUANTUM_SHIFT)
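The arithmetic behind the two KVA_QUANTUM_SHIFT definitions can be sketched in userland. The page shift and level-0 reservation order below are assumed example values (4 KiB pages, 512-page reservations), not taken from this file; the real values come from machine-dependent headers. All `EX_`/`ex_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values; the kernel takes these from MD headers. */
#define EX_PAGE_SHIFT        12	/* 4 KiB pages (assumption) */
#define EX_VM_LEVEL_0_ORDER  9	/* 512 pages per reservation (assumption) */

/*
 * Compute an import quantum from a shift, as
 * KVA_QUANTUM = (1 << KVA_QUANTUM_SHIFT) does.
 */
size_t
ex_kva_quantum(unsigned quantum_shift)
{
	/* The shift must fit in the result type. */
	assert(quantum_shift < sizeof(size_t) * 8);
	return ((size_t)1 << quantum_shift);
}
```

Under these assumptions the superpage case imports KVA in 2 MiB units and the fallback case (shift of 8 + PAGE_SHIFT) in 1 MiB units.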
|
2018-09-19 19:13:43 +00:00
|
|
|
|
1999-06-08 17:03:28 +00:00
|
|
|
/*
|
2013-08-07 06:21:20 +00:00
|
|
|
* kva_alloc:
|
1999-06-08 17:03:28 +00:00
|
|
|
*
|
2003-08-01 19:51:43 +00:00
|
|
|
* Allocate a virtual address range with no underlying object and
|
|
|
|
* no initial mapping to physical memory. Any mapping from this
|
|
|
|
* range to physical memory must be explicitly created prior to
|
|
|
|
* its use, typically with pmap_qenter(). Any attempt to create
|
|
|
|
* a mapping on demand through vm_fault() will result in a panic.
|
1999-06-08 17:03:28 +00:00
|
|
|
*/
|
|
|
|
vm_offset_t
|
2017-10-13 13:53:19 +00:00
|
|
|
kva_alloc(vm_size_t size)
|
1999-06-08 17:03:28 +00:00
|
|
|
{
|
|
|
|
vm_offset_t addr;
|
|
|
|
|
|
|
|
size = round_page(size);
|
2013-08-07 06:21:20 +00:00
|
|
|
if (vmem_alloc(kernel_arena, size, M_BESTFIT | M_NOWAIT, &addr))
|
1999-06-08 17:03:28 +00:00
|
|
|
return (0);
|
2013-08-07 06:21:20 +00:00
|
|
|
|
1999-06-08 17:03:28 +00:00
|
|
|
return (addr);
|
|
|
|
}
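kva_alloc() rounds the request up to a whole number of pages before asking the arena for address space, and returns 0 on failure. A minimal userland sketch of that rounding, assuming a 4 KiB page size (the `EX_`/`ex_` names are hypothetical stand-ins for PAGE_SIZE, PAGE_MASK and round_page()):

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE ((uintptr_t)4096)	/* assumed page size */
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)

/* Round x up to the next multiple of the page size. */
uintptr_t
ex_round_page(uintptr_t x)
{
	/* The mask trick only works for power-of-two page sizes. */
	assert((EX_PAGE_SIZE & EX_PAGE_MASK) == 0);
	return ((x + EX_PAGE_MASK) & ~EX_PAGE_MASK);
}
```

A one-byte request therefore consumes a full page of KVA, while a request that is already page-aligned is left unchanged.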
|
|
|
|
|
2010-04-18 22:32:07 +00:00
|
|
|
/*
|
2013-08-07 06:21:20 +00:00
|
|
|
* kva_free:
|
2010-04-18 22:32:07 +00:00
|
|
|
*
|
2013-08-07 06:21:20 +00:00
|
|
|
* Release a region of kernel virtual memory allocated
|
|
|
|
* with kva_alloc. Only the virtual address range is
|
|
|
|
* released; the caller must remove any mappings first.
|
|
|
|
*
|
|
|
|
* This routine may not block on kernel maps.
|
2010-04-18 22:32:07 +00:00
|
|
|
*/
|
2013-08-07 06:21:20 +00:00
|
|
|
void
|
2017-10-13 13:53:19 +00:00
|
|
|
kva_free(vm_offset_t addr, vm_size_t size)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
|
|
|
|
size = round_page(size);
|
2013-08-07 06:21:20 +00:00
|
|
|
vmem_free(kernel_arena, addr, size);
|
2012-07-14 18:10:44 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocates a region from the kernel address map and physical pages
|
|
|
|
* within the specified address range to the kernel object. Creates a
|
|
|
|
* wired mapping from this region to these pages, and returns the
|
|
|
|
* region's starting virtual address. The allocated pages are not
|
|
|
|
* necessarily physically contiguous. If M_ZERO is specified through the
|
|
|
|
* given flags, then the pages are zeroed before they are mapped.
|
|
|
|
*/
|
2018-10-30 18:26:34 +00:00
|
|
|
static vm_offset_t
|
2018-01-12 23:13:55 +00:00
|
|
|
kmem_alloc_attr_domain(int domain, vm_size_t size, int flags, vm_paddr_t low,
|
2012-07-14 18:10:44 +00:00
|
|
|
vm_paddr_t high, vm_memattr_t memattr)
|
|
|
|
{
|
2018-01-12 23:13:55 +00:00
|
|
|
vmem_t *vmem;
|
2017-11-28 23:40:54 +00:00
|
|
|
vm_object_t object = kernel_object;
|
2017-03-14 19:39:17 +00:00
|
|
|
vm_offset_t addr, i, offset;
|
2012-07-14 18:10:44 +00:00
|
|
|
vm_page_t m;
|
|
|
|
int pflags, tries;
|
2019-02-07 02:00:23 +00:00
|
|
|
vm_prot_t prot;
|
2012-07-14 18:10:44 +00:00
|
|
|
|
|
|
|
size = round_page(size);
|
2018-01-12 23:13:55 +00:00
|
|
|
vmem = vm_dom[domain].vmd_kernel_arena;
|
2013-08-07 06:21:20 +00:00
|
|
|
if (vmem_alloc(vmem, size, M_BESTFIT | flags, &addr))
|
2012-07-14 18:10:44 +00:00
|
|
|
return (0);
|
|
|
|
offset = addr - VM_MIN_KERNEL_ADDRESS;
|
2013-08-07 06:21:20 +00:00
|
|
|
pflags = malloc2vm_flags(flags) | VM_ALLOC_NOBUSY | VM_ALLOC_WIRED;
|
2017-11-08 02:39:37 +00:00
|
|
|
pflags &= ~(VM_ALLOC_NOWAIT | VM_ALLOC_WAITOK | VM_ALLOC_WAITFAIL);
|
|
|
|
pflags |= VM_ALLOC_NOWAIT;
|
2019-02-07 02:00:23 +00:00
|
|
|
prot = (flags & M_EXEC) != 0 ? VM_PROT_ALL : VM_PROT_RW;
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WLOCK(object);
|
2013-08-07 06:21:20 +00:00
|
|
|
for (i = 0; i < size; i += PAGE_SIZE) {
|
2012-07-14 18:10:44 +00:00
|
|
|
tries = 0;
|
|
|
|
retry:
|
2018-01-12 23:13:55 +00:00
|
|
|
m = vm_page_alloc_contig_domain(object, atop(offset + i),
|
|
|
|
domain, pflags, 1, low, high, PAGE_SIZE, 0, memattr);
|
2012-07-14 18:10:44 +00:00
|
|
|
if (m == NULL) {
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
if (tries < ((flags & M_NOWAIT) != 0 ? 1 : 3)) {
|
2018-01-12 23:13:55 +00:00
|
|
|
if (!vm_page_reclaim_contig_domain(domain,
|
|
|
|
pflags, 1, low, high, PAGE_SIZE, 0) &&
|
2015-12-19 18:42:50 +00:00
|
|
|
(flags & M_WAITOK) != 0)
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_wait_domain(domain);
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
tries++;
|
|
|
|
goto retry;
|
|
|
|
}
|
2015-09-26 22:57:10 +00:00
|
|
|
kmem_unback(object, addr, i);
|
2013-08-07 06:21:20 +00:00
|
|
|
vmem_free(vmem, addr, size);
|
2012-07-14 18:10:44 +00:00
|
|
|
return (0);
|
|
|
|
}
|
2018-02-06 22:10:07 +00:00
|
|
|
KASSERT(vm_phys_domain(m) == domain,
|
2018-01-12 23:13:55 +00:00
|
|
|
("kmem_alloc_attr_domain: Domain mismatch %d != %d",
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_phys_domain(m), domain));
|
2012-07-14 18:10:44 +00:00
|
|
|
if ((flags & M_ZERO) && (m->flags & PG_ZERO) == 0)
|
|
|
|
pmap_zero_page(m);
|
|
|
|
m->valid = VM_PAGE_BITS_ALL;
|
2019-02-07 02:00:23 +00:00
|
|
|
pmap_enter(kernel_pmap, addr + i, m, prot,
|
|
|
|
prot | PMAP_ENTER_WIRED, 0);
|
2012-07-14 18:10:44 +00:00
|
|
|
}
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
return (addr);
|
|
|
|
}
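The retry budget in the loop above, `tries < ((flags & M_NOWAIT) != 0 ? 1 : 3)`, allows one retry for M_NOWAIT callers and three otherwise, so a page allocation is attempted at most two or four times. The following userland sketch reproduces only that control flow. The allocator is a hypothetical stand-in that fails a fixed number of times; reclamation and sleeping are reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator that fails its first `failures_left` calls. */
struct ex_allocator {
	int failures_left;
	int calls;
};

bool
ex_try_alloc(struct ex_allocator *a)
{
	assert(a != NULL);
	a->calls++;
	if (a->failures_left > 0) {
		a->failures_left--;
		return (false);
	}
	return (true);
}

/* Same retry budget as the kernel loop: 1 retry if nowait, else 3. */
bool
ex_alloc_with_retry(struct ex_allocator *a, bool nowait)
{
	int tries = 0;

	for (;;) {
		if (ex_try_alloc(a))
			return (true);
		if (tries >= (nowait ? 1 : 3))
			return (false);
		/* The kernel would try to reclaim pages (and may sleep) here. */
		tries++;
	}
}
```

On final failure the kernel code additionally unbacks the pages already allocated and frees the address range, which this sketch omits.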
|
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
vm_offset_t
|
2018-08-18 22:07:48 +00:00
|
|
|
kmem_alloc_attr(vm_size_t size, int flags, vm_paddr_t low, vm_paddr_t high,
|
|
|
|
vm_memattr_t memattr)
|
2018-01-12 23:13:55 +00:00
|
|
|
{
|
2018-10-30 18:26:34 +00:00
|
|
|
|
|
|
|
return (kmem_alloc_attr_domainset(DOMAINSET_RR(), size, flags, low,
|
|
|
|
high, memattr));
|
|
|
|
}
|
|
|
|
|
|
|
|
vm_offset_t
|
|
|
|
kmem_alloc_attr_domainset(struct domainset *ds, vm_size_t size, int flags,
|
|
|
|
vm_paddr_t low, vm_paddr_t high, vm_memattr_t memattr)
|
|
|
|
{
|
2018-01-12 23:13:55 +00:00
|
|
|
struct vm_domainset_iter di;
|
|
|
|
vm_offset_t addr;
|
|
|
|
int domain;
|
|
|
|
|
2018-10-30 18:26:34 +00:00
|
|
|
vm_domainset_iter_policy_init(&di, ds, &domain, &flags);
|
2018-01-12 23:13:55 +00:00
|
|
|
do {
|
|
|
|
addr = kmem_alloc_attr_domain(domain, size, flags, low, high,
|
|
|
|
memattr);
|
|
|
|
if (addr != 0)
|
|
|
|
break;
|
2018-10-23 16:35:58 +00:00
|
|
|
} while (vm_domainset_iter_policy(&di, &domain) == 0);
|
2018-01-12 23:13:55 +00:00
|
|
|
|
|
|
|
return (addr);
|
|
|
|
}
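The do/while loop above tries one domain at a time and stops at the first nonzero address. A simplified userland sketch of that pattern follows. It replaces the domainset iterator with a plain array of domain numbers, which is an assumption for illustration only; the real iterator also applies the allocation policy and may adjust the flags. All `ex_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain allocator: nonzero address on success, 0 on failure. */
typedef unsigned long (*ex_domain_alloc_fn)(int domain);

/* Try each domain in order; stop at the first one that succeeds. */
unsigned long
ex_alloc_any_domain(const int *domains, size_t ndomains,
    ex_domain_alloc_fn alloc, int *chosen)
{
	unsigned long addr;
	size_t i = 0;

	assert(ndomains > 0);
	do {
		addr = alloc(domains[i]);
		if (addr != 0) {
			if (chosen != NULL)
				*chosen = domains[i];
			break;
		}
	} while (++i < ndomains);
	return (addr);
}

/* Example allocator for the sketch: only domain 2 has free memory. */
unsigned long
ex_only_domain2(int domain)
{
	return (domain == 2 ? 0x1000UL : 0);
}
```

As in the kernel function, the caller sees 0 only when every candidate domain has failed.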
|
|
|
|
|
2012-07-14 18:10:44 +00:00
|
|
|
/*
|
|
|
|
* Allocates a region from the kernel address map and physically
|
|
|
|
* contiguous pages within the specified address range to the kernel
|
|
|
|
* object. Creates a wired mapping from this region to these pages, and
|
|
|
|
* returns the region's starting virtual address. If M_ZERO is specified
|
|
|
|
* through the given flags, then the pages are zeroed before they are
|
|
|
|
* mapped.
|
|
|
|
*/
|
2018-10-30 18:26:34 +00:00
|
|
|
static vm_offset_t
|
2018-01-12 23:13:55 +00:00
|
|
|
kmem_alloc_contig_domain(int domain, vm_size_t size, int flags, vm_paddr_t low,
|
2012-07-14 18:10:44 +00:00
|
|
|
vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
|
|
|
|
vm_memattr_t memattr)
|
|
|
|
{
|
2018-01-12 23:13:55 +00:00
|
|
|
vmem_t *vmem;
|
2017-11-28 23:40:54 +00:00
|
|
|
vm_object_t object = kernel_object;
|
2017-03-14 19:39:17 +00:00
|
|
|
vm_offset_t addr, offset, tmp;
|
2012-07-14 18:10:44 +00:00
|
|
|
vm_page_t end_m, m;
|
2015-12-19 18:42:50 +00:00
|
|
|
u_long npages;
|
2012-07-14 18:10:44 +00:00
|
|
|
int pflags, tries;
|
|
|
|
|
|
|
|
size = round_page(size);
|
2018-01-12 23:13:55 +00:00
|
|
|
vmem = vm_dom[domain].vmd_kernel_arena;
|
2013-08-07 06:21:20 +00:00
|
|
|
if (vmem_alloc(vmem, size, flags | M_BESTFIT, &addr))
|
2012-07-14 18:10:44 +00:00
|
|
|
return (0);
|
|
|
|
offset = addr - VM_MIN_KERNEL_ADDRESS;
|
2013-08-07 06:21:20 +00:00
|
|
|
pflags = malloc2vm_flags(flags) | VM_ALLOC_NOBUSY | VM_ALLOC_WIRED;
|
2017-11-08 02:39:37 +00:00
|
|
|
pflags &= ~(VM_ALLOC_NOWAIT | VM_ALLOC_WAITOK | VM_ALLOC_WAITFAIL);
|
|
|
|
pflags |= VM_ALLOC_NOWAIT;
|
2015-12-19 18:42:50 +00:00
|
|
|
npages = atop(size);
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
tries = 0;
|
|
|
|
retry:
|
2018-01-12 23:13:55 +00:00
|
|
|
m = vm_page_alloc_contig_domain(object, atop(offset), domain, pflags,
|
2015-12-19 18:42:50 +00:00
|
|
|
npages, low, high, alignment, boundary, memattr);
|
2012-07-14 18:10:44 +00:00
|
|
|
if (m == NULL) {
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
if (tries < ((flags & M_NOWAIT) != 0 ? 1 : 3)) {
|
2018-01-12 23:13:55 +00:00
|
|
|
if (!vm_page_reclaim_contig_domain(domain, pflags,
|
|
|
|
npages, low, high, alignment, boundary) &&
|
|
|
|
(flags & M_WAITOK) != 0)
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_wait_domain(domain);
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
tries++;
|
|
|
|
goto retry;
|
|
|
|
}
|
2013-08-07 06:21:20 +00:00
|
|
|
vmem_free(vmem, addr, size);
|
2012-07-14 18:10:44 +00:00
|
|
|
return (0);
|
|
|
|
}
|
2018-02-06 22:10:07 +00:00
|
|
|
KASSERT(vm_phys_domain(m) == domain,
|
2018-01-12 23:13:55 +00:00
|
|
|
("kmem_alloc_contig_domain: Domain mismatch %d != %d",
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_phys_domain(m), domain));
|
2015-12-19 18:42:50 +00:00
|
|
|
end_m = m + npages;
|
2013-08-07 06:21:20 +00:00
|
|
|
tmp = addr;
|
2012-07-14 18:10:44 +00:00
|
|
|
for (; m < end_m; m++) {
|
|
|
|
if ((flags & M_ZERO) && (m->flags & PG_ZERO) == 0)
|
|
|
|
pmap_zero_page(m);
|
|
|
|
m->valid = VM_PAGE_BITS_ALL;
|
Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.
(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)
This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.
Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interface to obtain executable pages.
Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.
PR: 228927
Reviewed by: alc, kib, markj, jhb (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
|
|
|
pmap_enter(kernel_pmap, tmp, m, VM_PROT_RW,
|
|
|
|
VM_PROT_RW | PMAP_ENTER_WIRED, 0);
|
2013-08-07 06:21:20 +00:00
|
|
|
tmp += PAGE_SIZE;
|
2012-07-14 18:10:44 +00:00
|
|
|
}
|
2013-03-09 02:32:23 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2012-07-14 18:10:44 +00:00
|
|
|
return (addr);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
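The `alignment` and `boundary` arguments constrain where the physically contiguous run may start and which addresses it may span. The sketch below checks a candidate physical range against those constraints as they are commonly documented for contiguous allocation: the start is a multiple of a power-of-two `alignment`, and the range does not cross a multiple of a power-of-two `boundary`, with 0 meaning no boundary restriction. It is an illustration under those assumptions, not the kernel's own check, and `ex_contig_ok` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does [pa, pa + size) satisfy the alignment and boundary constraints? */
bool
ex_contig_ok(uint64_t pa, uint64_t size, uint64_t alignment,
    uint64_t boundary)
{
	assert(size > 0);
	/* alignment must be a nonzero power of two. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	/* boundary must be zero or a power of two. */
	assert((boundary & (boundary - 1)) == 0);

	if ((pa & (alignment - 1)) != 0)
		return (false);
	/* First and last byte must share all bits above the boundary. */
	if (boundary != 0 &&
	    ((pa ^ (pa + size - 1)) & ~(boundary - 1)) != 0)
		return (false);
	return (true);
}
```

For example, a two-page run starting at 0xF000 is acceptable with no boundary restriction but is rejected with a 64 KiB boundary, because it would straddle 0x10000.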
|
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
vm_offset_t
|
2018-08-20 15:57:27 +00:00
|
|
|
kmem_alloc_contig(vm_size_t size, int flags, vm_paddr_t low, vm_paddr_t high,
|
|
|
|
u_long alignment, vm_paddr_t boundary, vm_memattr_t memattr)
|
2018-01-12 23:13:55 +00:00
|
|
|
{
|
2018-10-30 18:26:34 +00:00
|
|
|
|
|
|
|
return (kmem_alloc_contig_domainset(DOMAINSET_RR(), size, flags, low,
|
|
|
|
high, alignment, boundary, memattr));
|
|
|
|
}
|
|
|
|
|
|
|
|
vm_offset_t
|
|
|
|
kmem_alloc_contig_domainset(struct domainset *ds, vm_size_t size, int flags,
|
|
|
|
vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
|
|
|
|
vm_memattr_t memattr)
|
|
|
|
{
|
2018-01-12 23:13:55 +00:00
|
|
|
struct vm_domainset_iter di;
|
|
|
|
vm_offset_t addr;
|
|
|
|
int domain;
|
|
|
|
|
2018-10-30 18:26:34 +00:00
|
|
|
vm_domainset_iter_policy_init(&di, ds, &domain, &flags);
|
2018-01-12 23:13:55 +00:00
|
|
|
do {
|
|
|
|
addr = kmem_alloc_contig_domain(domain, size, flags, low, high,
|
|
|
|
alignment, boundary, memattr);
|
|
|
|
if (addr != 0)
|
|
|
|
break;
|
2018-10-23 16:35:58 +00:00
|
|
|
} while (vm_domainset_iter_policy(&di, &domain) == 0);
|
2018-01-12 23:13:55 +00:00
|
|
|
|
|
|
|
return (addr);
|
|
|
|
}
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* kmem_suballoc:
|
|
|
|
*
|
|
|
|
* Allocates a map to manage a subrange
|
|
|
|
* of the kernel virtual address space.
|
|
|
|
*
|
|
|
|
* Arguments are as follows:
|
|
|
|
*
|
|
|
|
* parent Map to take range from
|
|
|
|
* min, max Returned endpoints of map
|
2000-12-29 13:49:05 +00:00
|
|
|
* size Size of range to find
|
2008-05-10 21:46:20 +00:00
|
|
|
* superpage_align Request that min is superpage aligned
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1995-05-30 08:16:23 +00:00
|
|
|
vm_map_t
|
2008-05-10 21:46:20 +00:00
|
|
|
kmem_suballoc(vm_map_t parent, vm_offset_t *min, vm_offset_t *max,
|
|
|
|
vm_size_t size, boolean_t superpage_align)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2000-12-29 13:05:22 +00:00
|
|
|
int ret;
|
1995-01-09 16:06:02 +00:00
|
|
|
vm_map_t result;
|
2001-05-19 01:28:09 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
size = round_page(size);
|
|
|
|
|
2008-04-28 17:25:27 +00:00
|
|
|
*min = vm_map_min(parent);
|
2013-09-09 18:11:59 +00:00
|
|
|
ret = vm_map_find(parent, NULL, 0, min, size, 0, superpage_align ?
|
2013-08-16 21:13:55 +00:00
|
|
|
VMFS_SUPER_SPACE : VMFS_ANY_SPACE, VM_PROT_ALL, VM_PROT_ALL,
|
Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.
The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.
The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.
The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation is performed by
kernel, e.g. md(4).
Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
2009-06-23 20:45:22 +00:00
|
|
|
MAP_ACC_NO_CHARGE);
|
2008-03-30 20:08:59 +00:00
|
|
|
if (ret != KERN_SUCCESS)
|
|
|
|
panic("kmem_suballoc: bad status return of %d", ret);
|
1994-05-24 10:09:53 +00:00
|
|
|
*max = *min + size;
|
VM level code cleanups.
1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
|
|
|
result = vm_map_create(vm_map_pmap(parent), *min, *max);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (result == NULL)
|
|
|
|
panic("kmem_suballoc: cannot create submap");
|
2000-12-29 13:05:22 +00:00
|
|
|
if (vm_map_submap(parent, *min, *max, result) != KERN_SUCCESS)
|
1994-05-24 10:09:53 +00:00
|
|
|
panic("kmem_suballoc: unable to change range to submap");
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
return (result);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2018-10-30 18:26:34 +00:00
|
|
|
* kmem_malloc_domain:
|
1999-01-21 08:29:12 +00:00
|
|
|
*
|
2013-08-07 06:21:20 +00:00
|
|
|
* Allocate wired-down pages in the kernel's address space.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2018-10-30 18:26:34 +00:00
|
|
|
static vm_offset_t
|
2018-08-18 18:33:50 +00:00
|
|
|
kmem_malloc_domain(int domain, vm_size_t size, int flags)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.
(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)
This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.
Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interface to obtain executable pages.
Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.
PR: 228927
Reviewed by: alc, kib, markj, jhb (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
|
|
|
vmem_t *arena;
|
1995-01-09 16:06:02 +00:00
|
|
|
vm_offset_t addr;
|
2013-08-07 06:21:20 +00:00
|
|
|
int rv;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2018-06-13 17:04:41 +00:00
|
|
|
#if VM_NRESERVLEVEL > 0
|
2018-08-18 18:33:50 +00:00
|
|
|
if (__predict_true((flags & M_EXEC) == 0))
|
2018-06-13 17:04:41 +00:00
|
|
|
arena = vm_dom[domain].vmd_kernel_arena;
|
|
|
|
else
|
|
|
|
arena = vm_dom[domain].vmd_kernel_rwx_arena;
|
|
|
|
#else
|
|
|
|
arena = vm_dom[domain].vmd_kernel_arena;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
size = round_page(size);
|
2018-06-13 17:04:41 +00:00
|
|
|
if (vmem_alloc(arena, size, flags | M_BESTFIT, &addr))
|
2013-08-07 06:21:20 +00:00
|
|
|
return (0);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
rv = kmem_back_domain(domain, kernel_object, addr, size, flags);
|
2013-08-07 06:21:20 +00:00
|
|
|
if (rv != KERN_SUCCESS) {
|
2018-06-13 17:04:41 +00:00
|
|
|
vmem_free(arena, addr, size);
|
2013-08-07 06:21:20 +00:00
|
|
|
return (0);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2013-08-07 06:21:20 +00:00
|
|
|
return (addr);
|
2010-08-11 22:10:37 +00:00
|
|
|
}
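The routine above follows a two-step pattern: reserve a virtual address range from an arena, then back it with physical pages, releasing the reservation if backing fails. The following is a minimal userland sketch of that rollback pattern only. The `sketch_*` names, the bump-style arena, and the `backing_ok` switch are hypothetical stand-ins for illustration and are not part of the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Hypothetical stand-in for a vmem arena: a simple bump reservation. */
struct sketch_arena {
	uintptr_t base;		/* first address handed out */
	size_t reserved;	/* bytes currently reserved */
	size_t limit;		/* total bytes available */
};

/* Round a size up to a whole number of pages, like round_page(). */
static size_t
sketch_round_page(size_t size)
{
	return ((size + SKETCH_PAGE_SIZE - 1) &
	    ~(size_t)(SKETCH_PAGE_SIZE - 1));
}

/* Reserve address space; returns 0 on success, -1 if the arena is full. */
static int
sketch_reserve(struct sketch_arena *a, size_t size, uintptr_t *addrp)
{
	if (size > a->limit - a->reserved)
		return (-1);
	*addrp = a->base + a->reserved;
	a->reserved += size;
	return (0);
}

/* Undo the most recent reservation of the given size. */
static void
sketch_release(struct sketch_arena *a, size_t size)
{
	a->reserved -= size;
}

/*
 * Reserve, then "back" the range.  The backing step is simulated by the
 * backing_ok argument.  Returns the address, or 0 on any failure, with
 * the reservation rolled back when backing fails.
 */
static uintptr_t
sketch_alloc(struct sketch_arena *a, size_t size, bool backing_ok)
{
	uintptr_t addr;

	size = sketch_round_page(size);
	if (sketch_reserve(a, size, &addr) != 0)
		return (0);
	if (!backing_ok) {
		sketch_release(a, size);
		return (0);
	}
	return (addr);
}
```

Returning 0 for both failure modes mirrors the listing, where the caller cannot distinguish address-space exhaustion from a shortage of physical pages.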
|
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
vm_offset_t
|
2018-08-21 16:43:46 +00:00
|
|
|
kmem_malloc(vm_size_t size, int flags)
|
2018-10-30 18:26:34 +00:00
|
|
|
{
|
|
|
|
|
|
|
|
return (kmem_malloc_domainset(DOMAINSET_RR(), size, flags));
|
|
|
|
}
|
|
|
|
|
|
|
|
vm_offset_t
|
|
|
|
kmem_malloc_domainset(struct domainset *ds, vm_size_t size, int flags)
|
2018-01-12 23:13:55 +00:00
|
|
|
{
|
|
|
|
struct vm_domainset_iter di;
|
|
|
|
vm_offset_t addr;
|
|
|
|
int domain;
|
|
|
|
|
2018-10-30 18:26:34 +00:00
|
|
|
vm_domainset_iter_policy_init(&di, ds, &domain, &flags);
|
2018-01-12 23:13:55 +00:00
|
|
|
do {
|
2018-08-18 18:33:50 +00:00
|
|
|
addr = kmem_malloc_domain(domain, size, flags);
|
2018-01-12 23:13:55 +00:00
|
|
|
if (addr != 0)
|
|
|
|
break;
|
2018-10-23 16:35:58 +00:00
|
|
|
} while (vm_domainset_iter_policy(&di, &domain) == 0);
|
2018-01-12 23:13:55 +00:00
|
|
|
|
|
|
|
return (addr);
|
|
|
|
}
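The loop above tries one domain after another until an allocation succeeds or the iterator reports that no domains remain. A rough userland sketch of that fallback idea is shown here, assuming a plain round-robin order. The real vm_domainset iterator supports several policies and may sleep, none of which is modeled; every `sketch_*` name and the per-domain free-page array are hypothetical.

```c
#include <assert.h>

#define SKETCH_NDOMAINS 4	/* assumed number of memory domains */

/* Take npages from one domain; returns 1 on success, 0 if it is short. */
static int
sketch_try_domain(unsigned free_pages[], int domain, unsigned npages)
{
	if (free_pages[domain] < npages)
		return (0);
	free_pages[domain] -= npages;
	return (1);
}

/*
 * Start at the round-robin cursor and visit each domain at most once.
 * Returns the domain that satisfied the request, or -1 if none could.
 * The cursor advances by one per call so successive requests start at
 * different domains.
 */
static int
sketch_alloc_rr(unsigned free_pages[], int *cursor, unsigned npages)
{
	int domain, tries;

	domain = *cursor;
	*cursor = (*cursor + 1) % SKETCH_NDOMAINS;
	for (tries = 0; tries < SKETCH_NDOMAINS; tries++) {
		if (sketch_try_domain(free_pages, domain, npages))
			return (domain);
		domain = (domain + 1) % SKETCH_NDOMAINS;
	}
	return (-1);
}
```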
|
|
|
|
|
2010-08-11 22:10:37 +00:00
|
|
|
/*
|
2018-09-20 15:45:12 +00:00
|
|
|
* kmem_back_domain:
|
2010-08-11 22:10:37 +00:00
|
|
|
*
|
2018-09-20 15:45:12 +00:00
|
|
|
* Allocate physical pages from the specified domain for the specified
|
|
|
|
* virtual address range.
|
2010-08-11 22:10:37 +00:00
|
|
|
*/
|
|
|
|
int
|
2018-01-12 23:13:55 +00:00
|
|
|
kmem_back_domain(int domain, vm_object_t object, vm_offset_t addr,
|
|
|
|
vm_size_t size, int flags)
|
2010-08-11 22:10:37 +00:00
|
|
|
{
|
|
|
|
vm_offset_t offset, i;
|
2017-08-15 16:39:49 +00:00
|
|
|
vm_page_t m, mpred;
|
2018-06-13 17:04:41 +00:00
|
|
|
vm_prot_t prot;
|
2010-08-11 22:10:37 +00:00
|
|
|
int pflags;
|
|
|
|
|
2017-11-28 23:40:54 +00:00
|
|
|
KASSERT(object == kernel_object,
|
2018-01-12 23:13:55 +00:00
|
|
|
("kmem_back_domain: only supports kernel object."));
|
2011-02-15 09:03:58 +00:00
|
|
|
|
2013-08-07 06:21:20 +00:00
|
|
|
offset = addr - VM_MIN_KERNEL_ADDRESS;
|
|
|
|
pflags = malloc2vm_flags(flags) | VM_ALLOC_NOBUSY | VM_ALLOC_WIRED;
|
2017-11-08 02:39:37 +00:00
|
|
|
pflags &= ~(VM_ALLOC_NOWAIT | VM_ALLOC_WAITOK | VM_ALLOC_WAITFAIL);
|
|
|
|
if (flags & M_WAITOK)
|
|
|
|
pflags |= VM_ALLOC_WAITFAIL;
|
2018-06-13 17:04:41 +00:00
|
|
|
prot = (flags & M_EXEC) != 0 ? VM_PROT_ALL : VM_PROT_RW;
|
2002-06-19 20:47:18 +00:00
|
|
|
|
2017-08-15 16:39:49 +00:00
|
|
|
i = 0;
|
|
|
|
VM_OBJECT_WLOCK(object);
|
2017-11-08 02:39:37 +00:00
|
|
|
retry:
|
2017-08-15 16:39:49 +00:00
|
|
|
mpred = vm_radix_lookup_le(&object->rtree, atop(offset + i));
|
|
|
|
for (; i < size; i += PAGE_SIZE, mpred = m) {
|
2018-01-12 23:13:55 +00:00
|
|
|
m = vm_page_alloc_domain_after(object, atop(offset + i),
|
|
|
|
domain, pflags, mpred);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
/*
|
1995-01-09 16:06:02 +00:00
|
|
|
* Ran out of space, free everything up and return. Don't need
|
|
|
|
* to lock page queues here as we know that the pages we got
|
|
|
|
* aren't on any queues.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
if (m == NULL) {
|
2017-11-08 02:39:37 +00:00
|
|
|
if ((flags & M_NOWAIT) == 0)
|
1996-05-18 03:38:05 +00:00
|
|
|
goto retry;
|
2017-11-08 02:39:37 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2015-09-26 22:57:10 +00:00
|
|
|
kmem_unback(object, addr, i);
|
2010-08-11 22:10:37 +00:00
|
|
|
return (KERN_NO_SPACE);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2018-02-06 22:10:07 +00:00
|
|
|
KASSERT(vm_phys_domain(m) == domain,
|
2018-01-12 23:13:55 +00:00
|
|
|
("kmem_back_domain: Domain mismatch %d != %d",
|
2018-02-06 22:10:07 +00:00
|
|
|
vm_phys_domain(m), domain));
|
2002-06-19 23:49:57 +00:00
|
|
|
if (flags & M_ZERO && (m->flags & PG_ZERO) == 0)
|
2002-08-25 00:22:31 +00:00
|
|
|
pmap_zero_page(m);
|
2011-08-09 21:01:36 +00:00
|
|
|
KASSERT((m->oflags & VPO_UNMANAGED) != 0,
|
2007-02-25 06:14:58 +00:00
|
|
|
("kmem_malloc: page %p is managed", m));
|
2013-08-07 06:21:20 +00:00
|
|
|
m->valid = VM_PAGE_BITS_ALL;
|
2018-06-13 17:04:41 +00:00
|
|
|
pmap_enter(kernel_pmap, addr + i, m, prot,
|
|
|
|
prot | PMAP_ENTER_WIRED, 0);
|
2018-08-25 19:38:08 +00:00
|
|
|
#if VM_NRESERVLEVEL > 0
|
|
|
|
if (__predict_false((prot & VM_PROT_EXECUTE) != 0))
|
|
|
|
m->oflags |= VPO_KMEM_EXEC;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2013-08-07 06:21:20 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2013-08-07 06:21:20 +00:00
|
|
|
return (KERN_SUCCESS);
|
|
|
|
}
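The page loop above keeps its index `i` across the `retry` label, so a retry resumes at the page that failed instead of starting over, and a non-waiting caller unwinds only the pages obtained so far. The sketch below isolates that control flow in userland C. The scripted page source (`fail_when`, `failures_left`) and all `sketch_*` names are hypothetical and exist only to make the failure path reproducible.

```c
#include <assert.h>

/* Hypothetical page source that fails a scripted number of times. */
struct sketch_source {
	int fail_when;		/* fail when this many pages are allocated */
	int failures_left;	/* how many times the failure repeats */
	int allocated;		/* pages currently handed out */
};

/* Returns 1 if a page was obtained, 0 on a (scripted) shortage. */
static int
sketch_page_alloc(struct sketch_source *s)
{
	if (s->allocated == s->fail_when && s->failures_left > 0) {
		s->failures_left--;
		return (0);
	}
	s->allocated++;
	return (1);
}

/*
 * Back npages pages.  If can_wait is nonzero, a failed allocation is
 * retried from the same index.  Otherwise the i pages obtained so far
 * are given back and -1 is returned.
 */
static int
sketch_back(struct sketch_source *s, int npages, int can_wait)
{
	int i;

	i = 0;
retry:
	for (; i < npages; i++) {
		if (!sketch_page_alloc(s)) {
			if (can_wait)
				goto retry;	/* i is preserved */
			s->allocated -= i;	/* unwind pages [0, i) */
			return (-1);
		}
	}
	return (0);
}
```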
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2018-09-20 15:45:12 +00:00
|
|
|
/*
|
|
|
|
* kmem_back:
|
|
|
|
*
|
|
|
|
* Allocate physical pages for the specified virtual address range.
|
|
|
|
*/
|
2018-01-12 23:13:55 +00:00
|
|
|
int
|
|
|
|
kmem_back(vm_object_t object, vm_offset_t addr, vm_size_t size, int flags)
|
|
|
|
{
|
2018-09-20 15:45:12 +00:00
|
|
|
vm_offset_t end, next, start;
|
|
|
|
int domain, rv;
|
2018-01-12 23:13:55 +00:00
|
|
|
|
|
|
|
KASSERT(object == kernel_object,
|
|
|
|
("kmem_back: only supports kernel object."));
|
|
|
|
|
2018-09-20 15:45:12 +00:00
|
|
|
for (start = addr, end = addr + size; addr < end; addr = next) {
|
|
|
|
/*
|
|
|
|
* We must ensure that pages backing a given large virtual page
|
|
|
|
* all come from the same physical domain.
|
|
|
|
*/
|
|
|
|
if (vm_ndomains > 1) {
|
|
|
|
domain = (addr >> KVA_QUANTUM_SHIFT) % vm_ndomains;
|
2018-10-01 14:14:21 +00:00
|
|
|
while (VM_DOMAIN_EMPTY(domain))
|
|
|
|
domain++;
|
2018-09-20 15:45:12 +00:00
|
|
|
next = roundup2(addr + 1, KVA_QUANTUM);
|
|
|
|
if (next > end || next < start)
|
|
|
|
next = end;
|
2018-09-24 15:32:46 +00:00
|
|
|
} else {
|
|
|
|
domain = 0;
|
2018-09-20 15:45:12 +00:00
|
|
|
next = end;
|
2018-09-24 15:32:46 +00:00
|
|
|
}
|
2018-09-20 15:45:12 +00:00
|
|
|
rv = kmem_back_domain(domain, object, addr, next - addr, flags);
|
|
|
|
if (rv != KERN_SUCCESS) {
|
|
|
|
kmem_unback(object, start, addr - start);
|
2018-01-12 23:13:55 +00:00
|
|
|
break;
|
2018-09-20 15:45:12 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return (rv);
|
2018-01-12 23:13:55 +00:00
|
|
|
}
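When more than one domain exists, the loop above cuts the range at quantum boundaries and derives each piece's domain from its address, so every quantum-aligned region is backed from a single domain. The sketch below reproduces only that splitting arithmetic. The 2 MiB quantum (`SKETCH_QUANTUM_SHIFT` of 21) is an assumed value, the skipping of empty domains is omitted, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUANTUM_SHIFT 21	/* assumed: 2 MiB quantum */
#define SKETCH_QUANTUM ((uintptr_t)1 << SKETCH_QUANTUM_SHIFT)

/* One piece of the range and the domain chosen to back it. */
struct sketch_chunk {
	uintptr_t start;
	uintptr_t end;
	int domain;
};

/*
 * Split [addr, addr + size) at quantum boundaries and record up to max
 * chunks in out.  Returns the number of chunks written.
 */
static size_t
sketch_split(uintptr_t addr, size_t size, int ndomains,
    struct sketch_chunk *out, size_t max)
{
	uintptr_t end, next, start;
	size_t n;

	n = 0;
	for (start = addr, end = addr + size; addr < end && n < max;
	    addr = next) {
		/* Equivalent to roundup2(addr + 1, SKETCH_QUANTUM). */
		next = (addr + SKETCH_QUANTUM) & ~(SKETCH_QUANTUM - 1);
		/* Clamp to the range end and guard against wraparound. */
		if (next > end || next < start)
			next = end;
		out[n].start = addr;
		out[n].end = next;
		out[n].domain = (int)((addr >> SKETCH_QUANTUM_SHIFT) %
		    (uintptr_t)ndomains);
		n++;
	}
	return (n);
}
```

A range that starts one page below a quantum boundary and ends one page past the next boundary yields three chunks, and adjacent quanta land in alternating domains when two domains are configured.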
|
|
|
|
|
2015-09-26 22:57:10 +00:00
|
|
|
/*
|
|
|
|
* kmem_unback:
|
|
|
|
*
|
|
|
|
* Unmap and free the physical pages underlying the specified virtual
|
|
|
|
* address range.
|
|
|
|
*
|
|
|
|
* A physical page must exist within the specified object at each index
|
|
|
|
* that is being unmapped.
|
|
|
|
*/
|
2018-08-25 19:38:08 +00:00
|
|
|
static struct vmem *
|
2018-01-12 23:13:55 +00:00
|
|
|
_kmem_unback(vm_object_t object, vm_offset_t addr, vm_size_t size)
|
2013-08-07 06:21:20 +00:00
|
|
|
{
|
2018-08-25 19:38:08 +00:00
|
|
|
struct vmem *arena;
|
2017-08-11 03:09:11 +00:00
|
|
|
vm_page_t m, next;
|
|
|
|
vm_offset_t end, offset;
|
2018-01-12 23:13:55 +00:00
|
|
|
int domain;
|
1996-12-28 23:07:49 +00:00
|
|
|
|
2017-11-28 23:40:54 +00:00
|
|
|
KASSERT(object == kernel_object,
|
|
|
|
("kmem_unback: only supports kernel object."));
|
2013-08-07 06:21:20 +00:00
|
|
|
|
2018-01-12 23:13:55 +00:00
|
|
|
if (size == 0)
|
2018-08-25 19:38:08 +00:00
|
|
|
return (NULL);
|
2014-05-23 16:22:36 +00:00
|
|
|
pmap_remove(kernel_pmap, addr, addr + size);
|
2013-08-07 06:21:20 +00:00
|
|
|
offset = addr - VM_MIN_KERNEL_ADDRESS;
|
2017-08-11 03:09:11 +00:00
|
|
|
end = offset + size;
|
2013-08-07 06:21:20 +00:00
|
|
|
VM_OBJECT_WLOCK(object);
|
2018-01-12 23:13:55 +00:00
|
|
|
m = vm_page_lookup(object, atop(offset));
|
2018-02-06 22:10:07 +00:00
|
|
|
domain = vm_phys_domain(m);
|
2018-08-25 19:38:08 +00:00
|
|
|
#if VM_NRESERVLEVEL > 0
|
|
|
|
if (__predict_true((m->oflags & VPO_KMEM_EXEC) == 0))
|
|
|
|
arena = vm_dom[domain].vmd_kernel_arena;
|
|
|
|
else
|
|
|
|
arena = vm_dom[domain].vmd_kernel_rwx_arena;
|
|
|
|
#else
|
|
|
|
arena = vm_dom[domain].vmd_kernel_arena;
|
|
|
|
#endif
|
2018-01-12 23:13:55 +00:00
|
|
|
for (; offset < end; offset += PAGE_SIZE, m = next) {
|
2017-08-11 03:09:11 +00:00
|
|
|
next = vm_page_next(m);
|
2015-10-06 05:49:00 +00:00
|
|
|
vm_page_unwire(m, PQ_NONE);
|
2013-08-07 06:21:20 +00:00
|
|
|
vm_page_free(m);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2013-08-07 06:21:20 +00:00
|
|
|
VM_OBJECT_WUNLOCK(object);
|
2018-01-12 23:13:55 +00:00
|
|
|
|
2018-08-25 19:38:08 +00:00
|
|
|
return (arena);
|
2018-01-12 23:13:55 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
kmem_unback(vm_object_t object, vm_offset_t addr, vm_size_t size)
|
|
|
|
{
|
|
|
|
|
2018-08-25 19:38:08 +00:00
|
|
|
(void)_kmem_unback(object, addr, size);
|
2013-08-07 06:21:20 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2013-08-07 06:21:20 +00:00
|
|
|
/*
|
|
|
|
* kmem_free:
|
|
|
|
*
|
|
|
|
* Free memory allocated with kmem_malloc. The size must match the
|
|
|
|
* original allocation.
|
|
|
|
*/
|
|
|
|
void
|
2018-08-25 19:38:08 +00:00
|
|
|
kmem_free(vm_offset_t addr, vm_size_t size)
|
2013-08-07 06:21:20 +00:00
|
|
|
{
|
2018-06-13 17:04:41 +00:00
|
|
|
struct vmem *arena;
|
|
|
|
|
2013-08-07 06:21:20 +00:00
|
|
|
size = round_page(size);
|
2018-08-25 19:38:08 +00:00
|
|
|
arena = _kmem_unback(kernel_object, addr, size);
|
|
|
|
if (arena != NULL)
|
|
|
|
vmem_free(arena, addr, size);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-08-07 06:21:20 +00:00
|
|
|
* kmap_alloc_wait:
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
* Allocates pageable memory from a sub-map of the kernel. If the submap
|
|
|
|
* has no room, the caller sleeps waiting for more memory in the submap.
|
|
|
|
*
|
1999-01-21 08:29:12 +00:00
|
|
|
* This routine may block.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1995-05-30 08:16:23 +00:00
|
|
|
vm_offset_t
|
2017-10-13 13:53:19 +00:00
|
|
|
kmap_alloc_wait(vm_map_t map, vm_size_t size)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size; now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
vm_offset_t addr;
|
2001-05-19 01:28:09 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
size = round_page(size);
|
Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.
The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.
The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.
The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation is performed by
kernel, e.g. md(4).
Several syscalls, among them fork(2), may now return ENOMEM when
global or per-uid limits are enforced.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
2009-06-23 20:45:22 +00:00
|
|
|
if (!swap_reserve(size))
|
|
|
|
return (0);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
/*
|
1995-01-09 16:06:02 +00:00
|
|
|
* To make this work for more than one map, use the map's lock
|
|
|
|
* to lock out sleepers/wakers.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
vm_map_lock(map);
|
Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field
so that p->max_free is the size of the largest gap with one endpoint
in the subtree rooted at p. Change vm_map_findspace so that, first,
the address-based splay is restricted to tree nodes with large-enough
max_free value, to avoid searching for the right starting point in a
subtree where all the gaps are too small. Second, when the address
search leads to a tree search for the first large-enough gap, that gap
is the subject of a splay-search that brings the gap to the top of the
tree, so that an immediate insertion will take constant time.
Break up the splay code into separate components, one for searching
and breaking up the tree and another for reassembling it. Use these
components, and not splay itself, for linking and unlinking. Drop the
after-where parameter to link, as it is computed as a side-effect of
the splay search.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: markj
Tested by: pho
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D17794
2019-03-29 16:53:46 +00:00
|
|
|
addr = vm_map_findspace(map, vm_map_min(map), size);
|
|
|
|
if (addr + size <= vm_map_max(map))
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
/* no space now; see if we can ever get space */
|
|
|
|
if (vm_map_max(map) - vm_map_min(map) < size) {
|
|
|
|
vm_map_unlock(map);
|
2009-06-23 20:45:22 +00:00
|
|
|
swap_release(size);
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
|
2002-07-11 02:39:24 +00:00
|
|
|
map->needs_wakeup = TRUE;
|
2007-11-07 21:56:58 +00:00
|
|
|
vm_map_unlock_and_wait(map, 0);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2018-11-06 21:57:03 +00:00
|
|
|
vm_map_insert(map, NULL, 0, addr, addr + size, VM_PROT_RW, VM_PROT_RW,
|
|
|
|
MAP_ACC_CHARGED);
|
1994-05-24 10:09:53 +00:00
|
|
|
vm_map_unlock(map);
|
|
|
|
return (addr);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-08-07 06:21:20 +00:00
|
|
|
* kmap_free_wakeup:
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and
non-standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Its marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
1995-07-13 08:48:48 +00:00
|
|
|
* Returns memory to a submap of the kernel, and wakes up any processes
|
1994-05-24 10:09:53 +00:00
|
|
|
* waiting for memory in that map.
|
|
|
|
*/
|
1995-05-30 08:16:23 +00:00
|
|
|
void
|
2017-10-13 13:53:19 +00:00
|
|
|
kmap_free_wakeup(vm_map_t map, vm_offset_t addr, vm_size_t size)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-05-19 01:28:09 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
vm_map_lock(map);
|
2009-02-24 20:57:43 +00:00
|
|
|
(void) vm_map_delete(map, trunc_page(addr), round_page(addr + size));
|
2002-07-11 02:39:24 +00:00
|
|
|
if (map->needs_wakeup) {
|
|
|
|
map->needs_wakeup = FALSE;
|
|
|
|
vm_map_wakeup(map);
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
vm_map_unlock(map);
|
|
|
|
}
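The allocate-or-wait handshake between kmap_alloc_wait() and kmap_free_wakeup() can be illustrated with a small userland sketch. This is not kernel code: `struct toy_map`, `toy_alloc_once()` and `toy_free()` are hypothetical names, the map is reduced to a byte counter (so fragmentation is ignored), and the sleep itself is replaced by a return value. It only shows the three outcomes of one pass through the loop (space found, request can never fit, caller must wait) and how the free path clears the wakeup flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a submap: a capacity and a used count. */
struct toy_map {
	size_t capacity;	/* plays the role of vm_map_max - vm_map_min */
	size_t used;		/* bytes currently allocated */
	bool needs_wakeup;	/* set when an allocator would have to sleep */
};

enum toy_result { TOY_OK, TOY_NEVER_FITS, TOY_MUST_WAIT };

/* One pass of the retry loop, with the sleep replaced by a result code. */
static enum toy_result
toy_alloc_once(struct toy_map *m, size_t size)
{
	if (m->capacity - m->used >= size) {
		m->used += size;		/* space found: take it */
		return (TOY_OK);
	}
	if (m->capacity < size)
		return (TOY_NEVER_FITS);	/* waiting cannot help */
	m->needs_wakeup = true;			/* ask the free path to wake us */
	return (TOY_MUST_WAIT);
}

/* Free path: give space back and report whether a waiter must be woken. */
static bool
toy_free(struct toy_map *m, size_t size)
{
	bool wake;

	m->used -= size;
	wake = m->needs_wakeup;
	m->needs_wakeup = false;
	return (wake);
}
```

In the real routines the flag test and update happen under the map lock, which is what keeps a wakeup from being lost between the failed search and the sleep.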
|
|
|
|
|
2013-08-07 06:21:20 +00:00
|
|
|
void
|
2011-05-13 18:48:00 +00:00
|
|
|
kmem_init_zero_region(void)
|
|
|
|
{
|
2011-05-13 19:35:01 +00:00
|
|
|
vm_offset_t addr, i;
|
2011-05-13 18:48:00 +00:00
|
|
|
vm_page_t m;
|
|
|
|
|
2011-05-13 19:35:01 +00:00
|
|
|
/*
|
|
|
|
* Map a single physical page of zeros to a larger virtual range.
|
|
|
|
* This requires less looping in places that want large amounts of
|
|
|
|
* zeros, while not using much more physical resources.
|
|
|
|
*/
|
2013-08-07 06:21:20 +00:00
|
|
|
addr = kva_alloc(ZERO_REGION_SIZE);
|
2011-10-27 16:39:17 +00:00
|
|
|
m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL |
|
2011-05-13 18:48:00 +00:00
|
|
|
VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO);
|
|
|
|
if ((m->flags & PG_ZERO) == 0)
|
|
|
|
pmap_zero_page(m);
|
|
|
|
for (i = 0; i < ZERO_REGION_SIZE; i += PAGE_SIZE)
|
|
|
|
pmap_qenter(addr + i, &m, 1);
|
2013-08-07 06:21:20 +00:00
|
|
|
pmap_protect(kernel_pmap, addr, addr + ZERO_REGION_SIZE, VM_PROT_READ);
|
2011-05-13 18:48:00 +00:00
|
|
|
|
|
|
|
zero_region = (const void *)addr;
|
|
|
|
}
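As a rough illustration of why a larger zero region means "less looping", the sketch below shows a consumer that emits zeros by copying from a read-only zero buffer in region-sized chunks. It is a userland analogy only: `toy_zero_region`, `TOY_ZERO_REGION_SIZE` and `toy_fill_zeros()` are hypothetical names, the region is a plain static array rather than one physical page mapped many times, and the tiny size is chosen just to keep the arithmetic visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ZERO_REGION_SIZE 16	/* hypothetical; the real size is platform-defined */

/* Read-only zeros; static storage is zero-initialized. */
static const unsigned char toy_zero_region[TOY_ZERO_REGION_SIZE];

/*
 * Fill dst with len zero bytes, copying at most one region per pass.
 * Returns the number of passes, which shrinks as the region grows.
 */
static size_t
toy_fill_zeros(unsigned char *dst, size_t len)
{
	size_t chunk, passes;

	passes = 0;
	while (len > 0) {
		chunk = len < TOY_ZERO_REGION_SIZE ? len : TOY_ZERO_REGION_SIZE;
		memcpy(dst, toy_zero_region, chunk);
		dst += chunk;
		len -= chunk;
		passes++;
	}
	return (passes);
}
```

With a 16-byte region, 40 bytes take three passes; a region the size of a single byte would take 40.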
|
|
|
|
|
2018-09-19 19:13:43 +00:00
|
|
|
/*
|
2018-09-20 18:29:55 +00:00
|
|
|
* Import KVA from the kernel map into the kernel arena.
|
2018-09-19 19:13:43 +00:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
kva_import(void *unused, vmem_size_t size, int flags, vmem_addr_t *addrp)
|
|
|
|
{
|
|
|
|
vm_offset_t addr;
|
|
|
|
int result;
|
|
|
|
|
|
|
|
KASSERT((size % KVA_QUANTUM) == 0,
|
|
|
|
("kva_import: Size %jd is not a multiple of %d",
|
|
|
|
(intmax_t)size, (int)KVA_QUANTUM));
|
|
|
|
addr = vm_map_min(kernel_map);
|
|
|
|
result = vm_map_find(kernel_map, NULL, 0, &addr, size, 0,
|
|
|
|
VMFS_SUPER_SPACE, VM_PROT_ALL, VM_PROT_ALL, MAP_NOFAULT);
|
|
|
|
if (result != KERN_SUCCESS)
|
|
|
|
return (ENOMEM);
|
|
|
|
|
|
|
|
*addrp = addr;
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2018-09-20 18:29:55 +00:00
|
|
|
* Import KVA from a parent arena into a per-domain arena. Imports must be
|
|
|
|
* KVA_QUANTUM-aligned and a multiple of KVA_QUANTUM in size.
|
2018-09-19 19:13:43 +00:00
|
|
|
*/
|
|
|
|
static int
|
2018-09-20 18:29:55 +00:00
|
|
|
kva_import_domain(void *arena, vmem_size_t size, int flags, vmem_addr_t *addrp)
|
2018-09-19 19:13:43 +00:00
|
|
|
{
|
|
|
|
|
|
|
|
KASSERT((size % KVA_QUANTUM) == 0,
|
2018-09-20 18:29:55 +00:00
|
|
|
("kva_import_domain: Size %jd is not a multiple of %d",
|
2018-09-19 19:13:43 +00:00
|
|
|
(intmax_t)size, (int)KVA_QUANTUM));
|
|
|
|
return (vmem_xalloc(arena, size, KVA_QUANTUM, 0, 0, VMEM_ADDR_MIN,
|
|
|
|
VMEM_ADDR_MAX, flags, addrp));
|
|
|
|
}
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
1999-01-21 08:29:12 +00:00
|
|
|
* kmem_init:
|
|
|
|
*
|
|
|
|
* Create the kernel map; insert a mapping covering kernel text,
|
|
|
|
* data, bss, and all space allocated thus far (`bootstrap' data). The
|
|
|
|
* new map will thus map the range between VM_MIN_KERNEL_ADDRESS and
|
|
|
|
* `start' as allocated, and the range between `start' and `end' as free.
|
2018-09-19 19:13:43 +00:00
|
|
|
* Create the kernel vmem arena and its per-domain children.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1995-05-30 08:16:23 +00:00
|
|
|
void
|
2017-10-13 13:53:19 +00:00
|
|
|
kmem_init(vm_offset_t start, vm_offset_t end)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2000-12-29 13:49:05 +00:00
|
|
|
vm_map_t m;
|
2018-09-19 19:13:43 +00:00
|
|
|
int domain;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
VM level code cleanups.
1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
|
|
|
m = vm_map_create(kernel_pmap, VM_MIN_KERNEL_ADDRESS, end);
|
2002-12-30 05:55:41 +00:00
|
|
|
m->system_map = 1;
|
1994-05-24 10:09:53 +00:00
|
|
|
vm_map_lock(m);
|
|
|
|
/* N.B.: cannot use kgdb to debug, starting with this assignment ... */
|
|
|
|
kernel_map = m;
|
2002-12-30 05:55:41 +00:00
|
|
|
(void) vm_map_insert(m, NULL, (vm_ooffset_t) 0,
|
2008-06-22 04:54:27 +00:00
|
|
|
#ifdef __amd64__
|
|
|
|
KERNBASE,
|
|
|
|
#else
|
|
|
|
VM_MIN_KERNEL_ADDRESS,
|
|
|
|
#endif
|
|
|
|
start, VM_PROT_ALL, VM_PROT_ALL, MAP_NOFAULT);
|
1994-05-24 10:09:53 +00:00
|
|
|
/* ... and ending with the completion of the above `insert' */
|
|
|
|
vm_map_unlock(m);
|
2018-09-19 19:13:43 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize the kernel_arena. This can grow on demand.
|
|
|
|
*/
|
|
|
|
vmem_init(kernel_arena, "kernel arena", 0, 0, PAGE_SIZE, 0, 0);
|
|
|
|
vmem_set_import(kernel_arena, kva_import, NULL, NULL, KVA_QUANTUM);
|
|
|
|
|
|
|
|
for (domain = 0; domain < vm_ndomains; domain++) {
|
2018-09-20 18:29:55 +00:00
|
|
|
/*
|
|
|
|
* Initialize the per-domain arenas. These are used to color
|
|
|
|
* the KVA space in a way that ensures that virtual large pages
|
|
|
|
* are backed by memory from the same physical domain,
|
|
|
|
* maximizing the potential for superpage promotion.
|
|
|
|
*/
|
2018-09-19 19:13:43 +00:00
|
|
|
vm_dom[domain].vmd_kernel_arena = vmem_create(
|
|
|
|
"kernel arena domain", 0, 0, PAGE_SIZE, 0, M_WAITOK);
|
|
|
|
vmem_set_import(vm_dom[domain].vmd_kernel_arena,
|
2018-09-20 18:29:55 +00:00
|
|
|
kva_import_domain, NULL, kernel_arena, KVA_QUANTUM);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In architectures with superpages, maintain separate arenas
|
|
|
|
* for allocations with permissions that differ from the
|
|
|
|
* "standard" read/write permissions used for kernel memory,
|
|
|
|
* so as not to inhibit superpage promotion.
|
|
|
|
*/
|
2018-09-19 19:13:43 +00:00
|
|
|
#if VM_NRESERVLEVEL > 0
|
|
|
|
vm_dom[domain].vmd_kernel_rwx_arena = vmem_create(
|
|
|
|
"kernel rwx arena domain", 0, 0, PAGE_SIZE, 0, M_WAITOK);
|
|
|
|
vmem_set_import(vm_dom[domain].vmd_kernel_rwx_arena,
|
2018-09-20 18:29:55 +00:00
|
|
|
kva_import_domain, (vmem_release_t *)vmem_xfree,
|
|
|
|
kernel_arena, KVA_QUANTUM);
|
2018-09-19 19:13:43 +00:00
|
|
|
#endif
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2009-02-23 23:00:12 +00:00
|
|
|
|
2018-07-19 20:00:28 +00:00
|
|
|
/*
|
|
|
|
* kmem_bootstrap_free:
|
|
|
|
*
|
|
|
|
* Free pages backing preloaded data (e.g., kernel modules) to the
|
|
|
|
* system. Currently only supported on platforms that create a
|
|
|
|
* vm_phys segment for preloaded data.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
kmem_bootstrap_free(vm_offset_t start, vm_size_t size)
|
|
|
|
{
|
|
|
|
#if defined(__i386__) || defined(__amd64__)
|
|
|
|
struct vm_domain *vmd;
|
2018-07-27 15:46:34 +00:00
|
|
|
vm_offset_t end, va;
|
2018-07-19 20:00:28 +00:00
|
|
|
vm_paddr_t pa;
|
|
|
|
vm_page_t m;
|
|
|
|
|
|
|
|
end = trunc_page(start + size);
|
|
|
|
start = round_page(start);
|
|
|
|
|
2018-07-27 15:46:34 +00:00
|
|
|
for (va = start; va < end; va += PAGE_SIZE) {
|
|
|
|
pa = pmap_kextract(va);
|
2018-07-19 20:00:28 +00:00
|
|
|
m = PHYS_TO_VM_PAGE(pa);
|
|
|
|
|
|
|
|
vmd = vm_pagequeue_domain(m);
|
|
|
|
vm_domain_free_lock(vmd);
|
|
|
|
vm_phys_free_pages(m, 0);
|
|
|
|
vm_domain_free_unlock(vmd);
|
2018-08-03 16:35:37 +00:00
|
|
|
|
|
|
|
vm_domain_freecnt_inc(vmd, 1);
|
|
|
|
vm_cnt.v_page_count++;
|
2018-07-19 20:00:28 +00:00
|
|
|
}
|
2018-07-27 15:46:34 +00:00
|
|
|
pmap_remove(kernel_pmap, start, end);
|
|
|
|
(void)vmem_add(kernel_arena, start, end - start, M_WAITOK);
|
2018-07-19 20:00:28 +00:00
|
|
|
#endif
|
|
|
|
}
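kmem_bootstrap_free() rounds its range inward (round_page on the start, trunc_page on the end), so only pages lying wholly inside the range are freed, whereas kmap_free_wakeup() rounds outward and covers every page the range touches. The following userland sketch makes the difference concrete. The `toy_` helpers are hypothetical re-implementations, and the 4 KiB page size is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u			/* assumption: 4 KiB pages */
#define TOY_PAGE_MASK	(TOY_PAGE_SIZE - 1)

static uintptr_t
toy_trunc_page(uintptr_t x)
{
	return (x & ~(uintptr_t)TOY_PAGE_MASK);
}

static uintptr_t
toy_round_page(uintptr_t x)
{
	return ((x + TOY_PAGE_MASK) & ~(uintptr_t)TOY_PAGE_MASK);
}

/* Inward rounding: pages that lie entirely within [start, start + size). */
static uintptr_t
toy_pages_inside(uintptr_t start, uintptr_t size)
{
	uintptr_t end;

	end = toy_trunc_page(start + size);
	start = toy_round_page(start);
	return (end > start ? (end - start) / TOY_PAGE_SIZE : 0);
}

/* Outward rounding: pages that [addr, addr + size) touches at all. */
static uintptr_t
toy_pages_touched(uintptr_t addr, uintptr_t size)
{
	return ((toy_round_page(addr + size) - toy_trunc_page(addr)) /
	    TOY_PAGE_SIZE);
}
```

For the range starting at 0x1800 with length 0x2000, inward rounding yields one page (0x2000 to 0x3000) while outward rounding yields three (0x1000 to 0x4000).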
|
|
|
|
|
2009-02-23 23:30:17 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
2009-02-23 23:00:12 +00:00
|
|
|
/*
|
|
|
|
* Allow userspace to directly trigger the VM drain routine for testing
|
|
|
|
* purposes.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
debug_vm_lowmem(SYSCTL_HANDLER_ARGS)
|
|
|
|
{
|
|
|
|
int error, i;
|
|
|
|
|
|
|
|
i = 0;
|
|
|
|
error = sysctl_handle_int(oidp, &i, 0, req);
|
|
|
|
if (error)
|
|
|
|
return (error);
|
2017-02-25 16:39:21 +00:00
|
|
|
if ((i & ~(VM_LOW_KMEM | VM_LOW_PAGES)) != 0)
|
|
|
|
return (EINVAL);
|
|
|
|
if (i != 0)
|
|
|
|
EVENTHANDLER_INVOKE(vm_lowmem, i);
|
2009-02-23 23:00:12 +00:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCTL_PROC(_debug, OID_AUTO, vm_lowmem, CTLTYPE_INT | CTLFLAG_RW, 0, 0,
|
2017-02-25 16:39:21 +00:00
|
|
|
debug_vm_lowmem, "I", "set to trigger vm_lowmem event with given flags");
|
2009-02-23 23:30:17 +00:00
|
|
|
#endif
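The flag check in debug_vm_lowmem() is a standard "reject unknown bits" mask test. A minimal standalone sketch of that logic follows. `toy_check_flags()` is a hypothetical helper, the event invocation is replaced by recording the flags in `*fired`, and the numeric values given to the two `TOY_LOW_*` constants are assumptions for illustration rather than values taken from this file.

```c
#include <assert.h>
#include <errno.h>

#define TOY_LOW_KMEM	0x01	/* assumed value, for illustration only */
#define TOY_LOW_PAGES	0x02	/* assumed value, for illustration only */

/*
 * Validate a flags word the way the handler does: any bit outside the
 * known set is an error; zero is accepted but triggers nothing.
 */
static int
toy_check_flags(int i, int *fired)
{
	*fired = 0;
	if ((i & ~(TOY_LOW_KMEM | TOY_LOW_PAGES)) != 0)
		return (EINVAL);
	if (i != 0)
		*fired = i;	/* stands in for invoking the event */
	return (0);
}
```

Writing 0 is therefore a harmless no-op, while any value carrying an unrecognized bit is refused before the event is raised.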
|