2005-01-07 02:29:27 +00:00
|
|
|
/*-
|
2017-11-30 15:48:35 +00:00
|
|
|
* SPDX-License-Identifier: (BSD-3-Clause AND MIT-CMU)
|
2017-11-20 19:43:44 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Copyright (c) 1991, 1993
|
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* This code is derived from software contributed to Berkeley by
|
|
|
|
* The Mach Operating System project at Carnegie-Mellon University.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
2017-02-28 23:42:47 +00:00
|
|
|
* 3. Neither the name of the University nor the names of its contributors
|
1994-05-24 10:09:53 +00:00
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1997-02-10 02:22:35 +00:00
|
|
|
* @(#)vm_map.h 8.9 (Berkeley) 5/17/95
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
*
|
|
|
|
* Copyright (c) 1987, 1990 Carnegie-Mellon University.
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* Authors: Avadis Tevanian, Jr., Michael Wayne Young
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Permission to use, copy, modify and distribute this software and
|
|
|
|
* its documentation is hereby granted, provided that both the copyright
|
|
|
|
* notice and this permission notice appear in all copies of the
|
|
|
|
* software, derivative works or modified versions, and any portions
|
|
|
|
* thereof, and that both notices appear in supporting documentation.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
|
|
|
* CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
|
|
* CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
1994-05-24 10:09:53 +00:00
|
|
|
* FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
1995-01-09 16:06:02 +00:00
|
|
|
*
|
1994-05-24 10:09:53 +00:00
|
|
|
* Carnegie Mellon requests users of this software to return to
|
|
|
|
*
|
|
|
|
* Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
|
|
* School of Computer Science
|
|
|
|
* Carnegie Mellon University
|
|
|
|
* Pittsburgh PA 15213-3890
|
|
|
|
*
|
|
|
|
* any improvements or extensions that they make and grant Carnegie the
|
|
|
|
* rights to redistribute these changes.
|
1994-08-02 07:55:43 +00:00
|
|
|
*
|
1999-08-28 01:08:13 +00:00
|
|
|
* $FreeBSD$
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Virtual memory map module definitions.
|
|
|
|
*/
|
|
|
|
#ifndef _VM_MAP_
|
|
|
|
#define _VM_MAP_
|
|
|
|
|
2002-04-28 23:12:52 +00:00
|
|
|
#include <sys/lock.h>
|
2004-07-30 09:10:28 +00:00
|
|
|
#include <sys/sx.h>
|
2003-01-01 00:13:01 +00:00
|
|
|
#include <sys/_mutex.h>
|
2001-05-03 11:33:51 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Types defined:
|
|
|
|
*
|
|
|
|
* vm_map_t the high-level address map data structure.
|
|
|
|
* vm_map_entry_t an entry in an address map.
|
|
|
|
*/
|
|
|
|
|
2003-08-11 07:14:08 +00:00
|
|
|
typedef u_char vm_flags_t;
|
2000-02-28 04:10:35 +00:00
|
|
|
typedef u_int vm_eflags_t;
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Objects which live in maps may be either VM objects, or
|
|
|
|
* another map (called a "sharing map") which denotes read-write
|
|
|
|
* sharing with other maps.
|
|
|
|
*/
|
|
|
|
union vm_map_object {
|
1995-01-09 16:06:02 +00:00
|
|
|
struct vm_object *vm_object; /* VM object */
|
|
|
|
struct vm_map *sub_map; /* belongs to another map */
|
1994-05-24 10:09:53 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Address map entries consist of start and end addresses,
|
|
|
|
* a VM object (or sharing map) and offset into that object,
|
|
|
|
* and user-exported inheritance and protection information.
|
|
|
|
* Also included is control information for virtual copy operations.
|
|
|
|
*/
|
|
|
|
struct vm_map_entry {
|
2019-12-07 17:14:33 +00:00
|
|
|
struct vm_map_entry *left; /* left child or previous entry */
|
|
|
|
struct vm_map_entry *right; /* right child or next entry */
|
1995-01-09 16:06:02 +00:00
|
|
|
vm_offset_t start; /* start address */
|
|
|
|
vm_offset_t end; /* end address */
|
2016-07-07 20:58:16 +00:00
|
|
|
vm_offset_t next_read; /* vaddr of the next sequential read */
|
2004-08-13 08:06:34 +00:00
|
|
|
vm_size_t max_free; /* max free space in subtree */
|
1995-01-09 16:06:02 +00:00
|
|
|
union vm_map_object object; /* object I point to */
|
1995-12-11 04:58:34 +00:00
|
|
|
vm_ooffset_t offset; /* offset into object */
|
2000-02-28 04:10:35 +00:00
|
|
|
vm_eflags_t eflags; /* map entry flags */
|
1995-01-09 16:06:02 +00:00
|
|
|
vm_prot_t protection; /* protection code */
|
|
|
|
vm_prot_t max_protection; /* maximum protection */
|
|
|
|
vm_inherit_t inheritance; /* inheritance */
|
2012-05-10 15:16:42 +00:00
|
|
|
uint8_t read_ahead; /* pages in the read-ahead window */
|
1995-01-09 16:06:02 +00:00
|
|
|
int wired_count; /* can be paged if = 0 */
|
2010-12-02 17:37:16 +00:00
|
|
|
struct ucred *cred; /* tmp storage for creator ref */
|
2013-07-11 05:55:08 +00:00
|
|
|
struct thread *wiring_thread;
|
1994-05-24 10:09:53 +00:00
|
|
|
};
|
|
|
|
|
2019-06-08 20:28:04 +00:00
|
|
|
#define MAP_ENTRY_NOSYNC 0x00000001
|
|
|
|
#define MAP_ENTRY_IS_SUB_MAP 0x00000002
|
|
|
|
#define MAP_ENTRY_COW 0x00000004
|
|
|
|
#define MAP_ENTRY_NEEDS_COPY 0x00000008
|
|
|
|
#define MAP_ENTRY_NOFAULT 0x00000010
|
|
|
|
#define MAP_ENTRY_USER_WIRED 0x00000020
|
|
|
|
|
|
|
|
#define MAP_ENTRY_BEHAV_NORMAL 0x00000000 /* default behavior */
|
|
|
|
#define MAP_ENTRY_BEHAV_SEQUENTIAL 0x00000040 /* expect sequential
|
|
|
|
access */
|
|
|
|
#define MAP_ENTRY_BEHAV_RANDOM 0x00000080 /* expect random
|
|
|
|
access */
|
|
|
|
#define MAP_ENTRY_BEHAV_RESERVED 0x000000c0 /* future use */
|
|
|
|
#define MAP_ENTRY_BEHAV_MASK 0x000000c0
|
|
|
|
#define MAP_ENTRY_IN_TRANSITION 0x00000100 /* entry being
|
|
|
|
changed */
|
|
|
|
#define MAP_ENTRY_NEEDS_WAKEUP 0x00000200 /* waiters in
|
|
|
|
transition */
|
|
|
|
#define MAP_ENTRY_NOCOREDUMP 0x00000400 /* don't include in
|
|
|
|
a core */
|
|
|
|
#define MAP_ENTRY_VN_EXEC 0x00000800 /* text vnode mapping */
|
|
|
|
#define MAP_ENTRY_GROWS_DOWN 0x00001000 /* top-down stacks */
|
|
|
|
#define MAP_ENTRY_GROWS_UP 0x00002000 /* bottom-up stacks */
|
|
|
|
|
|
|
|
#define MAP_ENTRY_WIRE_SKIPPED 0x00004000
|
2019-09-03 20:31:48 +00:00
|
|
|
#define MAP_ENTRY_WRITECNT 0x00008000 /* tracked writeable
|
2019-06-08 20:28:04 +00:00
|
|
|
mapping */
|
|
|
|
#define MAP_ENTRY_GUARD 0x00010000
|
|
|
|
#define MAP_ENTRY_STACK_GAP_DN 0x00020000
|
|
|
|
#define MAP_ENTRY_STACK_GAP_UP 0x00040000
|
|
|
|
#define MAP_ENTRY_HEADER 0x00080000
|
2009-04-10 10:16:03 +00:00
|
|
|
|
2020-09-09 22:02:30 +00:00
|
|
|
#define MAP_ENTRY_SPLIT_BOUNDARY_MASK 0x00300000
|
|
|
|
|
|
|
|
#define MAP_ENTRY_SPLIT_BOUNDARY_SHIFT 20
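The mask and shift pair above packs a small index into bits 20 and 21 of eflags, next to the single-bit flags. The following is a minimal sketch of how such a field could be read and written. The helper names `split_boundary_index` and `set_split_boundary_index` are hypothetical and are not taken from this header; the constants are copied from the definitions above.

```c
/* Values copied from the definitions above. */
#define MAP_ENTRY_BEHAV_SEQUENTIAL	0x00000040
#define MAP_ENTRY_BEHAV_MASK		0x000000c0
#define MAP_ENTRY_SPLIT_BOUNDARY_MASK	0x00300000
#define MAP_ENTRY_SPLIT_BOUNDARY_SHIFT	20

/* Hypothetical helper: read the 2-bit split-boundary index from eflags. */
static unsigned int
split_boundary_index(unsigned int eflags)
{
	return ((eflags & MAP_ENTRY_SPLIT_BOUNDARY_MASK) >>
	    MAP_ENTRY_SPLIT_BOUNDARY_SHIFT);
}

/*
 * Hypothetical helper: store a new index (0..3) without disturbing
 * any other flag bit.  Out-of-range bits of idx are masked off.
 */
static unsigned int
set_split_boundary_index(unsigned int eflags, unsigned int idx)
{
	eflags &= ~(unsigned int)MAP_ENTRY_SPLIT_BOUNDARY_MASK;
	eflags |= (idx << MAP_ENTRY_SPLIT_BOUNDARY_SHIFT) &
	    MAP_ENTRY_SPLIT_BOUNDARY_MASK;
	return (eflags);
}
```

Because the field is always accessed through its mask, the behavior bits (MAP_ENTRY_BEHAV_MASK) and the other flags keep their values when the index changes.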
|
|
|
|
|
2002-06-01 16:59:30 +00:00
|
|
|
#ifdef _KERNEL
|
2003-11-03 16:14:45 +00:00
|
|
|
static __inline u_char
|
2002-06-01 16:59:30 +00:00
|
|
|
vm_map_entry_behavior(vm_map_entry_t entry)
|
2003-11-03 16:14:45 +00:00
|
|
|
{
|
2002-06-01 16:59:30 +00:00
|
|
|
return (entry->eflags & MAP_ENTRY_BEHAV_MASK);
|
|
|
|
}
|
2004-08-09 19:52:29 +00:00
|
|
|
|
|
|
|
static __inline int
|
|
|
|
vm_map_entry_user_wired_count(vm_map_entry_t entry)
|
|
|
|
{
|
|
|
|
if (entry->eflags & MAP_ENTRY_USER_WIRED)
|
|
|
|
return (1);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static __inline int
|
|
|
|
vm_map_entry_system_wired_count(vm_map_entry_t entry)
|
|
|
|
{
|
|
|
|
return (entry->wired_count - vm_map_entry_user_wired_count(entry));
|
|
|
|
}
|
2002-06-01 16:59:30 +00:00
|
|
|
#endif /* _KERNEL */
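The two wired-count helpers above split wired_count into a user part (at most one, signalled by MAP_ENTRY_USER_WIRED) and a system part (everything else). The sketch below reproduces that arithmetic outside the kernel. `struct demo_entry` and the `demo_` functions are stand-ins invented for illustration; only the flag value and the subtraction mirror the code above.

```c
#define MAP_ENTRY_USER_WIRED	0x00000020	/* value copied from above */

/* Stand-in for the two vm_map_entry fields the helpers look at. */
struct demo_entry {
	unsigned int	eflags;
	int		wired_count;
};

/* A user wiring contributes exactly one reference, or none. */
static int
demo_user_wired_count(const struct demo_entry *entry)
{
	if (entry->eflags & MAP_ENTRY_USER_WIRED)
		return (1);
	return (0);
}

/* Whatever is left after the user reference is system wiring. */
static int
demo_system_wired_count(const struct demo_entry *entry)
{
	return (entry->wired_count - demo_user_wired_count(entry));
}
```

For example, an entry with wired_count 3 and MAP_ENTRY_USER_WIRED set reports one user wiring and two system wirings; with the flag clear, all three are system wirings.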
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2002-09-22 04:33:43 +00:00
|
|
|
* A map is a set of map entries. These map entries are
|
2019-12-07 17:14:33 +00:00
|
|
|
* organized as a threaded binary search tree. The entries
|
|
|
|
* are ordered based upon the start and end addresses contained
|
|
|
|
* within each map entry. The largest gap between an entry in a
|
|
|
|
* subtree and one of its neighbors is saved in the max_free
|
|
|
|
* field, and that field is updated when the tree is
|
|
|
|
* restructured.
|
2018-08-29 12:24:19 +00:00
|
|
|
*
|
2018-01-20 12:19:02 +00:00
|
|
|
* Sleator and Tarjan's top-down splay algorithm is employed to
|
|
|
|
* control height imbalance in the binary search tree.
|
1999-08-23 18:08:34 +00:00
|
|
|
*
|
2018-11-02 16:26:44 +00:00
|
|
|
* The map's min offset value is stored in map->header.end, and
|
|
|
|
* its max offset value is stored in map->header.start. These
|
|
|
|
* values act as sentinels for any forward or backward address
|
2019-12-07 17:14:33 +00:00
|
|
|
* scan of the list. The right and left fields of the map
|
|
|
|
* header point to the first and last map entries. The map
|
|
|
|
* header has a special value for the eflags field,
|
|
|
|
* MAP_ENTRY_HEADER, that is set initially, is never changed,
|
|
|
|
* and prevents an eflags match of the header with any other map
|
|
|
|
* entry.
|
2018-11-02 16:26:44 +00:00
|
|
|
*
|
2018-08-29 12:24:19 +00:00
|
|
|
* List of locks
|
2002-04-27 22:01:37 +00:00
|
|
|
* (c) const until freed
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
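The comment above notes that the header stores the map's minimum in its end field and the maximum in its start field, so that scans stop at the header without a special case. The sketch below illustrates that property only. It assumes a plain circular doubly linked list instead of the threaded tree, and every name in it (`demo_entry`, `demo_map_init`, `demo_map_append`, `demo_count_overlaps`) is hypothetical.

```c
struct demo_entry {
	struct demo_entry *prev, *next;
	unsigned long start, end;
};

/* Empty map covering [min, max): note the deliberately swapped fields. */
static void
demo_map_init(struct demo_entry *header, unsigned long min, unsigned long max)
{
	header->prev = header->next = header;
	header->end = min;	/* sentinel for backward scans */
	header->start = max;	/* sentinel for forward scans */
}

/* Link an entry in as the new last entry; the caller keeps address order. */
static void
demo_map_append(struct demo_entry *header, struct demo_entry *e)
{
	e->prev = header->prev;
	e->next = header;
	header->prev->next = e;
	header->prev = e;
}

/*
 * Count entries overlapping [start, end) with a forward scan.
 * Precondition: end <= max.  The loop never tests for the header:
 * header.start == max >= end makes the loop condition fail there.
 */
static int
demo_count_overlaps(const struct demo_entry *header, unsigned long start,
    unsigned long end)
{
	const struct demo_entry *e;
	int n = 0;

	for (e = header->next; e->start < end; e = e->next)
		if (e->end > start)
			n++;
	return (n);
}
```

A backward scan works the same way with the condition `e->end > start`, which fails at the header because header.end holds the map minimum.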
|
|
|
|
struct vm_map {
|
1995-01-09 16:06:02 +00:00
|
|
|
struct vm_map_entry header; /* List of entries */
|
2004-07-30 09:10:28 +00:00
|
|
|
struct sx lock; /* Lock for map data */
|
2002-12-31 19:38:04 +00:00
|
|
|
struct mtx system_mtx;
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
int nentries; /* Number of entries */
|
|
|
|
vm_size_t size; /* virtual size */
|
2003-08-13 03:13:22 +00:00
|
|
|
u_int timestamp; /* Version number */
|
2002-07-11 02:39:24 +00:00
|
|
|
u_char needs_wakeup;
|
2008-12-31 05:44:05 +00:00
|
|
|
u_char system_map; /* (c) Am I a system map? */
|
2003-08-11 07:14:08 +00:00
|
|
|
vm_flags_t flags; /* flags for this vm_map */
|
2002-09-22 04:33:43 +00:00
|
|
|
vm_map_entry_t root; /* Root of a binary search tree */
|
2002-04-27 22:01:37 +00:00
|
|
|
pmap_t pmap; /* (c) Physical map */
|
Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings. It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.
Although the value of ASLR is diminishing over time as exploit authors
work out simple ASLR bypass techniques, it eliminates the trivial
exploitation of certain vulnerabilities, at least in theory. This
implementation is relatively small and happens at the correct
architectural level. Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintenance burden.
The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection. It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
I have not fine-tuned the amount of entropy injected right now. It is
only a quantitative change that will not change the implementation. The
current amount is controlled by aslr_pages_rnd.
To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized. This is controlled by vm.cluster_anon,
enabled by default.
The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386. This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.
ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
for additional architectures will be added after further testing.
Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
to force ASLR off for the given binary. (A tool to edit the feature
control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.
PR: 208580 (exp runs)
Exp-runs done by: antoine
Reviewed by: markj (previous version)
Discussed with: emaste
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
|
|
|
vm_offset_t anon_loc;
|
2010-12-09 21:02:22 +00:00
|
|
|
int busy;
|
2019-11-09 17:08:27 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
|
|
|
int nupdates;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
};
|
|
|
|
|
2003-08-11 07:14:08 +00:00
|
|
|
/*
|
|
|
|
* vm_flags_t values
|
|
|
|
*/
|
|
|
|
#define MAP_WIREFUTURE 0x01 /* wire all future pages */
|
exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack. This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings. In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.
There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose. Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.
The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.
PR: 260303
Reviewed by: kib
Discussed with: emaste, mw
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
2022-01-17 16:42:56 +00:00
|
|
|
#define MAP_BUSY_WAKEUP 0x02 /* thread(s) waiting on busy state */
|
2019-02-10 17:19:45 +00:00
|
|
|
#define MAP_IS_SUB_MAP 0x04 /* has parent */
|
|
|
|
#define MAP_ASLR 0x08 /* enabled ASLR */
|
2022-01-17 16:42:56 +00:00
|
|
|
#define MAP_ASLR_IGNSTART 0x10 /* ASLR ignores data segment */
|
|
|
|
#define MAP_REPLENISH 0x20 /* kmapent zone needs to be refilled */
|
2021-01-08 22:40:04 +00:00
|
|
|
#define MAP_WXORX 0x40 /* enforce W^X */
|
2022-01-17 16:42:56 +00:00
|
|
|
#define MAP_ASLR_STACK 0x80 /* stack location is randomized */
|
2003-08-11 07:14:08 +00:00
|
|
|
|
2002-04-27 22:01:37 +00:00
|
|
|
#ifdef _KERNEL
|
2018-07-02 19:48:38 +00:00
|
|
|
#if defined(KLD_MODULE) && !defined(KLD_TIED)
|
2018-03-30 10:55:31 +00:00
|
|
|
#define vm_map_max(map) vm_map_max_KBI((map))
|
|
|
|
#define vm_map_min(map) vm_map_min_KBI((map))
|
|
|
|
#define vm_map_pmap(map) vm_map_pmap_KBI((map))
|
2020-07-13 16:39:27 +00:00
|
|
|
#define vm_map_range_valid(map, start, end) \
|
|
|
|
vm_map_range_valid_KBI((map), (start), (end))
|
2018-03-30 10:55:31 +00:00
|
|
|
#else
|
2002-04-27 22:01:37 +00:00
|
|
|
static __inline vm_offset_t
|
2012-07-15 20:29:48 +00:00
|
|
|
vm_map_max(const struct vm_map *map)
|
2002-04-27 22:01:37 +00:00
|
|
|
{
|
2018-08-29 12:24:19 +00:00
|
|
|
|
|
|
|
return (map->header.start);
|
2002-04-27 22:01:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static __inline vm_offset_t
|
2012-07-15 20:29:48 +00:00
|
|
|
vm_map_min(const struct vm_map *map)
|
2002-04-27 22:01:37 +00:00
|
|
|
{
|
2018-08-29 12:24:19 +00:00
|
|
|
|
|
|
|
return (map->header.end);
|
2002-04-27 22:01:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static __inline pmap_t
|
|
|
|
vm_map_pmap(vm_map_t map)
|
|
|
|
{
|
|
|
|
return (map->pmap);
|
|
|
|
}
|
2003-08-11 07:14:08 +00:00
|
|
|
|
|
|
|
static __inline void
|
|
|
|
vm_map_modflags(vm_map_t map, vm_flags_t set, vm_flags_t clear)
|
|
|
|
{
|
|
|
|
map->flags = (map->flags | set) & ~clear;
|
|
|
|
}
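The set-then-clear order used by vm_map_modflags() can be tried out in user space. The following is a minimal sketch; the ex_/EX_ names are hypothetical stand-ins, not the kernel's definitions. It illustrates that a bit named in both arguments ends up cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for vm_flags_t and two map flag values. */
typedef uint8_t ex_flags_t;
#define	EX_WIREFUTURE	0x01
#define	EX_ASLR		0x08

/* Same expression as vm_map_modflags(): OR in 'set', then mask out 'clear'. */
static ex_flags_t
ex_modflags(ex_flags_t cur, ex_flags_t set, ex_flags_t clear)
{

	return ((ex_flags_t)((cur | set) & ~clear));
}

/* Small self-check of the resulting semantics. */
static void
ex_modflags_example(void)
{

	/* Setting a bit leaves the others alone. */
	assert(ex_modflags(EX_WIREFUTURE, EX_ASLR, 0) ==
	    (EX_WIREFUTURE | EX_ASLR));
	/* Set and clear can be combined in one call. */
	assert(ex_modflags(EX_ASLR, EX_WIREFUTURE, EX_ASLR) == EX_WIREFUTURE);
	/* A bit present in both arguments ends up cleared. */
	assert(ex_modflags(0, EX_ASLR, EX_ASLR) == 0);
}
```

Because the clear mask is applied last, callers that want "set wins" semantics would have to avoid passing the same bit in both arguments.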
|
2020-06-19 03:32:04 +00:00
|
|
|
|
|
|
|
static inline bool
|
|
|
|
vm_map_range_valid(vm_map_t map, vm_offset_t start, vm_offset_t end)
|
|
|
|
{
|
|
|
|
if (end < start)
|
|
|
|
return (false);
|
|
|
|
if (start < vm_map_min(map) || end > vm_map_max(map))
|
|
|
|
return (false);
|
|
|
|
return (true);
|
|
|
|
}
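The accessors above read the map's maximum from header.start and its minimum from header.end, which is easy to misread as a bug. The sketch below, using hypothetical simplified types (ex_header, ex_map) rather than the kernel's, mirrors that convention and the range check built on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
typedef uintptr_t ex_offset_t;
struct ex_header {
	ex_offset_t start;	/* holds the map's maximum address */
	ex_offset_t end;	/* holds the map's minimum address */
};
struct ex_map {
	struct ex_header header;
};

static ex_offset_t
ex_map_max(const struct ex_map *m)
{
	return (m->header.start);
}

static ex_offset_t
ex_map_min(const struct ex_map *m)
{
	return (m->header.end);
}

/* Same checks as vm_map_range_valid(). */
static bool
ex_range_valid(const struct ex_map *m, ex_offset_t start, ex_offset_t end)
{
	if (end < start)
		return (false);
	if (start < ex_map_min(m) || end > ex_map_max(m))
		return (false);
	return (true);
}

static void
ex_range_valid_example(void)
{
	struct ex_map m = { .header = { .start = 0x8000, .end = 0x1000 } };

	assert(ex_range_valid(&m, 0x1000, 0x8000));	/* the whole map */
	assert(ex_range_valid(&m, 0x2000, 0x2000));	/* empty range passes */
	assert(!ex_range_valid(&m, 0x3000, 0x2000));	/* end before start */
	assert(!ex_range_valid(&m, 0x0fff, 0x2000));	/* below the minimum */
	assert(!ex_range_valid(&m, 0x7000, 0x8001));	/* above the maximum */
}
```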
|
|
|
|
|
2018-03-30 10:55:31 +00:00
|
|
|
#endif /* KLD_MODULE */
|
2002-04-27 22:01:37 +00:00
|
|
|
#endif /* _KERNEL */
|
|
|
|
|
2003-11-03 16:14:45 +00:00
|
|
|
/*
|
1995-12-07 12:48:31 +00:00
|
|
|
* Shareable process virtual address space.
|
2002-06-25 18:14:38 +00:00
|
|
|
*
|
|
|
|
* List of locks
|
|
|
|
* (c) const until freed
|
1995-12-07 12:48:31 +00:00
|
|
|
*/
|
|
|
|
struct vmspace {
|
|
|
|
struct vm_map vm_map; /* VM address map */
|
2002-07-22 16:22:27 +00:00
|
|
|
struct shmmap_state *vm_shm; /* SYS5 shared memory private data XXX */
|
1995-12-07 12:48:31 +00:00
|
|
|
segsz_t vm_swrss; /* resident set size before last swap */
|
|
|
|
segsz_t vm_tsize; /* text size (pages) XXX */
|
|
|
|
segsz_t vm_dsize; /* data size (pages) XXX */
|
|
|
|
segsz_t vm_ssize; /* stack size (pages) */
|
2002-06-25 18:14:38 +00:00
|
|
|
caddr_t vm_taddr; /* (c) user virtual address of text */
|
|
|
|
caddr_t vm_daddr; /* (c) user virtual address of data */
|
1995-12-07 12:48:31 +00:00
|
|
|
caddr_t vm_maxsaddr; /* user VA at max stack growth */
|
2022-01-17 16:42:56 +00:00
|
|
|
vm_offset_t vm_stacktop; /* top of the stack, may not be page-aligned */
|
2022-06-02 07:58:12 +00:00
|
|
|
vm_offset_t vm_shp_base; /* shared page address */
|
2020-11-04 16:30:56 +00:00
|
|
|
u_int vm_refcnt; /* number of references */
|
2008-03-01 22:54:42 +00:00
|
|
|
/*
|
|
|
|
* Keep the PMAP last, so that CPU-specific variations of that
|
|
|
|
* structure on a single architecture don't result in offset
|
|
|
|
* variations of the machine-independent fields in the vmspace.
|
|
|
|
*/
|
|
|
|
struct pmap vm_pmap; /* private physical map */
|
1995-12-07 12:48:31 +00:00
|
|
|
};
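The reason for keeping the pmap last can be checked with offsetof(). This is a minimal sketch with hypothetical structures (two made-up pmap layouts of different sizes); it only illustrates that fields placed before the variable-sized member keep the same offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical CPU-specific pmap layouts of different sizes. */
struct ex_pmap_small {
	long pm_stats;
};
struct ex_pmap_large {
	long pm_stats;
	long pm_extra[8];
};

/* The same machine-independent fields, followed by either pmap variant. */
struct ex_vmspace_small {
	long vm_ssize;
	unsigned int vm_refcnt;
	struct ex_pmap_small vm_pmap;	/* kept last */
};
struct ex_vmspace_large {
	long vm_ssize;
	unsigned int vm_refcnt;
	struct ex_pmap_large vm_pmap;	/* kept last */
};

/* Returns 1 when every machine-independent field has the same offset. */
static int
ex_mi_offsets_match(void)
{

	/* The two variants really do differ in overall size. */
	assert(sizeof(struct ex_vmspace_small) !=
	    sizeof(struct ex_vmspace_large));
	return (offsetof(struct ex_vmspace_small, vm_ssize) ==
	    offsetof(struct ex_vmspace_large, vm_ssize) &&
	    offsetof(struct ex_vmspace_small, vm_refcnt) ==
	    offsetof(struct ex_vmspace_large, vm_refcnt));
}
```

Had the pmap been placed first, every later field would shift with the pmap's size, and code compiled against one variant would read the wrong offsets in the other.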
|
|
|
|
|
2002-06-01 22:41:43 +00:00
|
|
|
#ifdef _KERNEL
|
|
|
|
static __inline pmap_t
|
|
|
|
vmspace_pmap(struct vmspace *vmspace)
|
|
|
|
{
|
|
|
|
return &vmspace->vm_pmap;
|
|
|
|
}
|
|
|
|
#endif /* _KERNEL */
|
|
|
|
|
2001-05-19 01:28:09 +00:00
|
|
|
#ifdef _KERNEL
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Macros: vm_map_lock, etc.
|
|
|
|
* Function:
|
1999-08-16 18:21:09 +00:00
|
|
|
* Perform locking on the data portion of a map. Note that
|
|
|
|
* these macros mimic procedure calls returning void. The
|
|
|
|
* semicolon is supplied by the user of these macros, not
|
|
|
|
* by the macros themselves. The macros can safely be used
|
|
|
|
* as unbraced elements in a higher level statement.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
2002-04-28 23:12:52 +00:00
|
|
|
void _vm_map_lock(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_unlock(vm_map_t map, const char *file, int line);
|
2010-09-19 17:43:22 +00:00
|
|
|
int _vm_map_unlock_and_wait(vm_map_t map, int timo, const char *file, int line);
|
2002-04-28 23:12:52 +00:00
|
|
|
void _vm_map_lock_read(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_unlock_read(vm_map_t map, const char *file, int line);
|
|
|
|
int _vm_map_trylock(vm_map_t map, const char *file, int line);
|
2003-03-12 23:13:16 +00:00
|
|
|
int _vm_map_trylock_read(vm_map_t map, const char *file, int line);
|
2002-04-28 23:12:52 +00:00
|
|
|
int _vm_map_lock_upgrade(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_lock_downgrade(vm_map_t map, const char *file, int line);
|
2009-01-01 00:31:46 +00:00
|
|
|
int vm_map_locked(vm_map_t map);
|
2002-07-11 02:39:24 +00:00
|
|
|
void vm_map_wakeup(vm_map_t map);
|
2010-12-09 21:02:22 +00:00
|
|
|
void vm_map_busy(vm_map_t map);
|
|
|
|
void vm_map_unbusy(vm_map_t map);
|
|
|
|
void vm_map_wait_busy(vm_map_t map);
|
2018-03-30 10:55:31 +00:00
|
|
|
vm_offset_t vm_map_max_KBI(const struct vm_map *map);
|
|
|
|
vm_offset_t vm_map_min_KBI(const struct vm_map *map);
|
|
|
|
pmap_t vm_map_pmap_KBI(vm_map_t map);
|
2020-07-13 16:39:27 +00:00
|
|
|
bool vm_map_range_valid_KBI(vm_map_t map, vm_offset_t start, vm_offset_t end);
|
2002-04-28 23:12:52 +00:00
|
|
|
|
|
|
|
#define vm_map_lock(map) _vm_map_lock(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_unlock(map) _vm_map_unlock(map, LOCK_FILE, LOCK_LINE)
|
2010-09-19 17:43:22 +00:00
|
|
|
#define vm_map_unlock_and_wait(map, timo) \
|
|
|
|
_vm_map_unlock_and_wait(map, timo, LOCK_FILE, LOCK_LINE)
|
2002-04-28 23:12:52 +00:00
|
|
|
#define vm_map_lock_read(map) _vm_map_lock_read(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_unlock_read(map) _vm_map_unlock_read(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_trylock(map) _vm_map_trylock(map, LOCK_FILE, LOCK_LINE)
|
2003-03-12 23:13:16 +00:00
|
|
|
#define vm_map_trylock_read(map) \
|
|
|
|
_vm_map_trylock_read(map, LOCK_FILE, LOCK_LINE)
|
2002-04-28 23:12:52 +00:00
|
|
|
#define vm_map_lock_upgrade(map) \
|
|
|
|
_vm_map_lock_upgrade(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_lock_downgrade(map) \
|
|
|
|
_vm_map_lock_downgrade(map, LOCK_FILE, LOCK_LINE)
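The wrapper pattern used by these macros can be sketched outside the kernel. In the example below the toy lock and the ex_ names are hypothetical, and __FILE__ and __LINE__ are substituted for LOCK_FILE and LOCK_LINE as an assumption about what those expand to in a debugging build. It shows why a macro that expands to a bare call expression is safe as an unbraced statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy lock that records where it was last acquired. */
struct ex_lock {
	int held;
	const char *file;
	int line;
};

static void
ex_lock_impl(struct ex_lock *l, const char *file, int line)
{
	assert(!l->held);	/* toy lock: no recursion, no contention */
	l->held = 1;
	l->file = file;
	l->line = line;
}

static void
ex_unlock_impl(struct ex_lock *l, const char *file, int line)
{
	(void)file;
	(void)line;
	assert(l->held);
	l->held = 0;
}

/*
 * Each wrapper expands to a single call expression with no trailing
 * semicolon, so it behaves like a void function call at the use site.
 */
#define	ex_lock(l)	ex_lock_impl((l), __FILE__, __LINE__)
#define	ex_unlock(l)	ex_unlock_impl((l), __FILE__, __LINE__)

/* Returns the line recorded by the acquisition, or 0 if not requested. */
static int
ex_lock_example(struct ex_lock *l, int want)
{
	int line;

	if (want)
		ex_lock(l);	/* safe as an unbraced if body */
	else
		return (0);
	line = l->line;
	ex_unlock(l);
	return (line);
}
```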
|
2001-07-04 20:15:18 +00:00
|
|
|
|
|
|
|
long vmspace_resident_count(struct vmspace *vmspace);
|
2001-05-19 01:28:09 +00:00
|
|
|
#endif /* _KERNEL */
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1996-01-19 04:00:31 +00:00
|
|
|
/*
|
|
|
|
* Copy-on-write flags for vm_map operations
|
|
|
|
*/
|
2019-06-08 20:28:04 +00:00
|
|
|
#define MAP_INHERIT_SHARE 0x00000001
|
|
|
|
#define MAP_COPY_ON_WRITE 0x00000002
|
|
|
|
#define MAP_NOFAULT 0x00000004
|
|
|
|
#define MAP_PREFAULT 0x00000008
|
|
|
|
#define MAP_PREFAULT_PARTIAL 0x00000010
|
|
|
|
#define MAP_DISABLE_SYNCER 0x00000020
|
|
|
|
#define MAP_CHECK_EXCL 0x00000040
|
|
|
|
#define MAP_CREATE_GUARD 0x00000080
|
|
|
|
#define MAP_DISABLE_COREDUMP 0x00000100
|
|
|
|
#define MAP_PREFAULT_MADVISE 0x00000200 /* from (user) madvise request */
|
2019-09-03 20:31:48 +00:00
|
|
|
#define MAP_WRITECOUNT 0x00000400
|
2019-06-08 20:28:04 +00:00
|
|
|
#define MAP_REMAP 0x00000800
|
|
|
|
#define MAP_STACK_GROWS_DOWN 0x00001000
|
|
|
|
#define MAP_STACK_GROWS_UP 0x00002000
|
|
|
|
#define MAP_ACC_CHARGED 0x00004000
|
|
|
|
#define MAP_ACC_NO_CHARGE 0x00008000
|
|
|
|
#define MAP_CREATE_STACK_GAP_UP 0x00010000
|
|
|
|
#define MAP_CREATE_STACK_GAP_DN 0x00020000
|
|
|
|
#define MAP_VN_EXEC 0x00040000
|
2020-09-09 22:02:30 +00:00
|
|
|
#define MAP_SPLIT_BOUNDARY_MASK 0x00180000
|
|
|
|
|
|
|
|
#define MAP_SPLIT_BOUNDARY_SHIFT 19
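The mask and shift above describe a two-bit field (bits 19 and 20) inside the copy-on-write flags word. The sketch below packs and unpacks that field; the helper names are hypothetical, and what each index value means is not stated in this section, so the example treats it as an opaque number from 0 to 3.

```c
#include <assert.h>

#define	MAP_SPLIT_BOUNDARY_MASK		0x00180000
#define	MAP_SPLIT_BOUNDARY_SHIFT	19

/* Hypothetical helper: store a 2-bit index in the flags word. */
static int
ex_split_boundary_encode(int cow, int idx)
{

	assert(idx >= 0 && idx <= 3);
	return ((cow & ~MAP_SPLIT_BOUNDARY_MASK) |
	    (idx << MAP_SPLIT_BOUNDARY_SHIFT));
}

/* Hypothetical helper: recover the index from the flags word. */
static int
ex_split_boundary_index(int cow)
{

	return ((cow & MAP_SPLIT_BOUNDARY_MASK) >> MAP_SPLIT_BOUNDARY_SHIFT);
}
```

For instance, encoding index 1 into an otherwise empty word yields 0x00080000, and the other flag bits pass through unchanged.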
|
1996-01-19 04:00:31 +00:00
|
|
|
|
1996-12-14 17:54:17 +00:00
|
|
|
/*
|
|
|
|
* vm_fault option flags
|
|
|
|
*/
|
2020-10-02 17:50:22 +00:00
|
|
|
#define VM_FAULT_NORMAL 0x00 /* Nothing special */
|
|
|
|
#define VM_FAULT_WIRE 0x01 /* Wire the mapped page */
|
|
|
|
#define VM_FAULT_DIRTY 0x02 /* Dirty the page; use w/VM_PROT_COPY */
|
|
|
|
#define VM_FAULT_NOFILL 0x04 /* Fail if the pager doesn't have a copy */
|
1996-12-14 17:54:17 +00:00
|
|
|
|
2012-05-10 15:16:42 +00:00
|
|
|
/*
|
|
|
|
* Initially, mappings are slightly sequential. The maximum window size must
|
|
|
|
* account for the map entry's "read_ahead" field being defined as an uint8_t.
|
|
|
|
*/
|
|
|
|
#define VM_FAULT_READ_AHEAD_MIN 7
|
|
|
|
#define VM_FAULT_READ_AHEAD_INIT 15
|
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save a significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to the desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
|
|
|
#define VM_FAULT_READ_AHEAD_MAX min(atop(maxphys) - 1, UINT8_MAX)
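The clamp in VM_FAULT_READ_AHEAD_MAX can be worked through numerically. The sketch below assumes 4 KiB pages (EX_PAGE_SHIFT of 12, an assumption, as the page size is not given here) and uses hypothetical helper names in place of atop() and min().

```c
#include <assert.h>
#include <stdint.h>

#define	EX_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for atop(): bytes to whole pages. */
static unsigned long
ex_atop(unsigned long bytes)
{

	return (bytes >> EX_PAGE_SHIFT);
}

/* min(atop(maxphys) - 1, UINT8_MAX), as in VM_FAULT_READ_AHEAD_MAX. */
static unsigned long
ex_read_ahead_max(unsigned long maxphys)
{
	unsigned long pages;

	assert(maxphys >= (1UL << EX_PAGE_SHIFT));
	pages = ex_atop(maxphys) - 1;
	return (pages < UINT8_MAX ? pages : UINT8_MAX);
}
```

Under that assumption, a maxphys of 128 KiB gives 31 pages, while 1 MiB gives 255, which is exactly the largest value a uint8_t "read_ahead" field can hold; larger maxphys values are clamped to 255.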
|
2012-05-10 15:16:42 +00:00
|
|
|
|
2008-05-10 18:55:35 +00:00
|
|
|
/*
|
2013-08-16 21:13:55 +00:00
|
|
|
* The following "find_space" options are supported by vm_map_find().
|
|
|
|
*
|
|
|
|
* For VMFS_ALIGNED_SPACE, the desired alignment is specified to
|
|
|
|
* the macro argument as log base 2 of the desired alignment.
|
2008-05-10 18:55:35 +00:00
|
|
|
*/
|
|
|
|
#define VMFS_NO_SPACE 0 /* don't find; use the given range */
|
|
|
|
#define VMFS_ANY_SPACE 1 /* find a range with any alignment */
|
2013-07-19 19:06:15 +00:00
|
|
|
#define VMFS_OPTIMAL_SPACE 2 /* find a range with optimal alignment*/
|
2013-08-16 21:13:55 +00:00
|
|
|
#define VMFS_SUPER_SPACE 3 /* find a superpage-aligned range */
|
|
|
|
#define VMFS_ALIGNED_SPACE(x) ((x) << 8) /* find a range with fixed alignment */
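Since VMFS_ALIGNED_SPACE() takes the log base 2 of the alignment and shifts it above the low eight bits, the small option values 0 through 3 never collide with it. The sketch below shows one plausible way to decode such a value; the decoder is hypothetical and only assumes the encoding given by the macro.

```c
#include <assert.h>
#include <stdint.h>

#define	VMFS_ALIGNED_SPACE(x)	((x) << 8)

/* Hypothetical decoder: recover the byte alignment from a find_space value. */
static uint64_t
ex_decode_alignment(int find_space)
{

	/* An aligned request has a zero low byte and a nonzero exponent. */
	assert((find_space & 0xff) == 0 && (find_space >> 8) != 0);
	return ((uint64_t)1 << (find_space >> 8));
}
```

For example, VMFS_ALIGNED_SPACE(21) is 0x1500, which decodes to a 2 MiB alignment.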
|
2008-05-10 18:55:35 +00:00
|
|
|
|
2003-08-11 07:14:08 +00:00
|
|
|
/*
|
|
|
|
* vm_map_wire and vm_map_unwire option flags
|
|
|
|
*/
|
|
|
|
#define VM_MAP_WIRE_SYSTEM 0 /* wiring in a kernel map */
|
|
|
|
#define VM_MAP_WIRE_USER 1 /* wiring in a user map */
|
|
|
|
|
|
|
|
#define VM_MAP_WIRE_NOHOLES 0 /* region must not have holes */
|
|
|
|
#define VM_MAP_WIRE_HOLESOK 2 /* region may have holes */
|
|
|
|
|
2011-03-21 09:40:01 +00:00
|
|
|
#define VM_MAP_WIRE_WRITE 4 /* Validate writable. */
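Two of the wire options (VM_MAP_WIRE_SYSTEM and VM_MAP_WIRE_NOHOLES) are zero, so they document intent at the call site but cannot be tested with a bitwise AND; only their nonzero counterparts can. The following sketch decodes a flags argument on that basis. The ex_wire_opts structure and decoder are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define	VM_MAP_WIRE_SYSTEM	0
#define	VM_MAP_WIRE_USER	1
#define	VM_MAP_WIRE_NOHOLES	0
#define	VM_MAP_WIRE_HOLESOK	2
#define	VM_MAP_WIRE_WRITE	4

/* Hypothetical decoded view of a wire/unwire flags argument. */
struct ex_wire_opts {
	bool user;
	bool holes_ok;
	bool write;
};

static struct ex_wire_opts
ex_wire_decode(int flags)
{
	struct ex_wire_opts o;

	/* Only bits 0-2 are defined by the constants above. */
	assert((flags & ~(VM_MAP_WIRE_USER | VM_MAP_WIRE_HOLESOK |
	    VM_MAP_WIRE_WRITE)) == 0);
	o.user = (flags & VM_MAP_WIRE_USER) != 0;
	o.holes_ok = (flags & VM_MAP_WIRE_HOLESOK) != 0;
	o.write = (flags & VM_MAP_WIRE_WRITE) != 0;
	return (o);
}
```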
|
|
|
|
|
2019-12-08 22:33:51 +00:00
|
|
|
typedef int vm_map_entry_reader(void *token, vm_map_entry_t addr,
|
|
|
|
vm_map_entry_t dest);
|
|
|
|
|
|
|
|
#ifndef _KERNEL
|
|
|
|
/*
|
|
|
|
* Find the successor of a map_entry, using a reader to dereference pointers.
|
|
|
|
* '*clone' is a copy of a vm_map entry. 'reader' is used to copy a map entry
|
|
|
|
* at some address into '*clone'. Change *clone to a copy of the next map
|
|
|
|
* entry, and return the address of that entry, or NULL if copying has failed.
|
|
|
|
*
|
|
|
|
* This function is made available to user-space code that needs to traverse
|
|
|
|
* map entries.
|
|
|
|
*/
|
|
|
|
static inline vm_map_entry_t
|
|
|
|
vm_map_entry_read_succ(void *token, struct vm_map_entry *const clone,
|
|
|
|
vm_map_entry_reader reader)
|
|
|
|
{
|
|
|
|
vm_map_entry_t after, backup;
|
|
|
|
vm_offset_t start;
|
|
|
|
|
|
|
|
after = clone->right;
|
|
|
|
start = clone->start;
|
|
|
|
if (!reader(token, after, clone))
|
|
|
|
return (NULL);
|
|
|
|
backup = clone->left;
|
|
|
|
if (!reader(token, backup, clone))
|
|
|
|
return (NULL);
|
|
|
|
if (clone->start > start) {
|
|
|
|
do {
|
|
|
|
after = backup;
|
|
|
|
backup = clone->left;
|
|
|
|
if (!reader(token, backup, clone))
|
|
|
|
return (NULL);
|
|
|
|
} while (clone->start != start);
|
|
|
|
}
|
|
|
|
if (!reader(token, after, clone))
|
|
|
|
return (NULL);
|
|
|
|
return (after);
|
|
|
|
}
|
|
|
|
#endif /* ! _KERNEL */
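The reader-based successor walk can be exercised without a kernel. The sketch below is an illustration under stated assumptions: the entry type is a hypothetical stripped-down stand-in, the "remote" entries live in the same process so the reader is a plain memcpy, and the loop that starts from header.right and stops at the header address is one plausible way a consumer might drive it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down entry with only the fields the walk uses. */
struct ex_rentry {
	struct ex_rentry *left;
	struct ex_rentry *right;
	unsigned long start;
};

typedef int ex_reader(void *token, struct ex_rentry *addr,
    struct ex_rentry *dest);

/* Hypothetical reader: copy locally and count the copies in the token. */
static int
ex_local_reader(void *token, struct ex_rentry *addr, struct ex_rentry *dest)
{
	int *ncopies = token;

	if (addr == NULL)
		return (0);
	memcpy(dest, addr, sizeof(*dest));
	(*ncopies)++;
	return (1);
}

/* Same control flow as vm_map_entry_read_succ(). */
static struct ex_rentry *
ex_read_succ(void *token, struct ex_rentry *clone, ex_reader *reader)
{
	struct ex_rentry *after, *backup;
	unsigned long start;

	after = clone->right;
	start = clone->start;
	if (!reader(token, after, clone))
		return (NULL);
	backup = clone->left;
	if (!reader(token, backup, clone))
		return (NULL);
	if (clone->start > start) {
		do {
			after = backup;
			backup = clone->left;
			if (!reader(token, backup, clone))
				return (NULL);
		} while (clone->start != start);
	}
	if (!reader(token, after, clone))
		return (NULL);
	return (after);
}

/* Walk a hand-built map of entries 1..4; returns the count or -1. */
static int
ex_read_walk(unsigned long *out, int max)
{
	struct ex_rentry hdr, a, b, c, d, clone, *addr;
	int n = 0, ncopies = 0;

	hdr.start = ~0UL;	/* sentinel, larger than any entry start */
	a.start = 1; b.start = 2; c.start = 3; d.start = 4;
	hdr.right = &a; hdr.left = &d;	/* first and last entries */
	a.left = &hdr; a.right = &d;	/* a is the root, d its right child */
	d.left = &b; d.right = &hdr;	/* b is d's left child */
	b.left = &a; b.right = &c;	/* b's left is a thread back to a */
	c.left = &b; c.right = &d;	/* c is a leaf: both are threads */

	addr = hdr.right;
	if (!ex_local_reader(&ncopies, addr, &clone))
		return (-1);
	while (addr != &hdr) {
		assert(n < max);
		out[n++] = clone.start;
		addr = ex_read_succ(&ncopies, &clone, ex_local_reader);
		if (addr == NULL)
			return (-1);
	}
	return (n);
}
```

A consumer reading another address space would replace ex_local_reader with a function that copies from that address space and returns 0 on failure.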
|
|
|
|
|
2019-12-09 05:09:46 +00:00
|
|
|
#ifdef _KERNEL
|
|
|
|
boolean_t vm_map_check_protection (vm_map_t, vm_offset_t, vm_offset_t, vm_prot_t);
|
|
|
|
int vm_map_delete(vm_map_t, vm_offset_t, vm_offset_t);
|
|
|
|
int vm_map_find(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t *, vm_size_t,
|
|
|
|
vm_offset_t, int, vm_prot_t, vm_prot_t, int);
|
|
|
|
int vm_map_find_min(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t *,
|
|
|
|
vm_size_t, vm_offset_t, vm_offset_t, int, vm_prot_t, vm_prot_t, int);
|
2020-09-09 21:44:59 +00:00
|
|
|
int vm_map_find_aligned(vm_map_t map, vm_offset_t *addr, vm_size_t length,
|
|
|
|
vm_offset_t max_addr, vm_offset_t alignment);
|
2019-12-09 05:09:46 +00:00
|
|
|
int vm_map_fixed(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t, vm_size_t,
|
|
|
|
vm_prot_t, vm_prot_t, int);
|
|
|
|
vm_offset_t vm_map_findspace(vm_map_t, vm_offset_t, vm_size_t);
|
|
|
|
int vm_map_inherit (vm_map_t, vm_offset_t, vm_offset_t, vm_inherit_t);
|
|
|
|
void vm_map_init(vm_map_t, pmap_t, vm_offset_t, vm_offset_t);
|
|
|
|
int vm_map_insert (vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t, vm_offset_t, vm_prot_t, vm_prot_t, int);
|
|
|
|
int vm_map_lookup (vm_map_t *, vm_offset_t, vm_prot_t, vm_map_entry_t *, vm_object_t *,
|
|
|
|
vm_pindex_t *, vm_prot_t *, boolean_t *);
|
|
|
|
int vm_map_lookup_locked(vm_map_t *, vm_offset_t, vm_prot_t, vm_map_entry_t *, vm_object_t *,
|
|
|
|
vm_pindex_t *, vm_prot_t *, boolean_t *);
|
|
|
|
void vm_map_lookup_done (vm_map_t, vm_map_entry_t);
|
|
|
|
boolean_t vm_map_lookup_entry (vm_map_t, vm_offset_t, vm_map_entry_t *);
|
|
|
|
|
2019-11-13 15:56:07 +00:00
|
|
|
static inline vm_map_entry_t
|
2019-11-20 16:06:48 +00:00
|
|
|
vm_map_entry_first(vm_map_t map)
|
2019-11-13 15:56:07 +00:00
|
|
|
{
|
|
|
|
|
2019-12-07 17:14:33 +00:00
|
|
|
return (map->header.right);
|
2019-11-13 15:56:07 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline vm_map_entry_t
|
2019-11-20 16:06:48 +00:00
|
|
|
vm_map_entry_succ(vm_map_entry_t entry)
|
2019-11-13 15:56:07 +00:00
|
|
|
{
|
2019-12-07 17:14:33 +00:00
|
|
|
vm_map_entry_t after;
|
|
|
|
|
|
|
|
after = entry->right;
|
|
|
|
if (after->left->start > entry->start) {
|
|
|
|
do
|
|
|
|
after = after->left;
|
|
|
|
while (after->left != entry);
|
|
|
|
}
|
|
|
|
return (after);
|
2019-11-13 15:56:07 +00:00
|
|
|
}
|
|
|
|
|
2019-11-20 16:06:48 +00:00
|
|
|
#define VM_MAP_ENTRY_FOREACH(it, map) \
|
|
|
|
for ((it) = vm_map_entry_first(map); \
|
2019-10-08 07:14:21 +00:00
|
|
|
(it) != &(map)->header; \
|
2019-11-13 15:56:07 +00:00
|
|
|
(it) = vm_map_entry_succ(it))
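The iteration performed by VM_MAP_ENTRY_FOREACH relies on the entries forming a threaded binary tree: a left or right pointer refers either to a child or, when there is no child, to the in-order predecessor or successor. The sketch below rebuilds that shape by hand with a hypothetical stripped-down entry type and walks it with the same successor logic; it is an illustration, not the kernel's data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down entry: only the fields the walk needs. */
struct ex_entry {
	struct ex_entry *left;	/* left child, or thread to predecessor */
	struct ex_entry *right;	/* right child, or thread to successor */
	unsigned long start;
};

/* Same logic as vm_map_entry_succ(). */
static struct ex_entry *
ex_entry_succ(struct ex_entry *entry)
{
	struct ex_entry *after;

	after = entry->right;
	if (after->left->start > entry->start) {
		/* 'after' is a real right child: descend to its leftmost node. */
		do
			after = after->left;
		while (after->left != entry);
	}
	return (after);
}

/*
 * Build a header plus entries 1..4 (1 at the root, 4 as its right child,
 * 2 as the left child of 4, 3 as the right child of 2) and record the
 * start addresses in iteration order.  Returns the number of entries.
 */
static int
ex_walk(unsigned long *out, int max)
{
	struct ex_entry hdr, a, b, c, d, *it;
	int n = 0;

	hdr.start = ~0UL;	/* sentinel, larger than any entry start */
	a.start = 1; b.start = 2; c.start = 3; d.start = 4;
	hdr.right = &a; hdr.left = &d;	/* first and last entries */
	a.left = &hdr; a.right = &d;
	d.left = &b; d.right = &hdr;
	b.left = &a; b.right = &c;
	c.left = &b; c.right = &d;

	/* Same shape as VM_MAP_ENTRY_FOREACH: first, until header, succ. */
	for (it = hdr.right; it != &hdr; it = ex_entry_succ(it)) {
		assert(n < max);
		out[n++] = it->start;
	}
	return (n);
}
```

The walk visits the entries in address order without a parent pointer or an explicit stack, which is the point of threading the tree.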
|
2021-01-12 12:43:39 +00:00
|
|
|
|
|
|
|
#define VM_MAP_PROTECT_SET_PROT 0x0001
|
|
|
|
#define VM_MAP_PROTECT_SET_MAXPROT 0x0002
|
|
|
|
|
|
|
|
int vm_map_protect(vm_map_t map, vm_offset_t start, vm_offset_t end,
|
|
|
|
vm_prot_t new_prot, vm_prot_t new_maxprot, int flags);
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_remove (vm_map_t, vm_offset_t, vm_offset_t);
|
2019-08-25 07:06:51 +00:00
|
|
|
void vm_map_try_merge_entries(vm_map_t map, vm_map_entry_t prev,
|
|
|
|
vm_map_entry_t entry);
|
2001-07-04 20:15:18 +00:00
|
|
|
void vm_map_startup (void);
|
|
|
|
int vm_map_submap (vm_map_t, vm_offset_t, vm_offset_t, vm_map_t);
|
2003-11-09 05:25:35 +00:00
|
|
|
int vm_map_sync(vm_map_t, vm_offset_t, vm_offset_t, boolean_t, boolean_t);
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_madvise (vm_map_t, vm_offset_t, vm_offset_t, int);
|
|
|
|
int vm_map_stack (vm_map_t, vm_offset_t, vm_size_t, vm_prot_t, vm_prot_t, int);
|
2002-06-07 18:34:23 +00:00
|
|
|
int vm_map_unwire(vm_map_t map, vm_offset_t start, vm_offset_t end,
|
2003-08-11 07:14:08 +00:00
|
|
|
int flags);
|
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.
The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
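The accounting described in this message can be modelled in a few lines of userland C. This is a minimal sketch, not the kernel implementation: the function names user_wire_reserve() and user_wire_release() are hypothetical, the limit of 8 pages is arbitrary, and the model is single-threaded, whereas a real counter would need atomic updates.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the user-wiring limit.
 * user_wire_count stands in for v_user_wire_count and
 * max_user_wired for the vm.max_user_wired tunable (in pages).
 */
static unsigned long user_wire_count;
static unsigned long max_user_wired = 8;	/* arbitrary value for the sketch */

/*
 * Try to account for npages newly user-wired virtual pages.
 * Returns false, leaving the counter unchanged, if the request
 * would push the total past the limit.
 */
static bool
user_wire_reserve(unsigned long npages)
{
	if (npages > max_user_wired - user_wire_count)
		return (false);
	user_wire_count += npages;
	return (true);
}

/* Drop the accounting for npages previously reserved pages. */
static void
user_wire_release(unsigned long npages)
{
	assert(npages <= user_wire_count);
	user_wire_count -= npages;
}
```

In this model a caller such as an mlock(2) path would reserve before wiring and fail with ENOMEM when the reservation is refused. Kernel wirings never touch the counter, so they cannot cause a user request to be refused.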
2019-05-13 16:38:48 +00:00
|
|
|
int vm_map_wire(vm_map_t map, vm_offset_t start, vm_offset_t end, int flags);
|
|
|
|
int vm_map_wire_locked(vm_map_t map, vm_offset_t start, vm_offset_t end,
|
2003-08-11 07:14:08 +00:00
|
|
|
int flags);
|
2011-03-01 11:04:30 +00:00
|
|
|
long vmspace_swap_count(struct vmspace *vmspace);
|
Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.
The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on map entry removal.
The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.
nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.
On the text vnode's vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuits the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
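The sign convention described in this message (positive v_writecount for writers, negative for text references, the two mutually exclusive) can be illustrated with a small userland model. This is a sketch under stated assumptions: struct model_vnode and the model_*() functions are hypothetical stand-ins, not the VOP implementations, and the vnode interlock is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vnode; only the counter is modelled. */
struct model_vnode {
	int writecount;		/* > 0: writers, < 0: text references */
};

/* A writer may be added only while there are no text references. */
static int
model_writer_open(struct model_vnode *vp)
{
	if (vp->writecount < 0)
		return (ETXTBSY);
	vp->writecount++;
	return (0);
}

/* A text reference may be added only while there are no writers. */
static int
model_set_text(struct model_vnode *vp)
{
	if (vp->writecount > 0)
		return (ETXTBSY);
	vp->writecount--;	/* each text reference decrements */
	return (0);
}

/* Drop one text reference, as on removal of the text map entry. */
static void
model_unset_text(struct model_vnode *vp)
{
	assert(vp->writecount < 0);
	vp->writecount++;
}
```

Because the check and the update happen in one step, a caller cannot observe a state where the file is both open for writing and mapped as text, which is the property the separate VV_TEXT flag used to guard.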
2019-05-05 11:20:43 +00:00
|
|
|
void vm_map_entry_set_vnode_text(vm_map_entry_t entry, bool add);
|
2002-03-10 21:52:48 +00:00
|
|
|
#endif /* _KERNEL */
|
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
|
|
|
#endif /* _VM_MAP_ */
|