/*-
 * SPDX-License-Identifier: (BSD-3-Clause AND MIT-CMU)
 *
 * Copyright (c) 1991, 1993
 *	The Regents of the University of California. All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * The Mach Operating System project at Carnegie-Mellon University.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)vm_map.h	8.9 (Berkeley) 5/17/95
 *
 *
 * Copyright (c) 1987, 1990 Carnegie-Mellon University.
 * All rights reserved.
 *
 * Authors: Avadis Tevanian, Jr., Michael Wayne Young
These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size; we no longer have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
 *
 * Permission to use, copy, modify and distribute this software and
 * its documentation is hereby granted, provided that both the copyright
 * notice and this permission notice appear in all copies of the
 * software, derivative works or modified versions, and any portions
 * thereof, and that both notices appear in supporting documentation.
 *
 * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
 * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
 * FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
 *
 * Carnegie Mellon requests users of this software to return to
 *
 *  Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
 *  School of Computer Science
 *  Carnegie Mellon University
 *  Pittsburgh PA 15213-3890
 *
 * any improvements or extensions that they make and grant Carnegie the
 * rights to redistribute these changes.
 *
 * $FreeBSD$
 */

/*
 *	Virtual memory map module definitions.
 */
#ifndef	_VM_MAP_
#define	_VM_MAP_

#include <sys/lock.h>
#include <sys/sx.h>
#include <sys/_mutex.h>

/*
 *	Types defined:
 *
 *	vm_map_t		the high-level address map data structure.
 *	vm_map_entry_t		an entry in an address map.
 */

typedef u_char vm_flags_t;
typedef u_int vm_eflags_t;

/*
 *	Objects which live in maps may be either VM objects, or
 *	another map (called a "sharing map") which denotes read-write
 *	sharing with other maps.
 */
union vm_map_object {
	struct vm_object *vm_object;	/* VM object */
	struct vm_map *sub_map;		/* belongs to another map */
};

/*
 *	Address map entries consist of start and end addresses,
 *	a VM object (or sharing map) and offset into that object,
 *	and user-exported inheritance and protection information.
 *	Also included is control information for virtual copy operations.
 */
struct vm_map_entry {
	struct vm_map_entry *prev;	/* previous entry */
	struct vm_map_entry *next;	/* next entry */
	struct vm_map_entry *left;	/* left child in binary search tree */
	struct vm_map_entry *right;	/* right child in binary search tree */
	vm_offset_t start;		/* start address */
	vm_offset_t end;		/* end address */
	vm_offset_t next_read;		/* vaddr of the next sequential read */
	vm_size_t max_free;		/* max free space in subtree */
	union vm_map_object object;	/* object I point to */
	vm_ooffset_t offset;		/* offset into object */
	vm_eflags_t eflags;		/* map entry flags */
	vm_prot_t protection;		/* protection code */
	vm_prot_t max_protection;	/* maximum protection */
	vm_inherit_t inheritance;	/* inheritance */
	uint8_t read_ahead;		/* pages in the read-ahead window */
	int wired_count;		/* can be paged if = 0 */
	struct ucred *cred;		/* tmp storage for creator ref */
	struct thread *wiring_thread;
};

#define	MAP_ENTRY_NOSYNC		0x00000001
#define	MAP_ENTRY_IS_SUB_MAP		0x00000002
#define	MAP_ENTRY_COW			0x00000004
#define	MAP_ENTRY_NEEDS_COPY		0x00000008
#define	MAP_ENTRY_NOFAULT		0x00000010
#define	MAP_ENTRY_USER_WIRED		0x00000020

#define	MAP_ENTRY_BEHAV_NORMAL		0x00000000	/* default behavior */
#define	MAP_ENTRY_BEHAV_SEQUENTIAL	0x00000040	/* expect sequential
							   access */
#define	MAP_ENTRY_BEHAV_RANDOM		0x00000080	/* expect random
							   access */
#define	MAP_ENTRY_BEHAV_RESERVED	0x000000c0	/* future use */
#define	MAP_ENTRY_BEHAV_MASK		0x000000c0
#define	MAP_ENTRY_IN_TRANSITION		0x00000100	/* entry being
							   changed */
#define	MAP_ENTRY_NEEDS_WAKEUP		0x00000200	/* waiters in
							   transition */
#define	MAP_ENTRY_NOCOREDUMP		0x00000400	/* don't include in
							   a core */
#define	MAP_ENTRY_VN_EXEC		0x00000800	/* text vnode mapping */
#define	MAP_ENTRY_GROWS_DOWN		0x00001000	/* top-down stacks */
#define	MAP_ENTRY_GROWS_UP		0x00002000	/* bottom-up stacks */

#define	MAP_ENTRY_WIRE_SKIPPED		0x00004000
#define	MAP_ENTRY_WRITECNT		0x00008000	/* tracked writeable
							   mapping */
#define	MAP_ENTRY_GUARD			0x00010000
#define	MAP_ENTRY_STACK_GAP_DN		0x00020000
#define	MAP_ENTRY_STACK_GAP_UP		0x00040000
#define	MAP_ENTRY_HEADER		0x00080000

#ifdef	_KERNEL
static __inline u_char
vm_map_entry_behavior(vm_map_entry_t entry)
{
	return (entry->eflags & MAP_ENTRY_BEHAV_MASK);
}

static __inline int
vm_map_entry_user_wired_count(vm_map_entry_t entry)
{
	if (entry->eflags & MAP_ENTRY_USER_WIRED)
		return (1);
	return (0);
}

static __inline int
vm_map_entry_system_wired_count(vm_map_entry_t entry)
{
	return (entry->wired_count - vm_map_entry_user_wired_count(entry));
}
#endif	/* _KERNEL */
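The inline helpers above treat `wired_count` as the sum of at most one user wiring (recorded by the `MAP_ENTRY_USER_WIRED` flag) and any number of system wirings, and read the access-behavior hint from the two bits covered by `MAP_ENTRY_BEHAV_MASK`. The following is a minimal userland sketch of that arithmetic. The `ex_`/`EX_` names and the cut-down `struct ex_entry` are hypothetical stand-ins invented for illustration; only the flag values are copied from the definitions above, and nothing here is the kernel's own code.

```c
#include <assert.h>

/* Hypothetical copies of the flag values defined in the header above. */
#define	EX_USER_WIRED		0x00000020u	/* MAP_ENTRY_USER_WIRED */
#define	EX_BEHAV_SEQUENTIAL	0x00000040u	/* MAP_ENTRY_BEHAV_SEQUENTIAL */
#define	EX_BEHAV_MASK		0x000000c0u	/* MAP_ENTRY_BEHAV_MASK */

/* Hypothetical cut-down entry holding only the fields the helpers read. */
struct ex_entry {
	unsigned int eflags;
	int wired_count;
};

/* Extract the two-bit behavior hint from the entry flags. */
static unsigned char
ex_behavior(const struct ex_entry *entry)
{
	return ((unsigned char)(entry->eflags & EX_BEHAV_MASK));
}

/* A user wiring contributes either 0 or 1 to wired_count. */
static int
ex_user_wired_count(const struct ex_entry *entry)
{
	return ((entry->eflags & EX_USER_WIRED) != 0 ? 1 : 0);
}

/* Whatever remains of wired_count is attributed to system wirings. */
static int
ex_system_wired_count(const struct ex_entry *entry)
{
	return (entry->wired_count - ex_user_wired_count(entry));
}

/* Small self-check of the arithmetic: one user plus one system wiring. */
static void
ex_self_check(void)
{
	struct ex_entry e = { EX_USER_WIRED, 2 };

	assert(ex_user_wired_count(&e) == 1);
	assert(ex_system_wired_count(&e) == 1);
	assert(ex_behavior(&e) == 0);
}
```

Calling `ex_self_check()` from a test program's `main` exercises all three helpers. Keeping the user wiring as a flag rather than a separate counter means the entry needs only one integer, at the cost of a subtraction whenever the system share is wanted.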

/*
 *	A map is a set of map entries. These map entries are
 *	organized both as a binary search tree and as a doubly-linked
 *	list. Both structures are ordered based upon the start and
 *	end addresses contained within each map entry.
 *
 *	Sleator and Tarjan's top-down splay algorithm is employed to
 *	control height imbalance in the binary search tree.
 *
 *	The map's min offset value is stored in map->header.end, and
 *	its max offset value is stored in map->header.start. These
 *	values act as sentinels for any forward or backward address
 *	scan of the list. The map header has a special value for the
 *	eflags field, MAP_ENTRY_HEADER, that is set initially, is
 *	never changed, and prevents an eflags match of the header
 *	with any other map entry.
 *
 *	List of locks
 *	(c)	const until freed
 */
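The sentinel arrangement described in the comment above (min offset in `header.end`, max offset in `header.start`) lets a linear scan stop at the header without a separate end-of-list test. The sketch below illustrates that idea on a circular doubly-linked list. All `ex_` names, the use of `unsigned long` for addresses, and the append-only insertion are assumptions made for illustration; this is not the kernel's lookup code, which also uses the splay tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down entry: list links and the address range only. */
struct ex_entry {
	struct ex_entry *prev, *next;
	unsigned long start, end;
};

/* Hypothetical map: just the header entry that anchors the list. */
struct ex_map {
	struct ex_entry header;
};

/* Empty circular list; header.end holds min, header.start holds max. */
static void
ex_map_init(struct ex_map *map, unsigned long min, unsigned long max)
{
	map->header.prev = map->header.next = &map->header;
	map->header.end = min;
	map->header.start = max;
}

/* Append at the tail; the caller must add entries in address order. */
static void
ex_map_append(struct ex_map *map, struct ex_entry *entry)
{
	entry->prev = map->header.prev;
	entry->next = &map->header;
	map->header.prev->next = entry;
	map->header.prev = entry;
}

/*
 * Forward scan: first entry starting above addr. For min <= addr < max
 * the loop ends at the header at the latest, since header.start == max.
 */
static struct ex_entry *
ex_first_after(struct ex_map *map, unsigned long addr)
{
	struct ex_entry *e;

	for (e = map->header.next; e->start <= addr; e = e->next)
		continue;
	return (e);
}

/*
 * Backward scan: last entry ending at or below addr. The header stops
 * the loop because header.end == min <= addr.
 */
static struct ex_entry *
ex_last_before(struct ex_map *map, unsigned long addr)
{
	struct ex_entry *e;

	for (e = map->header.prev; e->end > addr; e = e->prev)
		continue;
	return (e);
}
```

Both loops rely on the caller passing an address inside the map's range; outside that range the sentinel comparison no longer holds and the scan would not terminate.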
struct vm_map {
	struct vm_map_entry header;	/* List of entries */
	struct sx lock;			/* Lock for map data */
	struct mtx system_mtx;
1995-01-09 16:06:02 +00:00
|
|
|
int nentries; /* Number of entries */
|
|
|
|
vm_size_t size; /* virtual size */
|
2003-08-13 03:13:22 +00:00
|
|
|
u_int timestamp; /* Version number */
|
2002-07-11 02:39:24 +00:00
|
|
|
u_char needs_wakeup;
|
2008-12-31 05:44:05 +00:00
|
|
|
u_char system_map; /* (c) Am I a system map? */
|
2003-08-11 07:14:08 +00:00
|
|
|
vm_flags_t flags; /* flags for this vm_map */
|
2002-09-22 04:33:43 +00:00
|
|
|
vm_map_entry_t root; /* Root of a binary search tree */
|
2002-04-27 22:01:37 +00:00
|
|
|
pmap_t pmap; /* (c) Physical map */
|
Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings. It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.
Although the value of ASLR is diminishing over time as exploit authors
work out simple ASLR bypass techniques, it eliminates the trivial
exploitation of certain vulnerabilities, at least in theory. This
implementation is relatively small and happens at the correct
architectural level. Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintenance burden.
The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection. It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
I have not fine-tuned the amount of entropy injected right now. It is
only a quantitative change that will not change the implementation. The
current amount is controlled by aslr_pages_rnd.
To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized. This is controlled by vm.cluster_anon,
enabled by default.
The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386. This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.
ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
for additional architectures will be added after further testing.
Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
to force ASLR off for the given binary. (A tool to edit the feature
control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.
PR: 208580 (exp runs)
Exp-runs done by: antoine
Reviewed by: markj (previous version)
Discussed with: emaste
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5603
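The best-effort behaviour described in this message (try a randomized base, fall back to first fit when fragmentation leaves no room) can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_free_range, toy_first_fit() and toy_place_best_effort() are hypothetical names, the free list is a plain array, and the random value is passed in by the caller. It is not the kernel's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u		/* assumed page size for the model */
#define TOY_NO_SPACE	UINT64_MAX	/* hypothetical failure value */

/* Hypothetical free range [start, end) in a toy address space. */
struct toy_free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the first address where `size` bytes fit at offset `slack`
 * from the start of a free range, or TOY_NO_SPACE.
 */
static uint64_t
toy_first_fit(const struct toy_free_range *r, size_t n, uint64_t size,
    uint64_t slack)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t len = r[i].end - r[i].start;

		if (len >= size && len - size >= slack)
			return (r[i].start + slack);
	}
	return (TOY_NO_SPACE);
}

/*
 * Best effort: first try to inject a random page offset in
 * [0, rnd_pages]; if no range can absorb it, retry without entropy.
 */
uint64_t
toy_place_best_effort(const struct toy_free_range *r, size_t n,
    uint64_t size, uint64_t rnd, uint64_t rnd_pages)
{
	uint64_t addr, off;

	assert(size > 0);
	off = (rnd % (rnd_pages + 1)) * TOY_PAGE_SIZE;
	addr = toy_first_fit(r, n, size, off);
	if (addr == TOY_NO_SPACE)
		addr = toy_first_fit(r, n, size, 0);	/* plain first fit */
	return (addr);
}
```

A "strong mode" in this model would simply return TOY_NO_SPACE instead of retrying with a zero offset.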
2019-02-10 17:19:45 +00:00
|
|
|
vm_offset_t anon_loc;
|
2010-12-09 21:02:22 +00:00
|
|
|
int busy;
|
2019-11-09 17:08:27 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
|
|
|
int nupdates;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
};
|
|
|
|
|
2003-08-11 07:14:08 +00:00
|
|
|
/*
|
|
|
|
* vm_flags_t values
|
|
|
|
*/
|
|
|
|
#define MAP_WIREFUTURE 0x01 /* wire all future pages */
|
2010-12-09 21:02:22 +00:00
|
|
|
#define MAP_BUSY_WAKEUP 0x02
|
2019-02-10 17:19:45 +00:00
|
|
|
#define MAP_IS_SUB_MAP 0x04 /* has parent */
|
|
|
|
#define MAP_ASLR 0x08 /* enabled ASLR */
|
|
|
|
#define MAP_ASLR_IGNSTART 0x10
|
2003-08-11 07:14:08 +00:00
|
|
|
|
2002-04-27 22:01:37 +00:00
|
|
|
#ifdef _KERNEL
|
2018-07-02 19:48:38 +00:00
|
|
|
#if defined(KLD_MODULE) && !defined(KLD_TIED)
|
2018-03-30 10:55:31 +00:00
|
|
|
#define vm_map_max(map) vm_map_max_KBI((map))
|
|
|
|
#define vm_map_min(map) vm_map_min_KBI((map))
|
|
|
|
#define vm_map_pmap(map) vm_map_pmap_KBI((map))
|
|
|
|
#else
|
2002-04-27 22:01:37 +00:00
|
|
|
static __inline vm_offset_t
|
2012-07-15 20:29:48 +00:00
|
|
|
vm_map_max(const struct vm_map *map)
|
2002-04-27 22:01:37 +00:00
|
|
|
{
|
2018-08-29 12:24:19 +00:00
|
|
|
|
|
|
|
return (map->header.start);
|
2002-04-27 22:01:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static __inline vm_offset_t
|
2012-07-15 20:29:48 +00:00
|
|
|
vm_map_min(const struct vm_map *map)
|
2002-04-27 22:01:37 +00:00
|
|
|
{
|
2018-08-29 12:24:19 +00:00
|
|
|
|
|
|
|
return (map->header.end);
|
2002-04-27 22:01:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static __inline pmap_t
|
|
|
|
vm_map_pmap(vm_map_t map)
|
|
|
|
{
|
|
|
|
return (map->pmap);
|
|
|
|
}
|
2003-08-11 07:14:08 +00:00
|
|
|
|
|
|
|
static __inline void
|
|
|
|
vm_map_modflags(vm_map_t map, vm_flags_t set, vm_flags_t clear)
|
|
|
|
{
|
|
|
|
map->flags = (map->flags | set) & ~clear;
|
|
|
|
}
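The set-then-clear order in vm_map_modflags() means that a bit named in both arguments ends up cleared. A minimal standalone sketch of the same expression follows; toy_flags_t and the toy_ function names are assumptions made so the example compiles outside the kernel, and the flag values are copied from the definitions above.

```c
#include <assert.h>

typedef unsigned char toy_flags_t;	/* stand-in for vm_flags_t */

#define	MAP_WIREFUTURE	0x01
#define	MAP_BUSY_WAKEUP	0x02
#define	MAP_ASLR	0x08

/* Same expression as vm_map_modflags(), applied to a plain value. */
toy_flags_t
toy_modflags(toy_flags_t flags, toy_flags_t set, toy_flags_t clear)
{
	return ((toy_flags_t)((flags | set) & ~clear));
}

void
toy_modflags_example(void)
{
	toy_flags_t f = MAP_BUSY_WAKEUP;

	/* Set one flag and clear another in a single call. */
	f = toy_modflags(f, MAP_WIREFUTURE, MAP_BUSY_WAKEUP);
	assert(f == MAP_WIREFUTURE);

	/* A bit present in both masks is cleared: clear is applied last. */
	f = toy_modflags(f, MAP_ASLR, MAP_ASLR);
	assert(f == MAP_WIREFUTURE);
}
```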
|
2018-03-30 10:55:31 +00:00
|
|
|
#endif /* KLD_MODULE */
|
2002-04-27 22:01:37 +00:00
|
|
|
#endif /* _KERNEL */
|
|
|
|
|
2003-11-03 16:14:45 +00:00
|
|
|
/*
|
1995-12-07 12:48:31 +00:00
|
|
|
* Shareable process virtual address space.
|
2002-06-25 18:14:38 +00:00
|
|
|
*
|
|
|
|
* List of locks
|
|
|
|
* (c) const until freed
|
1995-12-07 12:48:31 +00:00
|
|
|
*/
|
|
|
|
struct vmspace {
|
|
|
|
struct vm_map vm_map; /* VM address map */
|
2002-07-22 16:22:27 +00:00
|
|
|
struct shmmap_state *vm_shm; /* SYS5 shared memory private data XXX */
|
1995-12-07 12:48:31 +00:00
|
|
|
segsz_t vm_swrss; /* resident set size before last swap */
|
|
|
|
segsz_t vm_tsize; /* text size (pages) XXX */
|
|
|
|
segsz_t vm_dsize; /* data size (pages) XXX */
|
|
|
|
segsz_t vm_ssize; /* stack size (pages) */
|
2002-06-25 18:14:38 +00:00
|
|
|
caddr_t vm_taddr; /* (c) user virtual address of text */
|
|
|
|
caddr_t vm_daddr; /* (c) user virtual address of data */
|
1995-12-07 12:48:31 +00:00
|
|
|
caddr_t vm_maxsaddr; /* user VA at max stack growth */
|
2010-10-21 17:29:32 +00:00
|
|
|
volatile int vm_refcnt; /* number of references */
|
2008-03-01 22:54:42 +00:00
|
|
|
/*
|
|
|
|
* Keep the PMAP last, so that CPU-specific variations of that
|
|
|
|
* structure on a single architecture don't result in offset
|
|
|
|
* variations of the machine-independent fields in the vmspace.
|
|
|
|
*/
|
|
|
|
struct pmap vm_pmap; /* private physical map */
|
1995-12-07 12:48:31 +00:00
|
|
|
};
|
|
|
|
|
2002-06-01 22:41:43 +00:00
|
|
|
#ifdef _KERNEL
|
|
|
|
static __inline pmap_t
|
|
|
|
vmspace_pmap(struct vmspace *vmspace)
|
|
|
|
{
|
|
|
|
return &vmspace->vm_pmap;
|
|
|
|
}
|
|
|
|
#endif /* _KERNEL */
|
|
|
|
|
2001-05-19 01:28:09 +00:00
|
|
|
#ifdef _KERNEL
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Macros: vm_map_lock, etc.
|
|
|
|
* Function:
|
1999-08-16 18:21:09 +00:00
|
|
|
* Perform locking on the data portion of a map. Note that
|
|
|
|
* these macros mimic procedure calls returning void. The
|
|
|
|
* semicolon is supplied by the user of these macros, not
|
|
|
|
* by the macros themselves. The macros can safely be used
|
|
|
|
* as unbraced elements in a higher level statement.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
2002-04-28 23:12:52 +00:00
|
|
|
void _vm_map_lock(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_unlock(vm_map_t map, const char *file, int line);
|
2010-09-19 17:43:22 +00:00
|
|
|
int _vm_map_unlock_and_wait(vm_map_t map, int timo, const char *file, int line);
|
2002-04-28 23:12:52 +00:00
|
|
|
void _vm_map_lock_read(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_unlock_read(vm_map_t map, const char *file, int line);
|
|
|
|
int _vm_map_trylock(vm_map_t map, const char *file, int line);
|
2003-03-12 23:13:16 +00:00
|
|
|
int _vm_map_trylock_read(vm_map_t map, const char *file, int line);
|
2002-04-28 23:12:52 +00:00
|
|
|
int _vm_map_lock_upgrade(vm_map_t map, const char *file, int line);
|
|
|
|
void _vm_map_lock_downgrade(vm_map_t map, const char *file, int line);
|
2009-01-01 00:31:46 +00:00
|
|
|
int vm_map_locked(vm_map_t map);
|
2002-07-11 02:39:24 +00:00
|
|
|
void vm_map_wakeup(vm_map_t map);
|
2010-12-09 21:02:22 +00:00
|
|
|
void vm_map_busy(vm_map_t map);
|
|
|
|
void vm_map_unbusy(vm_map_t map);
|
|
|
|
void vm_map_wait_busy(vm_map_t map);
|
2018-03-30 10:55:31 +00:00
|
|
|
vm_offset_t vm_map_max_KBI(const struct vm_map *map);
|
|
|
|
vm_offset_t vm_map_min_KBI(const struct vm_map *map);
|
|
|
|
pmap_t vm_map_pmap_KBI(vm_map_t map);
|
2002-04-28 23:12:52 +00:00
|
|
|
|
|
|
|
#define vm_map_lock(map) _vm_map_lock(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_unlock(map) _vm_map_unlock(map, LOCK_FILE, LOCK_LINE)
|
2010-09-19 17:43:22 +00:00
|
|
|
#define vm_map_unlock_and_wait(map, timo) \
|
|
|
|
_vm_map_unlock_and_wait(map, timo, LOCK_FILE, LOCK_LINE)
|
2002-04-28 23:12:52 +00:00
|
|
|
#define vm_map_lock_read(map) _vm_map_lock_read(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_unlock_read(map) _vm_map_unlock_read(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_trylock(map) _vm_map_trylock(map, LOCK_FILE, LOCK_LINE)
|
2003-03-12 23:13:16 +00:00
|
|
|
#define vm_map_trylock_read(map) \
|
|
|
|
_vm_map_trylock_read(map, LOCK_FILE, LOCK_LINE)
|
2002-04-28 23:12:52 +00:00
|
|
|
#define vm_map_lock_upgrade(map) \
|
|
|
|
_vm_map_lock_upgrade(map, LOCK_FILE, LOCK_LINE)
|
|
|
|
#define vm_map_lock_downgrade(map) \
|
|
|
|
_vm_map_lock_downgrade(map, LOCK_FILE, LOCK_LINE)
|
2001-07-04 20:15:18 +00:00
|
|
|
|
|
|
|
long vmspace_resident_count(struct vmspace *vmspace);
|
2001-05-19 01:28:09 +00:00
|
|
|
#endif /* _KERNEL */
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1999-02-19 14:25:37 +00:00
|
|
|
|
2014-09-08 00:19:03 +00:00
|
|
|
/* XXX: number of kernel maps to statically allocate */
|
1994-05-24 10:09:53 +00:00
|
|
|
#define MAX_KMAP 10
|
|
|
|
|
1996-01-19 04:00:31 +00:00
|
|
|
/*
|
|
|
|
* Copy-on-write flags for vm_map operations
|
|
|
|
*/
|
2019-06-08 20:28:04 +00:00
|
|
|
#define MAP_INHERIT_SHARE 0x00000001
|
|
|
|
#define MAP_COPY_ON_WRITE 0x00000002
|
|
|
|
#define MAP_NOFAULT 0x00000004
|
|
|
|
#define MAP_PREFAULT 0x00000008
|
|
|
|
#define MAP_PREFAULT_PARTIAL 0x00000010
|
|
|
|
#define MAP_DISABLE_SYNCER 0x00000020
|
|
|
|
#define MAP_CHECK_EXCL 0x00000040
|
|
|
|
#define MAP_CREATE_GUARD 0x00000080
|
|
|
|
#define MAP_DISABLE_COREDUMP 0x00000100
|
|
|
|
#define MAP_PREFAULT_MADVISE 0x00000200 /* from (user) madvise request */
|
2019-09-03 20:31:48 +00:00
|
|
|
#define MAP_WRITECOUNT 0x00000400
|
2019-06-08 20:28:04 +00:00
|
|
|
#define MAP_REMAP 0x00000800
|
|
|
|
#define MAP_STACK_GROWS_DOWN 0x00001000
|
|
|
|
#define MAP_STACK_GROWS_UP 0x00002000
|
|
|
|
#define MAP_ACC_CHARGED 0x00004000
|
|
|
|
#define MAP_ACC_NO_CHARGE 0x00008000
|
|
|
|
#define MAP_CREATE_STACK_GAP_UP 0x00010000
|
|
|
|
#define MAP_CREATE_STACK_GAP_DN 0x00020000
|
|
|
|
#define MAP_VN_EXEC 0x00040000
|
1996-01-19 04:00:31 +00:00
|
|
|
|
1996-12-14 17:54:17 +00:00
|
|
|
/*
|
|
|
|
* vm_fault option flags
|
|
|
|
*/
|
2015-07-30 18:28:34 +00:00
|
|
|
#define VM_FAULT_NORMAL 0 /* Nothing special */
|
|
|
|
#define VM_FAULT_WIRE 1 /* Wire the mapped page */
|
|
|
|
#define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */
|
1996-12-14 17:54:17 +00:00
|
|
|
|
2012-05-10 15:16:42 +00:00
|
|
|
/*
|
|
|
|
* Initially, mappings are slightly sequential. The maximum window size must
|
|
|
|
* account for the map entry's "read_ahead" field being defined as an uint8_t.
|
|
|
|
*/
|
|
|
|
#define VM_FAULT_READ_AHEAD_MIN 7
|
|
|
|
#define VM_FAULT_READ_AHEAD_INIT 15
|
|
|
|
#define VM_FAULT_READ_AHEAD_MAX min(atop(MAXPHYS) - 1, UINT8_MAX)
|
|
|
|
|
2008-05-10 18:55:35 +00:00
|
|
|
/*
|
2013-08-16 21:13:55 +00:00
|
|
|
* The following "find_space" options are supported by vm_map_find().
|
|
|
|
*
|
|
|
|
* For VMFS_ALIGNED_SPACE, the desired alignment is specified to
|
|
|
|
* the macro argument as log base 2 of the desired alignment.
|
2008-05-10 18:55:35 +00:00
|
|
|
*/
|
|
|
|
#define VMFS_NO_SPACE 0 /* don't find; use the given range */
|
|
|
|
#define VMFS_ANY_SPACE 1 /* find a range with any alignment */
|
2013-07-19 19:06:15 +00:00
|
|
|
#define VMFS_OPTIMAL_SPACE 2 /* find a range with optimal alignment*/
|
2013-08-16 21:13:55 +00:00
|
|
|
#define VMFS_SUPER_SPACE 3 /* find a superpage-aligned range */
|
|
|
|
#define VMFS_ALIGNED_SPACE(x) ((x) << 8) /* find a range with fixed alignment */
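Because VMFS_ALIGNED_SPACE() stores the log base 2 of the alignment above the low eight bits, a consumer can tell it apart from the small enumerated values and recover the byte alignment with a shift. The decoder below is a hypothetical helper written for illustration; how vm_map_find() actually decodes the value is not shown in this header, so treat the decoding step as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define	VMFS_ANY_SPACE		1
#define	VMFS_SUPER_SPACE	3
#define	VMFS_ALIGNED_SPACE(x)	((x) << 8)

/*
 * Hypothetical decoder: return the requested alignment in bytes, or 0
 * when find_space is one of the small enumerated options.
 */
uint64_t
toy_find_space_alignment(int find_space)
{
	if ((find_space >> 8) == 0)
		return (0);
	return ((uint64_t)1 << (find_space >> 8));
}

void
toy_find_space_example(void)
{
	/* log2(4096) == 12, so this asks for 4 KB alignment. */
	assert(toy_find_space_alignment(VMFS_ALIGNED_SPACE(12)) == 4096);
	assert(toy_find_space_alignment(VMFS_SUPER_SPACE) == 0);
}
```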
|
2008-05-10 18:55:35 +00:00
|
|
|
|
2003-08-11 07:14:08 +00:00
|
|
|
/*
|
|
|
|
* vm_map_wire and vm_map_unwire option flags
|
|
|
|
*/
|
|
|
|
#define VM_MAP_WIRE_SYSTEM 0 /* wiring in a kernel map */
|
|
|
|
#define VM_MAP_WIRE_USER 1 /* wiring in a user map */
|
|
|
|
|
|
|
|
#define VM_MAP_WIRE_NOHOLES 0 /* region must not have holes */
|
|
|
|
#define VM_MAP_WIRE_HOLESOK 2 /* region may have holes */
|
|
|
|
|
2011-03-21 09:40:01 +00:00
|
|
|
#define VM_MAP_WIRE_WRITE 4 /* Validate writable. */
|
|
|
|
|
1999-12-29 05:07:58 +00:00
|
|
|
#ifdef _KERNEL
|
2001-07-04 20:15:18 +00:00
|
|
|
boolean_t vm_map_check_protection (vm_map_t, vm_offset_t, vm_offset_t, vm_prot_t);
|
2003-06-15 07:28:33 +00:00
|
|
|
vm_map_t vm_map_create(pmap_t, vm_offset_t, vm_offset_t);
|
2009-02-24 20:57:43 +00:00
|
|
|
int vm_map_delete(vm_map_t, vm_offset_t, vm_offset_t);
|
2008-05-10 18:55:35 +00:00
|
|
|
int vm_map_find(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t *, vm_size_t,
|
2013-09-09 18:11:59 +00:00
|
|
|
vm_offset_t, int, vm_prot_t, vm_prot_t, int);
|
Treat the addr argument for mmap(2) request without MAP_FIXED flag as
a hint.
Right now, for non-fixed mmap(2) calls, addr is de facto interpreted
as the absolute minimal address of the range where the mapping is
created. The VA allocator only allocates in the range [addr,
VM_MAXUSER_ADDRESS]. This is too restrictive: the mmap(2) call might
unduly fail if there are no free addresses above addr but a lot of
usable space below it.
Lift this implementation limitation by allocating VA in two passes.
First, try to allocate above addr, as before. If that fails, do the
second pass with less restrictive constraints for the start of
allocation by specifying minimal allocation address at the max bss
end, if this limit is less than addr.
One important case where this change makes a difference is the
allocation of the stacks for new threads in libthr. Under some
configuration conditions, libthr tries to hint the kernel to reuse the
main thread stack grow area for the new stacks. This cannot work by
design now that the grow area is converted to stack, and there is no
unallocated VA above the main stack. Interpreting the requested stack
base address as a hint provides compatibility with old libthr and
with (mis-)configured current libthr.
Reviewed by: alc
Tested by: dim (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
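The two-pass allocation described in this message can be sketched with a toy free list. The names below (struct toy_range, toy_find_at_or_above(), toy_find_with_hint()) are hypothetical, and fallback_min stands in for the "max bss end" limit mentioned above; this is an illustration of the control flow, not the vm_map_find_min() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_SPACE	UINT64_MAX	/* hypothetical failure value */

/* Hypothetical free range [start, end), kept sorted by address. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/* First address >= min_addr where `size` bytes fit, or TOY_NO_SPACE. */
static uint64_t
toy_find_at_or_above(const struct toy_range *r, size_t n, uint64_t size,
    uint64_t min_addr)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].start > min_addr ? r[i].start : min_addr;

		if (start < r[i].end && r[i].end - start >= size)
			return (start);
	}
	return (TOY_NO_SPACE);
}

/*
 * Pass 1: allocate at or above the hint, as before.
 * Pass 2: only if pass 1 failed and fallback_min is below the hint,
 * retry with the lower minimum address.
 */
uint64_t
toy_find_with_hint(const struct toy_range *r, size_t n, uint64_t size,
    uint64_t hint, uint64_t fallback_min)
{
	uint64_t addr;

	assert(size > 0);
	addr = toy_find_at_or_above(r, n, size, hint);
	if (addr == TOY_NO_SPACE && fallback_min < hint)
		addr = toy_find_at_or_above(r, n, size, fallback_min);
	return (addr);
}
```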
2017-06-28 04:02:36 +00:00
|
|
|
int vm_map_find_min(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t *,
|
|
|
|
vm_size_t, vm_offset_t, vm_offset_t, int, vm_prot_t, vm_prot_t, int);
|
2008-04-28 05:30:23 +00:00
|
|
|
int vm_map_fixed(vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t, vm_size_t,
|
|
|
|
vm_prot_t, vm_prot_t, int);
|
Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field
so that p->max_free is the size of the largest gap with one endpoint
in the subtree rooted at p. Change vm_map_findspace so that, first,
the address-based splay is restricted to tree nodes with large-enough
max_free value, to avoid searching for the right starting point in a
subtree where all the gaps are too small. Second, when the address
search leads to a tree search for the first large-enough gap, that gap
is the subject of a splay-search that brings the gap to the top of the
tree, so that an immediate insertion will take constant time.
Break up the splay code into separate components, one for searching
and breaking up the tree and another for reassembling it. Use these
components, and not splay itself, for linking and unlinking. Drop the
after-where parameter to link, as it is computed as a side-effect of
the splay search.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: markj
Tested by: pho
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D17794
2019-03-29 16:53:46 +00:00
|
|
|
vm_offset_t vm_map_findspace(vm_map_t, vm_offset_t, vm_size_t);
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_inherit (vm_map_t, vm_offset_t, vm_offset_t, vm_inherit_t);
|
2010-04-03 19:07:05 +00:00
|
|
|
void vm_map_init(vm_map_t, pmap_t, vm_offset_t, vm_offset_t);
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_insert (vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t, vm_offset_t, vm_prot_t, vm_prot_t, int);
|
|
|
|
int vm_map_lookup (vm_map_t *, vm_offset_t, vm_prot_t, vm_map_entry_t *, vm_object_t *,
|
|
|
|
vm_pindex_t *, vm_prot_t *, boolean_t *);
|
2004-08-12 20:14:49 +00:00
|
|
|
int vm_map_lookup_locked(vm_map_t *, vm_offset_t, vm_prot_t, vm_map_entry_t *, vm_object_t *,
|
|
|
|
vm_pindex_t *, vm_prot_t *, boolean_t *);
|
2001-07-04 20:15:18 +00:00
|
|
|
void vm_map_lookup_done (vm_map_t, vm_map_entry_t);
|
2019-06-26 03:12:57 +00:00
|
|
|
boolean_t vm_map_lookup_entry (vm_map_t, vm_offset_t, vm_map_entry_t *);
|
2019-11-13 15:56:07 +00:00
|
|
|
|
|
|
|
static inline vm_map_entry_t
|
2019-11-20 16:06:48 +00:00
|
|
|
vm_map_entry_first(vm_map_t map)
|
2019-11-13 15:56:07 +00:00
|
|
|
{
|
|
|
|
|
2019-11-20 16:06:48 +00:00
|
|
|
return (map->header.next);
|
2019-11-13 15:56:07 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline vm_map_entry_t
|
2019-11-20 16:06:48 +00:00
|
|
|
vm_map_entry_succ(vm_map_entry_t entry)
|
2019-11-13 15:56:07 +00:00
|
|
|
{
|
|
|
|
|
2019-11-20 16:06:48 +00:00
|
|
|
return (entry->next);
|
2019-11-13 15:56:07 +00:00
|
|
|
}
|
|
|
|
|
2019-11-20 16:06:48 +00:00
|
|
|
#define VM_MAP_ENTRY_FOREACH(it, map) \
|
|
|
|
for ((it) = vm_map_entry_first(map); \
|
2019-10-08 07:14:21 +00:00
|
|
|
(it) != &(map)->header; \
|
2019-11-13 15:56:07 +00:00
|
|
|
(it) = vm_map_entry_succ(it))
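VM_MAP_ENTRY_FOREACH walks the entry list starting at the header's successor and stops when it comes back around to the header, which acts as a sentinel. The standalone model below mirrors that shape with hypothetical toy_ types (only the next link and the start/end fields are modelled), so it can be compiled and stepped through outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for vm_map_entry and vm_map. */
struct toy_entry {
	struct toy_entry *next;
	unsigned long start;
	unsigned long end;
};

struct toy_map {
	struct toy_entry header;	/* sentinel; closes the circular list */
};

static inline struct toy_entry *
toy_entry_first(struct toy_map *map)
{
	return (map->header.next);
}

static inline struct toy_entry *
toy_entry_succ(struct toy_entry *entry)
{
	return (entry->next);
}

#define TOY_MAP_ENTRY_FOREACH(it, map)		\
	for ((it) = toy_entry_first(map);	\
	    (it) != &(map)->header;		\
	    (it) = toy_entry_succ(it))

/* Sum the sizes of all entries; an empty map yields 0. */
unsigned long
toy_map_mapped_bytes(struct toy_map *map)
{
	struct toy_entry *it;
	unsigned long total = 0;

	assert(map != NULL);
	TOY_MAP_ENTRY_FOREACH(it, map)
		total += it->end - it->start;
	return (total);
}
```

An empty map in this model is one whose header points at itself, so the loop body never runs.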
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_protect (vm_map_t, vm_offset_t, vm_offset_t, vm_prot_t, boolean_t);
|
|
|
|
int vm_map_remove (vm_map_t, vm_offset_t, vm_offset_t);
|
2019-08-25 07:06:51 +00:00
|
|
|
void vm_map_try_merge_entries(vm_map_t map, vm_map_entry_t prev,
|
|
|
|
vm_map_entry_t entry);
|
2001-07-04 20:15:18 +00:00
|
|
|
void vm_map_startup (void);
|
|
|
|
int vm_map_submap (vm_map_t, vm_offset_t, vm_offset_t, vm_map_t);
|
2003-11-09 05:25:35 +00:00
|
|
|
int vm_map_sync(vm_map_t, vm_offset_t, vm_offset_t, boolean_t, boolean_t);
|
2001-07-04 20:15:18 +00:00
|
|
|
int vm_map_madvise (vm_map_t, vm_offset_t, vm_offset_t, int);
|
|
|
|
int vm_map_stack (vm_map_t, vm_offset_t, vm_size_t, vm_prot_t, vm_prot_t, int);
|
2002-06-07 18:34:23 +00:00
|
|
|
int vm_map_unwire(vm_map_t map, vm_offset_t start, vm_offset_t end,
|
2003-08-11 07:14:08 +00:00
|
|
|
int flags);
|
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.
The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
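The accounting rule described in this message (count user-wired virtual pages, and refuse a wiring that would push the count past the limit) can be sketched as follows. The structure and function names are hypothetical, and the model is single-threaded; the kernel's counter would need atomic updates, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the user-wiring counter and its limit. */
struct toy_wire_acct {
	unsigned long user_wire_count;	/* pages wired by user requests */
	unsigned long max_user_wired;	/* system-wide limit, in pages */
};

/* Reserve npages against the limit; fail without side effects. */
bool
toy_user_wire_reserve(struct toy_wire_acct *a, unsigned long npages)
{
	/* Written as a subtraction so the check cannot overflow. */
	if (a->user_wire_count > a->max_user_wired ||
	    npages > a->max_user_wired - a->user_wire_count)
		return (false);
	a->user_wire_count += npages;
	return (true);
}

/* Return npages to the pool when a user wiring is removed. */
void
toy_user_wire_release(struct toy_wire_acct *a, unsigned long npages)
{
	assert(npages <= a->user_wire_count);
	a->user_wire_count -= npages;
}
```

Kernel wirings never touch this counter in the model, which is the point of the change: they cannot make a user request fail.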
2019-05-13 16:38:48 +00:00
|
|
|
int vm_map_wire(vm_map_t map, vm_offset_t start, vm_offset_t end, int flags);
|
|
|
|
int vm_map_wire_locked(vm_map_t map, vm_offset_t start, vm_offset_t end,
|
2003-08-11 07:14:08 +00:00
|
|
|
int flags);
|
2011-03-01 11:04:30 +00:00
|
|
|
long vmspace_swap_count(struct vmspace *vmspace);
|
Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.
The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal.
The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.
nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.
On the text vnode's vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
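The signed-counter scheme described in this message (positive values count writers, negative values count text references, and the two are mutually exclusive) can be modelled in a few lines. The toy_ names are hypothetical, the functions return a plain bool rather than an errno value, and the vnode interlock is left out; this is a sketch of the invariant only.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vnode {
	int writecount;	/* > 0: open writers; < 0: text references */
};

/* Add (or drop, with negative inc) writers; refused while used as text. */
bool
toy_add_writecount(struct toy_vnode *vp, int inc)
{
	if (vp->writecount < 0)
		return (false);
	assert(vp->writecount + inc >= 0);
	vp->writecount += inc;
	return (true);
}

/* Take a text reference; refused while the file is open for writing. */
bool
toy_set_text(struct toy_vnode *vp)
{
	if (vp->writecount > 0)
		return (false);
	vp->writecount--;
	return (true);
}

/* Drop a text reference, e.g. when the backing map entry is removed. */
void
toy_unset_text(struct toy_vnode *vp)
{
	assert(vp->writecount < 0);
	vp->writecount++;
}
```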
2019-05-05 11:20:43 +00:00
|
|
|
void vm_map_entry_set_vnode_text(vm_map_entry_t entry, bool add);
|
2002-03-10 21:52:48 +00:00
|
|
|
#endif /* _KERNEL */
|
1995-01-09 16:06:02 +00:00
|
|
|
#endif /* _VM_MAP_ */
|