reset by pmap_page_init() right after being initialized in vm_page_initfake().
The statement above refers to the amd64 implementation of
pmap_page_init().
Fix this by calling 'pmap_page_init()' in 'vm_page_initfake()' before changing
the 'memattr'.
Reviewed by: kib
MFC after: 2 weeks
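A minimal sketch of the resulting ordering inside vm_page_initfake();
the unrelated field initialization is elided and the exact fields shown
are illustrative rather than a verbatim copy of the function:

    void
    vm_page_initfake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr)
    {

        /* ... other initialization of the fake page elided ... */
        m->phys_addr = paddr;
        /* Let the pmap set up its per-page state first ... */
        pmap_page_init(m);
        /* ... so that the requested memattr is not clobbered. */
        pmap_page_set_memattr(m, memattr);
    }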
originally inspired by the Solaris vmem described in the proceedings
of USENIX 2001. The NetBSD version was heavily refactored to fix bugs
and to simplify the code.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.
Discussed with: alc, kib, attilio
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
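For reference, a minimal usage sketch of the vmem interface added above;
the arena name, range, and quantum are illustrative and error handling is
reduced to the essentials:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/vmem.h>

    static void
    vmem_example(vm_offset_t base, vm_size_t size)
    {
        vmem_t *arena;
        vmem_addr_t addr;

        /* Page-sized quantum, no quantum caching in this sketch. */
        arena = vmem_create("example arena", base, size, PAGE_SIZE,
            0, M_WAITOK);
        if (vmem_alloc(arena, PAGE_SIZE, M_BESTFIT | M_WAITOK,
            &addr) == 0)
            vmem_free(arena, addr, PAGE_SIZE);
        vmem_destroy(arena);
    }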
through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
- Make some smaller buckets for large zones to further reduce memory
waste.
- Implement uma_zone_reserve(). This holds aside a number of items only
for callers who specify M_USE_RESERVE. Buckets will never be filled
from reserve allocations.
Sponsored by: EMC / Isilon Storage Division
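A sketch of how the reservation described above is meant to be used; the
zone, item type, and reserve count are made up for illustration:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    struct foo {                    /* hypothetical item type */
        int     f_dummy;
    };

    static uma_zone_t foo_zone;

    static void
    foo_zone_setup(void)
    {
        foo_zone = uma_zcreate("foo", sizeof(struct foo), NULL, NULL,
            NULL, NULL, UMA_ALIGN_PTR, 0);
        /* Hold 32 items aside for M_USE_RESERVE callers. */
        uma_zone_reserve(foo_zone, 32);
    }

    static struct foo *
    foo_alloc_critical(void)
    {
        /* May dip into the reserve; never satisfied from buckets. */
        return (uma_zalloc(foo_zone, M_NOWAIT | M_USE_RESERVE));
    }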
- Be more explicit about zone vs keg locking. This functionally changes
almost nothing.
- Add a size parameter to uma_zcache_create() so we can size the buckets.
- Pass the zone to bucket_alloc() so it can modify allocation flags
as appropriate.
- Fix a bug in zone_alloc_bucket() where I missed an address-of operator
in a failure case. (Found by pho)
Sponsored by: EMC / Isilon Storage Division
to a memory-mapped file in the traced process's address space
even if neither the traced process nor the tracing process had
write access to that file.
Security: CVE-2013-2171
Security: FreeBSD-SA-13:06.mmap
Approved by: so
performance.
- Always free to the alloc bucket if there is space. This gives LIFO
allocation order to improve hot-cache performance. This also allows
for zones with a single bucket per-cpu rather than a pair if the entire
working set fits in one bucket.
- Enable per-cpu caches of buckets. To prevent recursive bucket
allocation, one bucket zone still has per-cpu caches disabled.
- Pick the initial bucket size based on a table-driven maximum size
per-bucket rather than the number of items per-page. This gives
saner initial sizes.
- Only grow the bucket size when we face contention on the zone lock;
this causes bucket sizes to grow more slowly.
- Adjust the number of items per-bucket to account for the header space.
This packs the buckets more efficiently per-page while making them
not quite powers of two (see the arithmetic sketch after this entry).
- Eliminate the per-zone free bucket list. Always return buckets back
to the bucket zone. This ensures that as zones grow into larger
bucket sizes they eventually discard the smaller sizes. It persists
fewer buckets in the system. The locking is slightly trickier.
- Only switch buckets in zalloc, not zfree; this eliminates pathological
cases where we ping-pong between two buckets.
- Ensure that the thread that fills a new bucket gets to allocate from
it to give a better upper bound on allocation time.
Sponsored by: EMC / Isilon Storage Division
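As a concrete illustration of the "not quite powers of two" item counts,
assuming a 4096-byte page, 8-byte item pointers, and a 32-byte bucket
header; the header size is an assumption, not the real struct layout:

    #define EX_PAGE_SIZE    4096
    #define EX_HDR_SIZE     32      /* assumed bucket header size */
    #define EX_PTR_SIZE     8

    /* (4096 - 32) / 8 = 508 entries per page-sized bucket, not 512. */
    static const int ex_bucket_entries =
        (EX_PAGE_SIZE - EX_HDR_SIZE) / EX_PTR_SIZE;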
backing memory that is only a container for per-cpu caches of arbitrary
pointer items. These zones have no kegs (see the sketch after this entry).
- Convert the regular keg-based allocator to use the new import/release
functions.
- Move some stats to be atomics since they would require excessive zone
locking/unlocking with the new import/release paradigm. Make
zone_free_item simpler now that callers can manage more stats.
- Check for these cache-only zones in the public APIs and debugging
code by checking zone_first_keg() against NULL.
Sponsored by: EMC / Isilon Storage Division
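A sketch of a cache-only zone wired to trivial import/release callbacks.
The uma_zcache_create() argument order and the callback signatures are
written from memory and should be treated as assumptions; malloc(9)
stands in for whatever backend would really supply the items:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    static int
    ex_import(void *arg __unused, void **store, int count, int flags)
    {
        int i;

        for (i = 0; i < count; i++) {
            store[i] = malloc(sizeof(void *), M_TEMP, flags);
            if (store[i] == NULL)
                break;
        }
        return (i);
    }

    static void
    ex_release(void *arg __unused, void **store, int count)
    {
        int i;

        for (i = 0; i < count; i++)
            free(store[i], M_TEMP);
    }

    static uma_zone_t ex_cache;

    static void
    ex_cache_setup(void)
    {
        /* No ctor/dtor/init/fini; the zone is only a cache. */
        ex_cache = uma_zcache_create("ex cache", sizeof(void *),
            NULL, NULL, NULL, NULL, ex_import, ex_release, NULL, 0);
    }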
bitmap using sys/bitset. This is much simpler, has lower space
overhead, and is cheaper in most cases (see the sketch after this entry).
- Use a second bitmap for INVARIANTS asserts, improving the quality of
the asserts and increasing the number of erroneous conditions that we
will catch.
- Drastically simplify the sizing code. Special-case refcnt zones since
they will be going away.
- Update stale comments.
Sponsored by: EMC / Isilon Storage Division
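A small sketch of the sys/bitset primitives involved; the set size and
names are illustrative, and BIT_FFS() returns a 1-based index or 0 when
the set is empty:

    #include <sys/param.h>
    #include <sys/_bitset.h>
    #include <sys/bitset.h>

    #define EX_SLAB_ITEMS   256
    BITSET_DEFINE(ex_slabbits, EX_SLAB_ITEMS);

    static struct ex_slabbits ex_free;

    static void
    ex_demo(void)
    {
        int item;

        BIT_FILL(EX_SLAB_ITEMS, &ex_free);      /* everything free */
        item = BIT_FFS(EX_SLAB_ITEMS, &ex_free);
        if (item != 0) {
            /* "Allocate" the item, then free it again. */
            BIT_CLR(EX_SLAB_ITEMS, item - 1, &ex_free);
            BIT_SET(EX_SLAB_ITEMS, item - 1, &ex_free);
        }
    }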
Avoid busying/unbusying a page in cases where there is no need to drop the
vm_obj lock, most notably when the page is fully valid after
vm_page_grab().
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
swap_pager_copy() is invoked; otherwise there is no reason to do so.
This will eliminate the need to busy pages most of the time.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
clearing the page's PGA_REFERENCED flag. Since we are typically
manipulating the page's act_count field when we are clearing its
PGA_REFERENCED flag, the page lock is already held everywhere that we clear
the PGA_REFERENCED flag. So, in fact, this revision only changes some
comments and an assertion. Nonetheless, it will enable later changes to
object locking in the pageout code.
Introduce vm_page_assert_locked(), which completely hides the implementation
details of the page lock from the caller, and use it in
vm_page_aflag_clear(). (The existing vm_page_lock_assert() could not be
used in vm_page_aflag_clear().) Over the coming weeks, I expect that we'll
either eliminate or replace the various uses of vm_page_lock_assert() with
vm_page_assert_locked().
Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division
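A sketch of the new assertion in a caller; the surrounding function is
made up, and the point is that vm_page_assert_locked() does not expose
whether the page lock is a mutex:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    static void
    ex_clear_referenced(vm_page_t m)
    {
        vm_page_lock(m);
        /*
         * Implementation-neutral replacement for
         * vm_page_lock_assert(m, MA_OWNED).
         */
        vm_page_assert_locked(m);
        vm_page_aflag_clear(m, PGA_REFERENCED);
        vm_page_unlock(m);
    }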
lock instead of the object lock, there is no reason for vm_page_activate()
to assert that the object is locked for either read or write access.
(The "VPO_UNMANAGED" flag never changes after page allocation.)
Sponsored by: EMC / Isilon Storage Division
reason to inline the implementation of vm_page_lock_assert() in the
!KLD_MODULES case. Use the same implementation for both KLD_MODULES and
!KLD_MODULES.
Reviewed by: kib
change. Retest the ref_count and return from the function, so as not to
execute the code further down that assumes ref_count == 1 when it is
not. Also, do not leak the vnode lock if another thread cleared the
OBJ_TMPFS flag in the meantime.
Reported by: bdrewery
Tested by: bdrewery, pho
Sponsored by: The FreeBSD Foundation
It can now be accessed with a write lock on the object containing it, or
with a read lock on that object together with the swhash_mtx.
o Remove some duplicate assertions for swap_pager_freespace() and
swap_pager_unswapped() but keep the object locking references for
documentation.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE(), which is basically a downgrade
operation on the per-object rwlock (see the sketch after this entry)
o Use all the mechanisms above to make vm_map_pmap_enter() work with
read locks most of the time.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
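A sketch of the downgrade pattern this makes available; the work inside
the locked region is illustrative:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/rwlock.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>

    static void
    ex_downgrade(vm_object_t object)
    {
        VM_OBJECT_WLOCK(object);
        /* ... updates that need the exclusive object lock ... */
        VM_OBJECT_LOCK_DOWNGRADE(object);
        /* ... read-only work, e.g. driving pmap_enter_object() ... */
        VM_OBJECT_RUNLOCK(object);
    }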
a vm page, denoted either by an address of the struct vm_page, or, if
the '/p' modifier is specified, by a physical address of the
corresponding frame.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
tree is used to maintain the object's collection of resident pages,
vm_page_lookup() no longer needs an exclusive lock.
Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division
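A sketch of a lookup that now only needs the shared object lock; the
wrapper function is illustrative:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/rwlock.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>
    #include <vm/vm_page.h>

    static int
    ex_page_is_resident(vm_object_t object, vm_pindex_t pindex)
    {
        int resident;

        VM_OBJECT_RLOCK(object);        /* a read lock is now enough */
        resident = (vm_page_lookup(object, pindex) != NULL);
        VM_OBJECT_RUNLOCK(object);
        return (resident);
    }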
freelist.
o Split the pool of free page queues really by domain instead of relying
  on the definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
  function that is called when calculating the allocation domain.
  The RR counter is currently kept per-thread.
  In the future this function is expected to evolve into a real policy
  referee, based on specific information retrieved from per-thread and
  per-vm_object attributes (see the sketch after this entry).
o Add the concept of "probed domains" in the form of vm_ndomains.
  It is the responsibility of every architecture willing to support
  multiple memory domains to correctly probe vm_ndomains along with the
  mem_affinity segment attributes. Those two values are expected to
  always remain consistent.
  Please also note that vm_ndomains and td_dom_rr_idx are both int
  because the segments already store domains as int. Ideally u_int would
  make more sense; this should probably be cleaned up in the future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().
Sponsored by: EMC / Isilon storage division
Partly obtained from: jeff
Reviewed by: alc
Tested by: jeff
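A sketch of the shape such a round-robin selector might take; the
function name is hypothetical, while td_dom_rr_idx and vm_ndomains are
the fields mentioned above:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/pcpu.h>

    extern int vm_ndomains;         /* number of probed domains */

    static int
    ex_rr_selectdomain(void)
    {
        struct thread *td;

        td = curthread;
        /* Advance the per-thread counter and fold it into range. */
        td->td_dom_rr_idx++;
        td->td_dom_rr_idx %= vm_ndomains;
        return (td->td_dom_rr_idx);
    }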
vm_page_insert() so that (1) vm_radix_lookup_le() is never called while the
free page queues lock is held and (2) vm_radix_lookup_le() is called at most
once. This change reduces the average time that the free page queues lock
is held by vm_page_alloc() as well as vm_page_alloc()'s average overall
running time.
Sponsored by: EMC / Isilon Storage Division
functions, reverse the numbering scheme for the levels. The highest
numbered level in the tree now appears near the root instead of the leaves.
Sponsored by: EMC / Isilon Storage Division
order to match the MAXCPU concept. The change should also be useful
for consolidation and consistency.
Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc
change the way that these functions ascend the tree when the search for a
matching leaf fails at an interior node. Rather than returning to the root
of the tree and repeating the lookup with an updated key, maintain a stack
of interior nodes that were visited during the descent and use that stack
to resume the lookup at the closest ancestor that might have a matching
descendant.
Sponsored by: EMC / Isilon Storage Division
Reviewed by: attilio
Tested by: pho
- vm_phys_alloc_freelist_pages() can be called by vm_page_alloc_freelist()
to allocate a page from a specific freelist. In the NUMA case it did not
properly map the public VM_FREELIST_* constants to the correct backing
freelists, nor did it try all NUMA domains for allocations from
VM_FREELIST_DEFAULT.
- vm_phys_alloc_pages() did not pin the thread and each call to
vm_phys_alloc_freelist_pages() fetched the current domain to choose
which freelist to use. If a thread migrated domains during the loop
in vm_phys_alloc_pages() it could skip one of the freelists. If the
other freelists were out of memory, then it is possible that
vm_phys_alloc_pages() would fail to allocate a page even though pages
were available, resulting in a panic in vm_page_alloc().
Reviewed by: alc
MFC after: 1 week
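A sketch of the pinning pattern implied by the fix; the helper and the
loop are illustrative stand-ins, not the actual vm_phys code:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/sched.h>

    /* Stand-in for a per-freelist allocation attempt. */
    static void *
    ex_try_freelist(int flind __unused)
    {
        return (NULL);
    }

    static void *
    ex_alloc_from_freelists(int nfreelists)
    {
        void *m;
        int flind;

        m = NULL;
        /* Pin so the observed domain cannot change mid-loop. */
        sched_pin();
        for (flind = 0; flind < nfreelists; flind++) {
            m = ex_try_freelist(flind);
            if (m != NULL)
                break;
        }
        sched_unpin();
        return (m);
    }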
vnode v_object to avoid double-buffering. Use the same object both as
the backing store for tmpfs node and as the v_object.
Besides reducing memory use by up to 2x when mapping files from tmpfs,
this also halves the amount of data copied by tmpfs read and write
operations.
The VM subsystem was already slightly adapted to tolerate an OBJT_SWAP
object as the v_object. Now vm_object_deallocate() is modified to not
reinstate the OBJ_ONEMAPPING flag and to help the VFS correctly handle
the VV_TEXT flag on the last dereference of the tmpfs backing object.
Reviewed by: alc
Tested by: pho, bf
MFC after: 1 month
v_object of non-OBJT_VNODE type.
For vm_object_page_clean(), simply do not assert that the object type
must be OBJT_VNODE, and add a comment explaining how the check for
OBJ_MIGHTBEDIRTY prevents the rest of the function from operating on
such objects.
For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it
to be for swap pager (or default), handle the bypass filesystems, and
correctly acquire the object reference in this case.
Reviewed by: alc
Tested by: pho, bf
MFC after: 1 week
to vnode_pager_setsize(), is either OBJT_VNODE or, if the vnode was
already reclaimed, OBJT_DEAD. Note that the latter is only possible
because some filesystems, in particular nfsiods from NFS clients, call
vnode_pager_setsize() with an unlocked vnode.
Moreover, if the object is terminated, do not perform the resizing
operation.
Reviewed by: alc
Tested by: pho, bf
MFC after: 1 week
the number of interior nodes, we have previously created a level zero
interior node at the root of every non-empty trie, even when that node is
not strictly necessary, i.e., it has only one child. This change is the
second (and final) step in eliminating those unnecessary level zero interior
nodes. Specifically, it updates the deletion and insertion functions so
that they do not require a level zero interior node at the root of the trie.
For a "buildworld" workload, this change results in a 16.8% reduction in the
number of interior nodes allocated and a similar reduction in the average
execution time for lookup functions. For example, the average execution
time for a call to vm_radix_lookup_ge() is reduced by 22.9%.
Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division
the number of interior nodes, we always create a level zero interior node at
the root of every non-empty trie, even when that node is not strictly
necessary, i.e., it has only one child. This change is the first step in
eliminating those unnecessary level zero interior nodes. Specifically, it
updates all of the lookup functions so that they do not require a level zero
interior node at the root.
Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division