* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general-purpose primitive to
sleep while using VM_OBJECT_LOCK() as protection
* The approach must cope with vm_pager.h namespace pollution, so many
files need to include rwlock.h directly
machine to another. Therefore, VM_MAX_KERNEL_ADDRESS can't be a constant.
Instead, #define it to be a variable, vm_max_kernel_address, just like we
do on sparc64.
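
As a rough illustration of the "#define it to be a variable" approach,
here is a minimal userland sketch. The vm_offset_t typedef is a stand-in
for the kernel type, and the sketch_*() helpers are hypothetical names
used only for this example; they are not part of the change.

```c
#include <stdint.h>

/* Stand-in for the kernel's vm_offset_t (assumption for this sketch). */
typedef uintptr_t vm_offset_t;

/* The limit is a variable, filled in once the machine is known. */
vm_offset_t vm_max_kernel_address;

/* Existing users of the macro keep compiling unchanged. */
#define VM_MAX_KERNEL_ADDRESS	vm_max_kernel_address

/* Hypothetical boot-time hook: compute the limit for this machine. */
static void
sketch_set_kernel_limit(vm_offset_t kva_start, vm_offset_t kva_size)
{
	vm_max_kernel_address = kva_start + kva_size;
}

/* Hypothetical consumer: the macro now expands to a memory read. */
static int
sketch_is_kernel_va(vm_offset_t va)
{
	return (va < VM_MAX_KERNEL_ADDRESS);
}
```

The cost of this choice is that the macro can no longer be used where a
compile-time constant is required (array sizes, #if tests, static
initializers).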
Reviewed by: kib
Tested by: ian
the actual number of vm_page_t that will be derived, so v_page_count
should be used appropriately.
In addition, panic if UMA fails to restrict the area properly, so
that it can hold all the desired objects.
Sponsored by: EMC / Isilon storage division
Reported by: alc
- Use predict_false() to tag boot-time cache decisions
- Compact boot-time cache allocation into a separate, non-inline
function that is not called most of the time.
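
The two items above can be sketched roughly as follows. This is a
userland approximation under stated assumptions: SK_PREDICT_FALSE() is a
local stand-in built on the GCC/Clang __builtin_expect() builtin, and
sk_booting, sk_boot_cache, sk_boot_alloc() and sk_node_alloc() are
hypothetical names, not the ones used in the tree.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__)
#define SK_PREDICT_FALSE(e)	__builtin_expect((e) != 0, 0)
#define SK_NOINLINE		__attribute__((noinline))
#else
#define SK_PREDICT_FALSE(e)	(e)
#define SK_NOINLINE
#endif

#define SK_BOOT_NODES	4
#define SK_NODE_SIZE	64

static int sk_booting = 1;		/* cleared once the allocator is up */
static char sk_boot_cache[SK_BOOT_NODES][SK_NODE_SIZE];
static unsigned sk_boot_used;

/* Cold path: kept out of line so it does not bloat the hot path. */
static SK_NOINLINE void *
sk_boot_alloc(void)
{
	if (sk_boot_used >= SK_BOOT_NODES)
		return (NULL);
	return (sk_boot_cache[sk_boot_used++]);
}

/* Hot path: the boot-time branch is tagged as unlikely. */
static inline void *
sk_node_alloc(void)
{
	if (SK_PREDICT_FALSE(sk_booting))
		return (sk_boot_alloc());
	return (malloc(SK_NODE_SIZE));
}
```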
Sponsored by: EMC / Isilon storage division
assertions and other code in this file.
- Reinsert some comments that were lost during the work but which are
still relevant, reducing differences with HEAD.
Sponsored by: EMC / Isilon storage division
use a different scheme for preallocation: reserve a few KB of nodes to
be used to cater for page allocations before the memory can be
efficiently pre-allocated by UMA.
This effectively removes the further carving of boot_pages and, along
with it, the modifications to the boot_pages allocation system and the
need to initialize the UMA zone before pmap_init().
Reported by: pho, jhb
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up with many intermediate
nodes for very few leaves.
The new algorithm introduces 2 main concepts: the node level and the
node owner. Every node represents a branch point where the leaves share
the key up to the level specified in the node level (current level
excluded, of course). That partly shared key is the one stored in the
owner. Of course, the root branch is exempt from keeping a valid
owner, because theoretically all the keys are contained in the space
designated by the root branch node. The search algorithm is very
intuitive and that is where one should start reading to understand the
full approach.
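
To make the level/owner idea concrete, here is a minimal lookup sketch.
It is not the code from this change: the sk_* names, the explicit
is_leaf tag and the convention that level 0 is the most significant
digit are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_WIDTH	4			/* key bits consumed per level */
#define SK_COUNT	(1u << SK_WIDTH)	/* 16 children per node */
#define SK_MASK		((uint64_t)SK_COUNT - 1)
#define SK_LIMIT	(64 / SK_WIDTH - 1)	/* deepest level; 0 is the root */

struct sk_leaf {
	uint64_t key;
};

struct sk_node;

struct sk_slot {
	int is_leaf;				/* explicit tag (assumption) */
	union {
		const struct sk_node *node;
		const struct sk_leaf *leaf;
	} u;
};

struct sk_node {
	uint64_t owner;		/* key prefix shared by every leaf below */
	unsigned level;		/* digit position this node branches on */
	struct sk_slot slot[SK_COUNT];
};

/* Digit of the key used to pick a child at the given level. */
static unsigned
sk_digit(uint64_t key, unsigned level)
{
	return ((key >> ((SK_LIMIT - level) * SK_WIDTH)) & SK_MASK);
}

/* Key with the current level and everything below it cleared. */
static uint64_t
sk_prefix(uint64_t key, unsigned level)
{
	unsigned shift = (SK_LIMIT - level + 1) * SK_WIDTH;

	/* At level 0 nothing is shared, so a root owner of 0 always matches. */
	return (shift >= 64 ? 0 : (key >> shift) << shift);
}

static const struct sk_leaf *
sk_lookup(const struct sk_node *root, uint64_t key)
{
	const struct sk_node *n = root;

	while (n != NULL) {
		/* Skipped levels are checked in one shot against the owner. */
		if (sk_prefix(key, n->level) != n->owner)
			return (NULL);
		const struct sk_slot *s = &n->slot[sk_digit(key, n->level)];
		if (s->is_leaf)
			return (s->u.leaf != NULL && s->u.leaf->key == key ?
			    s->u.leaf : NULL);
		n = s->u.node;	/* NULL for an empty slot ends the walk */
	}
	return (NULL);
}
```

In this sketch, two keys that differ only in their last digit need a
single node at level SK_LIMIT, where an uncompressed trie would need one
node for every level above it as well.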
In the end, the algorithm demands at most one node per insert, and not
every insert needs one. To stay safe, we basically preallocate as many
nodes as there are physical pages in the system, using uma_prealloc().
However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
at the point where vm_radix_init() is currently called, which is long
before UMA is fully initialized. This means that uma_prealloc() will
dig into the UMA_BOOT_PAGES pool of pages, which is often not enough to
keep track of such large allocations.
In order to fix this, slightly change the concept of UMA_BOOT_PAGES and
vm.boot_pages. More specifically, make UMA_BOOT_PAGES, as well as
vm.boot_pages, an initial "value" and extend the boot_pages physical
area by as many bytes as needed, using the information returned by
vm_radix_allocphys_size().
* A small number of pages will be held in per-cpu buckets and won't be
accessible from curcpu, so vm_radix_node_get() could really panic
when the pre-allocation pool is close to being exhausted.
In theory we could pre-allocate more pages than the number of physical
frames to satisfy such requests, but as many inserts happen without
a node allocation anyway, I think it is safe to assume that the
over-allocation already compensates for this problem.
Field testing may prove me wrong, of course. This could be
further helped by allowing a single-page insert to not
require a complete root node.
The use of pre-allocation gets rid of all the non-direct mapping
trickery and of the lock recursion allowance introduced for
vm_page_free_queue.
The number of children per node is reduced from 32 to 16 and from 16
to 8 (for 64-bit and 32-bit architectures, respectively).
This lets the children fit into a cache line in the amd64 case,
for example, and in general span fewer cache lines, which may be
helpful in the lookup_ge() case.
Also, path-compression helps in cases where there are many levels,
making the fallout of this change less painful.
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff (partially)
Tested by: flo
- Avoid the return value for vm_radix_insert()
- Name the function arguments per style(9)
- Avoid taking and returning opaque objects; use vm_page_t instead, as
vm_radix is not meant to be general code but to cater specifically to
the page cache and resident cache.