31 Commits

Author SHA1 Message Date
alc
1a535523cd Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() can
reclaim the last preexisting cached page in the object, resulting in a call
to vdrop().  Detect this scenario so that the vnode's hold count is
correctly maintained.  Otherwise, we panic.

Reported by:	scottl
Tested by:	pho
Discussed with:	attilio, jeff, kib
2013-08-23 17:27:12 +00:00
attilio
e9f37cac74 On all architectures, avoid preallocating the physical memory
for the nodes used in vm_radix.
On architectures supporting direct mapping, also avoid preallocating
the KVA for such nodes.

To do so, allow vm_radix_insert() and the operations derived from it
to fail, and handle all of the resulting failures.

In vm_radix, introduce a new function called vm_radix_replace(),
which replaces an already present leaf node with a new one, and
take into account the possibility that operations on the radix trie
recurse during the allocation performed by vm_radix_insert().
If such a recursion happens, vm_radix_insert() starts again from
scratch.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (older version)
Reviewed by:	jeff
Tested by:	pho, scottl
2013-08-09 11:28:55 +00:00
attilio
919afa77e4 Commit new file FreeBSD tags.
Sponsored by:	EMC / Isilon storage division
2013-03-17 23:53:06 +00:00
alc
b346e448af Simplify the interface to vm_radix_insert() by eliminating the parameter
"index".  The content of a radix tree leaf, or at least its "key", is not
opaque to the other radix tree operations.  Specifically, they know how to
extract the "key" from a leaf.  So, eliminating the parameter "index" isn't
breaking the abstraction.  Moreover, eliminating the parameter "index"
effectively prevents the caller from passing an inconsistent "index" and
leaf to vm_radix_insert().
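
A minimal sketch of the interface argument above, under loud assumptions:
the names demo_page, demo_tree, demo_key(), demo_insert() and demo_lookup()
are hypothetical, and a flat array stands in for the radix trie.  It only
illustrates that when the container derives the key from the leaf, a
caller cannot pass a mismatched index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 16	/* hypothetical tiny key space for the sketch */

struct demo_page {
	uint64_t pindex;	/* the "key" stored inside the leaf */
};

struct demo_tree {
	struct demo_page *slot[DEMO_SLOTS];	/* stand-in for the trie */
};

/* The container knows how to extract the key from a leaf. */
static uint64_t
demo_key(const struct demo_page *p)
{
	return (p->pindex);
}

/* No separate "index" argument: the key always comes from the leaf. */
static void
demo_insert(struct demo_tree *t, struct demo_page *p)
{
	uint64_t key = demo_key(p);

	assert(key < DEMO_SLOTS);
	assert(t->slot[key] == NULL);
	t->slot[key] = p;
}

static struct demo_page *
demo_lookup(const struct demo_tree *t, uint64_t key)
{
	return (key < DEMO_SLOTS ? t->slot[key] : NULL);
}
```

With an explicit index parameter, a call such as insert(tree, 3, page)
with page->pindex == 7 would compile and silently corrupt the structure;
with the narrower signature that call cannot be written.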

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-17 16:06:03 +00:00
attilio
60e39c95b8 Remove the boot-time cache support and rely on the UMA boot-time
slab cache to allocate the nodes until it becomes possible to carve them
directly from the UMA subsystem.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-03-04 00:07:23 +00:00
attilio
eafe26c8a6 The radix preallocation pages can overflow the biggest segment, so
use a different scheme for preallocation: reserve a few KB of nodes to
cater for page allocations until the memory can be efficiently
preallocated by UMA.

This effectively removes the further carving of boot_pages and, along
with it, the modifications to the boot_pages allocation system and the
need to initialize the UMA zone before pmap_init().

Reported by:	pho, jhb
2013-02-14 15:23:00 +00:00
attilio
53f78d1a7d Implement a new algorithm for managing the radix trie which also
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up with a lot of
intermediate nodes for very few leaves.

The new algorithm introduces 2 main concepts: the node level and the
node owner.  Every node represents a branch point where the leaves share
the key up to the level specified in the node level (current level
excluded, of course).  The shared part of the key is the one stored in
the owner.  The root branch is exempt from keeping a valid owner,
because in theory all the keys are contained in the space
covered by the root branch node.  The search algorithm is the most
intuitive part and is where one should start reading to understand the
full approach.
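
The level/owner idea described above can be sketched as a lookup routine.
This is an illustration under stated assumptions, not the vm_radix code:
the names pc_node, slot, leaf, key_outside() and pc_lookup() are
hypothetical, each level covers 4 key bits, level 0 is the lowest digit,
and children carry an explicit is_leaf flag for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH	4			/* key bits consumed per level */
#define COUNT	(1 << WIDTH)		/* children per node */
#define MASK	(COUNT - 1)

struct leaf {
	uint64_t key;
};

struct slot {
	int is_leaf;			/* 1: ptr is a leaf, 0: ptr is a node */
	void *ptr;
};

struct pc_node {
	uint64_t owner;			/* key prefix shared by all leaves below */
	unsigned level;			/* which 4-bit digit this node branches on */
	struct slot child[COUNT];
};

/* Nonzero if key does not share the owner's bits above the node's level. */
static int
key_outside(const struct pc_node *n, uint64_t key)
{
	unsigned shift = (n->level + 1) * WIDTH;

	if (shift >= 64)		/* top level covers the whole key space */
		return (0);
	return ((key >> shift) != (n->owner >> shift));
}

static struct leaf *
pc_lookup(const struct pc_node *n, uint64_t key)
{
	while (n != NULL) {
		assert(n->level < 64 / WIDTH);
		if (key_outside(n, key))
			return (NULL);	/* skipped levels do not match */
		const struct slot *s =
		    &n->child[(key >> (n->level * WIDTH)) & MASK];
		if (s->ptr == NULL)
			return (NULL);
		if (s->is_leaf) {
			struct leaf *l = s->ptr;
			return (l->key == key ? l : NULL);
		}
		n = s->ptr;		/* may jump down several levels at once */
	}
	return (NULL);
}
```

Because a child node may sit several levels below its parent, the owner
comparison is what rejects keys that diverge in one of the skipped levels;
the final key comparison does the same for a leaf hanging directly off a
high-level node.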

In the end, the algorithm demands at most one node per insert, and
not every insert needs one.  To stay safe, we preallocate as many
nodes as there are physical pages in the system, using
uma_prealloc().  However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be preallocated
  when vm_radix_init() is currently called, which is long before UMA
  is fully initialized.  This means that uma_prealloc() will dig into the
  UMA_BOOT_PAGES pool of pages, which is often not large enough to satisfy
  such large allocations.
  In order to fix this, slightly change the concept of UMA_BOOT_PAGES and
  vm.boot_pages.  More specifically, make UMA_BOOT_PAGES, like
  vm.boot_pages, an initial "value" and extend the boot_pages physical
  area by as many bytes as needed, using the information returned by
  vm_radix_allocphys_size().
* A small number of pages will be held in per-cpu buckets and won't be
  accessible from curcpu, so vm_radix_node_get() could really panic
  when the preallocation pool is close to being exhausted.
  In theory we could preallocate more pages than the number of physical
  frames to satisfy such requests, but as many inserts happen without
  a node allocation anyway, I think it is safe to assume that the
  over-allocation already compensates for this problem.
  Field testing may prove me wrong, of course.  This could be
  further helped by allowing a single-page insert to not
  require a complete root node.

The use of preallocation gets rid of all the non-direct mapping trickery
and of the lock recursion allowance introduced for vm_page_free_queue.

The number of children per node is reduced from 32 to 16 on 64-bit
architectures and from 16 to 8 on 32-bit architectures.
This makes the children fit into a cache line in the amd64 case,
for example, and in general span fewer cache lines, which may be
helpful in the lookup_ge() case.
Also, path-compression helps in cases where there are many levels,
making the fallout of this change less painful.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff (partially)
Tested by:	flo
2013-02-13 01:19:31 +00:00
attilio
4c22b4bafe Cleanup vm_radix KPI:
- Drop the return value of vm_radix_insert()
- Name the function arguments per style(9)
- Avoid taking and returning opaque objects; use vm_page_t instead, as
  vm_radix is not really meant to be general code but to cater
  specifically to the page cache and the resident cache.
2013-02-06 18:37:46 +00:00
attilio
c02c27a33d Avoid namespace pollution in vm_object.h by defining the
structure for the vm_radix implementation separately.
2013-02-06 18:04:28 +00:00
attilio
f458bac614 Remove vm_radix_lookupn() and its usage in the kernel. 2013-01-10 12:30:58 +00:00
attilio
26c71bd174 Style. 2012-07-12 11:02:57 +00:00
attilio
34064c9bef Remove unused iterating functions. 2012-07-12 11:02:04 +00:00
attilio
a23ed68137 - Move VM_RADIX_STACK into vm_object.c because it is the only consumer
- Move the check for the return value of vm_radix_lookup() directly
  into the while statement, removing the need for a spurious check.
2012-07-08 23:50:57 +00:00
attilio
ffa3f082ff - Split the cached and resident pages tree into 2 distinct ones.
  This makes the RED/BLACK support go away and greatly simplifies the
  vmradix functions used here.  This works because, with patricia trie
  support, the trie will be small enough that keeping 2 different ones
  will be efficient too.
- Reduce differences with head, in places like the backing scan where
  the optimizations used shuffled the code around a little.

Tested by:	flo, Andrea Barberio
2012-07-08 14:01:25 +00:00
attilio
e761e0c4bc MFC 2012-06-01 14:57:55 +00:00
attilio
3bd53aaf3c - Fix a bug where lookupn can wrap around while looking for the
  pages to scan, returning an incorrect, very low address again.
- Stub out vm_lookup_foreach as it is not used
2012-05-12 19:22:57 +00:00
attilio
df89a6a2db - Exclude vm_radix_shrink() from the interface but retain the code,
  as it can still be useful.
- Make most of the interface private as it is unnecessarily public right
  now.  This will help in letting the nodes change with the arch while
  still avoiding namespace pollution.
2012-03-01 00:54:08 +00:00
attilio
1b454e6b83 Fix a bug in vm_radix_leaf() where the shifted start address can
wrap around at some point.
This bug is triggered very easily by indirect blocks in UFS, which grow
negative, resulting in very high counts.

In collaboration with:	flo
2012-01-29 16:44:21 +00:00
attilio
1f27e97ae5 Use atomics for rn_count on leaf nodes because RED operations happen
without the VM_OBJECT_LOCK held, and thus can be concurrent with BLACK ones.
Also use a write memory barrier so that the decrement of rn_count is
not reordered with respect to fetching the pointer.

Discussed with:	jeff
2011-12-06 22:57:48 +00:00
attilio
2436e63a9c - Make rn_count 32-bits as it will naturally pad for 32-bit arches
- Avoid using atomics to manipulate it at level0 because it seems
  unneeded and introduces a bug on big-endian architectures where only
  the top half (2 bytes) of the double-word is written (as sparc64,
  for example, doesn't support atomics at 16-bits), leading to incorrect
  handling of rn_count.

Reported by:	flo, andreast
Found by:	marius
No answer by:	jeff
2011-12-06 19:04:45 +00:00
attilio
b2701fb716 MFC 2011-12-02 21:45:46 +00:00
attilio
2cf76e3d27 Fix compilation for userland:
- Use CTASSERT() only in the kernel.
- the root pointer is required by struct vm_object which is accessible
  (maybe incorrectly?) by userland.
2011-11-15 23:37:15 +00:00
jeff
31902d5a7c - Add some convenience inlines.
- Update the copyright.
2011-11-01 04:21:57 +00:00
attilio
f8c5162413 Add kernel protection to the header file for vmradix.
This file likely needs some more restructuring (and we should
make a lot of macros private to the radix implementation), but leave them
as they are for now because we may enrich the KPI much further.
2011-11-01 03:53:10 +00:00
attilio
7272c08497 vm_object_terminate() doesn't actually free the pages in the splay
tree.
Reclaim all the nodes related to the radix tree for a specified
vm_object when calling vm_object_terminate() via the newly added
interface vm_radix_reclaim_nodes().
The function is recursive, but we have a well-defined maximum depth,
thus the amount of necessary stack can be easily calculated.

Reported by:	alc
Discussed and reviewed by:	jeff
2011-11-01 03:40:38 +00:00
jeff
146842f2d2 - Extract part of vm_radix_lookupn() into a function that just finds the
first leaf page in a specified range.  This permits us to make many
   search & operate functions without much code duplication.
 - Make a generic iterator for radix items.
2011-10-30 22:57:42 +00:00
jeff
9e2e6a2980 - Support two types of nodes, red and black, within the same radix tree.
Black nodes support standard active pages and red nodes support cached
   pages.  Red nodes may be removed without the object lock but will not
   collapse unused tree nodes.  Red nodes may not be directly inserted,
   instead a new function is supplied to convert between black and red.
 - Handle cached pages and active pages in the same loop in vm_object_split,
   vm_object_backing_scan, and vm_object_terminate.
 - Retire the splay page handling as the ifdefs are too difficult to
   maintain.
 - Slightly optimize the vm_radix_lookupn() function.
2011-10-30 11:11:04 +00:00
jeff
1f2c60154d - Use a single uintptr_t for the root of the radix node that encodes the
height and a pointer so that the update to the root is atomic.  This
   permits safe lookups in parallel with tree expansion.  Shrinking the
   space requirements is a small bonus.
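
A minimal sketch of the single-word root described above, assuming C11
atomics and hypothetical names (demo_node, root_pack(), root_publish(),
ROOT_HEIGHT_MASK); the actual encoding used by the commit may differ.
The node is assumed to be 16-byte aligned so that the low 4 bits of its
address are free to hold the height.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ROOT_HEIGHT_MASK ((uintptr_t)0xf)	/* low 4 bits hold the height */

struct demo_node {
	_Alignas(16) void *child[16];	/* alignment frees the low 4 bits */
};

/* Combine the node pointer and the tree height into one word. */
static inline uintptr_t
root_pack(struct demo_node *n, unsigned height)
{
	uintptr_t p = (uintptr_t)n;

	assert((p & ROOT_HEIGHT_MASK) == 0);
	assert(height <= ROOT_HEIGHT_MASK);
	return (p | height);
}

static inline struct demo_node *
root_node(uintptr_t root)
{
	return ((struct demo_node *)(root & ~ROOT_HEIGHT_MASK));
}

static inline unsigned
root_height(uintptr_t root)
{
	return ((unsigned)(root & ROOT_HEIGHT_MASK));
}

/* Writer: pointer and height change together in one atomic store. */
static inline void
root_publish(_Atomic uintptr_t *root, struct demo_node *n, unsigned height)
{
	atomic_store(root, root_pack(n, height));
}
```

A reader takes one atomic_load() snapshot of the word and decodes both
fields from it, so it never observes a new node paired with an old height
while the tree is being expanded.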
2011-10-28 03:42:41 +00:00
attilio
588d89046c Use a UMA zone for the radix node. This avoids the need to check
for the kernel_map/kmem_map recursion because it uses the direct mapping
provided by amd64 to avoid object and map search and recursion.

Probably all the other architectures using UMA_MD_SMALL_ALLOC are also
fixed by this, but others remain, the most notable case being i386.
For it a solution has still to be determined.  A way to do this would
be to have a reserved map just for radix nodes and mark all accesses to
its lock to be witness safe, but that would still be suboptimal due to
the large amount of virtual address space needed to cater for the whole
tree.
2011-10-28 01:56:36 +00:00
jeff
e6b90196c0 - Implement vm_radix_lookup_le().
- Fix vm_radix_lookupn() when max == -1 by making the end parameter
   inclusive.
2011-10-23 01:19:01 +00:00
attilio
c2583b6922 Check in an initial radix tree implementation to replace
the vm object pages splay.

TODO:
- Handle the negative keys differently to get index nodes of smaller
  depth (negative keys come from indirect blocks)
- Fix get_node() by adding support for a low reserve of objects
  directly from UMA
- Implement the lookup_le and re-enable VM_NRESERVELEVEL = 1
- Try to rework the superpage splay of idle pages and the cache splay
  for every vm object in order to regain space in the vm_page structure
- Verify performance and improve it (likely by having consumers deal
  with several ranges of pages manually?)

Obtained from:	jeff, Mayur Shardul (GSoC 2009)
2011-10-22 23:34:37 +00:00