Commit Graph

57 Commits

Author SHA1 Message Date
attilio
a2e67affe3 Expand ambiguous comments some more.
Requested by:	alc
2013-03-17 15:27:26 +00:00
attilio
07b5846fc9 Fix compilation.
Sponsored by:	EMC / Isilon storage division
2013-03-13 01:38:32 +00:00
attilio
02cf10e6db Add a further safety belt to prevent inconsistencies.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-03-13 01:00:34 +00:00
attilio
ba43ac477b For uniformity, use the user-provided index.
Sponsored by:	EMC / Isilon storage division
Reviewed and reported by:	alc
2013-03-13 00:41:37 +00:00
attilio
82aa86d64f Improve comments.
Sponsored by:	EMC / Isilon storage division
Submitted by:	mdf
2013-03-07 23:37:10 +00:00
alc
a8671df14b Fix a typo.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 07:25:11 +00:00
alc
c3be5353b8 Make a pass over most of the comments. 2013-03-04 07:11:10 +00:00
alc
475367da61 Simplify Boolean expressions.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 06:26:25 +00:00
alc
5094368613 Fix spelling.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 06:13:26 +00:00
attilio
60e39c95b8 Remove the boot-time cache support and rely on the UMA boot-time
slab cache for allocating the nodes before it is possible to carve
directly from the UMA subsystem.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-03-04 00:07:23 +00:00
attilio
343c9f6f19 Missing semicolon.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
Pointy hat to:	me
2013-02-24 19:10:16 +00:00
attilio
1a753217f3 Simplify return logic.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-02-24 19:05:11 +00:00
attilio
12289fcebc Retire the old UMA primitive uma_zone_set_obj() and replace it with the
more modern uma_zone_reserve_kva(). The difference is that it no longer
relies on an obj to allocate pages, and the slab allocator no longer
uses any specific locking but atomic operations to complete
the operation.
Where possible, uma_small_alloc() is used instead and the uk_kva
member becomes unused.

The subsequent cleanups also bring along the removal of the
VM_OBJECT_LOCK_INIT() macro, which is no longer used as the code
can easily be cleaned up to perform a single mtx_init(), private
to vm_object.c.
For the same reason, _vm_object_allocate() becomes private as well.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-02-24 16:41:36 +00:00
attilio
f6d331e804 Fix an inverted check that was wrongly reporting indexes as
wrapped.

Sponsored by:	EMC / Isilon storage division
Reported by:	alc
2013-02-24 16:08:37 +00:00
attilio
1cf2f4550b On arches with VM_PHYSSEG_DENSE the vm_page_array is larger than
the actual number of vm_page_t entries that will be derived, so
v_page_count should be used appropriately.

Besides that, add a panic condition in case UMA fails to properly
restrict the area in a way that keeps all the desired objects.

Sponsored by:	EMC / Isilon storage division
Reported by:	alc
2013-02-15 16:05:18 +00:00
attilio
757b950804 Remove unused headers. 2013-02-15 15:34:19 +00:00
attilio
daa1f2caab Fix comment. 2013-02-15 14:54:09 +00:00
attilio
fa12391493 Move the radix node zone destructor definition closer to the
vm_radix_init() definition.

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:53:42 +00:00
attilio
be627ca24c - When panicking for the "too small boot cache" reason, print the actual
cache size value
- Add a way to specify the size of the boot cache at compile time

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:50:36 +00:00
attilio
47ecbcf556 Improve dynamic branch prediction and i-cache utilization:
- Use predict_false() to tag boot-time cache decisions
- Compact boot-time cache allocation into a separate, non-inline
  function that won't be called most of the time.

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:48:06 +00:00
attilio
908e129569 Fix style. 2013-02-14 15:24:13 +00:00
attilio
eafe26c8a6 The radix preallocation pages can overflow the biggestone segment, so
use a different scheme for preallocation: reserve a few KB of nodes to be
used to service page allocations before the memory can be efficiently
pre-allocated by UMA.

This in effect removes further carving of boot_pages, and along with
it the modifications to the boot_pages allocation system and the
necessity to initialize the UMA zone before pmap_init().

Reported by:	pho, jhb
2013-02-14 15:23:00 +00:00
attilio
3db337c2ea Grammar.
Sponsored by:	EMC / Isilon storage division
2013-02-13 02:04:49 +00:00
attilio
53f78d1a7d Implement a new algorithm for managing the radix trie which also
includes path compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up having a lot of
intermediate nodes for very few leaves.

The new algorithm introduces 2 main concepts: the node level and the
node owner.  Every node represents a branch point where the leaves share
the key up to the level specified in the node level (current level
excluded, of course).  The partly shared key is the one contained in
the owner.  Of course, the root branch is exempt from keeping a valid
owner, because theoretically all the keys are contained in the space
covered by the root branch node.  The search algorithm is very
intuitive and is where one should start reading to understand the
full approach.

In the end, the algorithm demands at most one node per insert,
and not even that in all cases.  To stay safe, we basically
preallocate as many nodes as the number of physical pages in the
system, using uma_prealloc().  However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
  when vm_radix_init() is currently called, which is much before UMA
  is fully initialized.  This means that uma_prealloc() will dig into the
  UMA_BOOT_PAGES pool of pages, which is often not enough to keep track
  of such large allocations.
  In order to fix this, change the concept of UMA_BOOT_PAGES and
  vm.boot_pages a bit. More specifically, make UMA_BOOT_PAGES an initial
  "value", as is vm.boot_pages, and extend the boot_pages physical area
  by as many bytes as needed with the information returned by
  vm_radix_allocphys_size().
* A small amount of pages will be held in per-cpu buckets and won't be
  accessible from curcpu, so vm_radix_node_get() could really panic
  when the pre-allocation pool is close to being exhausted.
  In theory we could pre-allocate more pages than the number of physical
  frames to satisfy such requests, but as many inserts would happen
  without a node allocation anyway, I think it is safe to assume that the
  over-allocation already compensates for this problem.
  Field testing can prove me wrong, of course.  This could be
  further helped by the case where we allow a single-page insert to not
  require a complete root node.

The use of pre-allocation gets rid of all the non-direct mapping trickery
and the introduced lock recursion allowance for vm_page_free_queue.

The node children are reduced in number from 32 to 16 and from 16 to 8
(for 64-bit and 32-bit architectures, respectively).
This makes the children fit into a cache line in the amd64 case,
for example, and in general span fewer cache lines, which may be
helpful in the lookup_ge() case.
Also, path compression comes to help in cases where there are many levels,
making the fallout of such a change less painful.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff (partially)
Tested by:	flo
2013-02-13 01:19:31 +00:00
attilio
4c22b4bafe Cleanup the vm_radix KPI:
- Avoid the return value for vm_radix_insert()
- Name the function arguments per style(9)
- Avoid getting and returning opaque objects; use vm_page_t instead, as
  vm_radix is not meant to be truly general code but to serve specifically
  the page cache and resident cache.
2013-02-06 18:37:46 +00:00
attilio
f458bac614 Remove vm_radix_lookupn() and its usage in the kernel. 2013-01-10 12:30:58 +00:00
attilio
ffa3f082ff - Split the cached and resident pages tree into 2 distinct ones.
  This makes the RED/BLACK support go away and simplifies a lot of the
  vm_radix functions used here. This works because with PATRICIA trie
  support the trie will be small enough that keeping 2 different trees
  will be efficient too.
- Reduce differences with head, in places like the backing scan where the
  optimizations used shuffled the code around a little.

Tested by:	flo, Andrea Barberio
2012-07-08 14:01:25 +00:00
attilio
807db03f96 Revert r231027 and fix the prototype for vm_radix_remove().
The goal is to get to the point where the recovery path is
completely removed, as we can count on pre-allocation once the
path-compressed trie is implemented.
2012-06-08 18:44:54 +00:00
attilio
7b7f4887b9 Revert r236367.
The goal is to get to the point where the recovery path is
completely removed, as we can count on pre-allocation once the
path-compressed trie is implemented.
2012-06-08 18:08:31 +00:00
attilio
ab9d63eba7 Simplify the insert path by using the same logic as vm_radix_remove() for
the recovery path. The bulk of vm_radix_remove() is put into a generic
function, vm_radix_sweep(), which allows 2 different modes (hard and soft):
the soft one deals with half-constructed paths by cleaning them up.

Ideally all these complications should go once a way to pre-allocate
is implemented, possibly by implementing path compression.

Requested and discussed with:	jeff
Tested by:	pho
2012-05-31 22:54:08 +00:00
attilio
c72fe43a63 Add braces. 2012-05-12 19:54:57 +00:00
attilio
e5220032ec On 32-bit architectures KTR has a bug: it cannot correctly handle
64-bit numbers. ktr_tracepoint() in fact casts all the passed values to
u_long, as that is what the ktr entries can handle.

However, we have to work a lot with vm_pindex_t, which is always 64 bits,
also on 32-bit architectures (the most notable case being i386).

Use macros to split the 64-bit printing into 32-bit chunks which
KTR can correctly handle.

Reported and tested by:	flo
2012-05-12 19:52:59 +00:00
attilio
3bd53aaf3c - Fix a bug where lookupn can wrap around while looking for the pages to
scan, again returning an incorrect very low address.
- Stub out vm_lookup_foreach as it is not used
2012-05-12 19:22:57 +00:00
attilio
f9319cf885 Fix the nodes allocator on architectures without direct mapping:
- Fix bugs in the free path where the pages were not unwired and the
  relevant locking wasn't acquired.
- Introduce rnode_map, a submap of kernel_map, to allocate from.
  The reason is that, on architectures without direct mapping,
  kmem_alloc*() will try to insert the newly created mapping while
  holding the vm_object lock, introducing a LOR or lock recursion.
  rnode_map is however a leaf submap, thus there cannot be any
  deadlock.
  Notes: Size the submap to be, by default, around 64 MB and
  decrease the size of the nodes as the allocation will be much smaller
  (and when the compacting code in vm_radix is implemented this
  will aim for much less space to be used).  However, note that the
  size of the submap can be changed at boot time via the
  hw.rnode_map_scale scaling factor.
- Use uma_zone_set_max() covering the size of the submap.

Tested by:	flo
2012-03-16 15:41:07 +00:00
attilio
9e63566650 Fix a compile-time bug by adding a check just after the struct
definition.
2012-03-06 23:37:53 +00:00
attilio
df89a6a2db - Exclude vm_radix_shrink() from the interface but retain the code,
  as it can still be useful.
- Make most of the interface private, as it is unnecessarily public right
  now.  This will help in letting nodes change with the arch while still
  avoiding namespace pollution.
2012-03-01 00:54:08 +00:00
flo
1e497814c3 fix KTR consistency
I'm committing this on behalf of Attilio as he cannot access svn right now.
2012-02-05 18:55:20 +00:00
attilio
6587a6afdd Remove the panic from vm_radix_insert() and propagate the error to the
callers of vm_page_insert().

The default action for every caller is to unwind the operation,
except for vm_page_rename(), where this has proven impossible to do.
For that case, it just spins until the page is available to be
allocated. However, as vm_page_rename() is mostly rare (and has
never hit this panic in the past) it is thought to be a very
seldom event and not a possible performance factor.

The patch has been tested with an atomic counter returning NULL from
the zone allocator every 1/100000 allocations. Via printf, I've verified
that a typical buildkernel could trigger this 30 times. The patch
survived 2 hours of repeated buildkernel/world.

Several technical notes:
- vm_page_insert() is moved, in several callers, closer to failure
  points.  This could be committed separately before vmcontention hits
  the tree, just to verify -CURRENT is happy with it.
- vm_page_rename() does not need the page lock held in the callers,
  as it hides that as an implementation detail. Do the locking internally.
- vm_page_insert() now returns an int, with 0 meaning everything was OK,
  thus the KPI is broken by this patch.
2012-02-05 17:37:26 +00:00
attilio
1b454e6b83 Fix a bug in vm_radix_leaf() where the shifting start address can
wrap around at some point.
This bug is triggered very easily by indirect blocks in UFS, which grow
negative, resulting in very high counts.

In collaboration with:	flo
2012-01-29 16:44:21 +00:00
attilio
8bc5caadc8 Fix the format string for the pindex members as they should be treated
as uintmax_t for compatibility between 32- and 64-bit architectures.
2012-01-29 16:29:06 +00:00
attilio
1f27e97ae5 Use atomics for rn_count on leaf nodes because RED operations happen
without the VM_OBJECT_LOCK held, and thus can be concurrent with BLACK ones.
However, also use a write memory barrier in order to not reorder the
decrement of rn_count with respect to fetching the pointer.

Discussed with:	jeff
2011-12-06 22:57:48 +00:00
attilio
2436e63a9c - Make rn_count 32 bits as it will naturally pad for 32-bit arches
- Avoid using atomics to manipulate it at level 0 because that seems
  unneeded and introduces a bug on big-endian architectures where only
  the top half (2 bytes) of the double-word is written (as sparc64,
  for example, doesn't support 16-bit atomics), leading to wrong
  handling of rn_count.

Reported by:	flo, andreast
Found by:	marius
No answer by:	jeff
2011-12-06 19:04:45 +00:00
andreast
8c385e6008 Fix compilation issue on 32-bit targets.
Reviewed by:	attilio
2011-12-05 16:06:12 +00:00
attilio
8fbad61ab4 Revert a change that sneaked in during the last MFC. 2011-12-02 23:21:59 +00:00
attilio
b2701fb716 MFC 2011-12-02 21:45:46 +00:00
attilio
984f9ddc2b - Remove unnecessary checks on rnode in KTR prints
- Track rn_count in KTR prints
- Improve KTR in a way that best fits rn_count tracking
2011-11-29 02:07:07 +00:00
attilio
27cea0290f Fix compile.
Submitted by:	flo
2011-11-28 19:14:38 +00:00
attilio
77442309d8 Improve the diagnostic in the remove case. 2011-11-28 17:26:19 +00:00
attilio
4391f6c279 Fix a bug where the 'rnode' pointer can be NULL and we try to track
the children.  This helps in the debugging case.

Reported by:	flo
2011-11-26 14:26:37 +00:00
attilio
062b8ecde4 Add more KTR points for failure in vm_radix_insert(). 2011-11-20 14:51:27 +00:00