reclaim the last preexisting cached page in the object, resulting in a call
to vdrop(). Detect this scenario so that the vnode's hold count is
correctly maintained. Otherwise, we panic.
Reported by: scottl
Tested by: pho
Discussed with: attilio, jeff, kib
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid pre-allocating
the KVA for such nodes.
In order to do so, allow the operations derived from vm_radix_insert()
to fail and handle all the resulting failures.
On the vm_radix side, introduce a new function, vm_radix_replace(),
which can replace an already present leaf node with a new one,
and take into account the possibility that the operations on the
radix trie can recurse during vm_radix_insert() allocation.
This means that if the operations in vm_radix_insert() recursed,
vm_radix_insert() will start again from scratch.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl
"index". The content of a radix tree leaf, or at least its "key", is not
opaque to the other radix tree operations. Specifically, they know how to
extract the "key" from a leaf. So, eliminating the parameter "index" isn't
breaking the abstraction. Moreover, eliminating the parameter "index"
effectively prevents the caller from passing an inconsistent "index" and
leaf to vm_radix_insert().
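As an illustration of why a separate "index" is redundant, here is a
minimal sketch; struct page, leaf_key() and the prototypes mentioned in
the comment are simplified stand-ins, not the exact kernel definitions:

#include <stdint.h>

/* Minimal stand-in for vm_page_t: the leaf carries its own key. */
struct page {
        uint64_t pindex;        /* the "key" the trie operations extract */
};

/*
 * The trie extracts the key from the leaf itself, so an insert of the
 * form vm_radix_insert(rtree, index, m) can become
 * vm_radix_insert(rtree, m), with the index derived as below.
 */
uint64_t
leaf_key(const struct page *m)
{
        return (m->pindex);
}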
Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division
for allocating the nodes before having the possibility to carve
directly from the UMA subsystem.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
use a different scheme for preallocation: reserve a few KB of nodes to
be used to cater for page allocations before the memory can be
efficiently pre-allocated by UMA.
This effectively removes the further carving of boot_pages and, along
with it, the modifications to the boot_pages allocation system and the
need to initialize the UMA zone before pmap_init().
Reported by: pho, jhb
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up having a lot of
intermediate nodes for very few leaves.
The new algorithm introduces 2 main concepts: the node level and the
node owner. Every node represents a branch point where the leaves share
the key up to the level specified in the node level (current level
excluded, of course). That partly shared key is the one contained in
the owner. Of course, the root branch is exempted from keeping a valid
owner, because theoretically all the keys are contained in the space
designated by the root branch node. The search algorithm is fairly
intuitive and is where one should start reading to understand the
full approach.
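A minimal sketch of the owner/level check at the heart of the search
(WIDTH, the field names and the treatment of the root are assumptions
chosen for illustration, not the kernel definitions):

#include <stdbool.h>
#include <stdint.h>

#define WIDTH   4               /* key bits consumed per level (assumed) */
#define LEVELS  (64 / WIDTH)    /* 16 levels for a 64-bit key */

struct rnode {
        uint64_t rn_owner;      /* key prefix shared by every leaf below */
        int      rn_level;      /* the level this node branches at */
};

/*
 * All leaves below a node share the owner's key bits above rn_level, so
 * if "index" differs in those bits the search can stop at this node.
 * A root-like node covering the whole key space is exempted, matching
 * the "root has no valid owner" rule above.
 */
bool
key_matches(const struct rnode *rn, uint64_t index)
{
        uint64_t mask;

        if (rn->rn_level + 1 >= LEVELS)
                return (true);
        mask = ~(uint64_t)0 << ((rn->rn_level + 1) * WIDTH);
        return ((index & mask) == (rn->rn_owner & mask));
}

/* The child slot that "index" selects at this node's level. */
int
key_slot(const struct rnode *rn, uint64_t index)
{
        return ((index >> (rn->rn_level * WIDTH)) & ((1 << WIDTH) - 1));
}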
In the end, the algorithm ends up demanding at most one node per insert,
and not even that in all cases. To stay safe, we basically
preallocate as many nodes as there are physical pages in the
system, using uma_prealloc(). However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
when vm_radix_init() is currently called, which is well before UMA
is fully initialized. This means that uma_prealloc() will dig into the
UMA_BOOT_PAGES pool of pages, which is often not enough to satisfy
such large allocations.
In order to fix this, change the concept of UMA_BOOT_PAGES and
vm.boot_pages a bit. More specifically, make UMA_BOOT_PAGES an initial
value, just as vm.boot_pages is, and extend the boot_pages physical area
by as many bytes as needed, using the information returned by
vm_radix_allocphys_size().
* A small number of pages will be held in per-cpu buckets and won't be
accessible from curcpu, so vm_radix_node_get() could really panic
when the pre-allocation pool is close to being exhausted.
In theory we could pre-allocate more pages than the number of physical
frames to satisfy such requests, but since many inserts happen without
a node allocation anyway, I think it is safe to assume that the
over-allocation already compensates for this problem.
Field testing may prove me wrong, of course. This could be
further helped by allowing a single-page insert to not
require a complete root node.
The use of pre-allocation gets rid of all the non-direct mapping trickery
and of the introduced lock recursion allowance for vm_page_free_queue.
The number of node children is reduced from 32 to 16 and from 16 to 8
(for 64-bit and 32-bit architectures, respectively).
This makes the children fit into cachelines in the amd64 case,
for example, and in general span fewer cachelines, which may be
helpful in the lookup_ge() case.
Also, path-compression comes to help in cases where there are many
levels, making the fallout of such a change less painful.
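A rough, illustrative layout showing the cacheline arithmetic on amd64
(field names and the exact layout here are assumptions, not the kernel
structure):

#include <stdint.h>
#include <stdio.h>

#define NCHILD  16              /* 64-bit arches after this change (was 32) */

struct rnode {
        void     *rn_child[NCHILD];     /* 16 * 8 = 128 bytes on amd64 */
        uint64_t  rn_owner;             /* shared key prefix */
        uint16_t  rn_count;             /* populated children */
        uint16_t  rn_level;             /* branch level */
};

int
main(void)
{
        /* With 16 children the child array spans exactly two 64-byte
         * cachelines; with 32 children it would span four. */
        printf("child array: %zu bytes, whole node: %zu bytes\n",
            sizeof(((struct rnode *)0)->rn_child), sizeof(struct rnode));
        return (0);
}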
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff (partially)
Tested by: flo
- Avoid the return value for vm_radix_insert()
- Name the function arguments per style(9)
- Avoid getting and returning opaque objects; use vm_page_t instead, as
vm_radix is not really meant to be general code but to cater specifically
to the page cache and resident cache.
This makes the RED/BLACK support go away and simplifies a lot of the
vm_radix functions used here. This happens because with patricia trie
support the trie will be small enough that keeping 2 different tries
will be efficient too.
- Reduce differences with head, in places like the backing scan where
the optimizations used shuffled the code around a little bit.
Tested by: flo, Andrea Barberio
still as it can be useful.
- Make most of the interface private, as it is unnecessarily public right
now. This will help in letting nodes change with the architecture while
still avoiding namespace pollution.
wrap-up at some point.
This bug is triggered very easily by indirect blocks in UFS, which grow
negative, resulting in very high counts.
In collaboration with: flo
without the VM_OBJECT_LOCK held, thus can be concurrent with BLACK ones.
However, also use a write memory barrier in order to not reorder the
operation of decrementing rn_count with respect to fetching the pointer.
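A minimal C11 sketch of the intended ordering (the kernel uses its own
write memory barrier primitive; the field names and the exchange-based
removal are assumptions for illustration):

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct rnode {
        _Atomic(void *)  rn_child[16];
        _Atomic uint16_t rn_count;
};

/*
 * Unlocked removal of a "red" leaf: the child pointer is fetched and
 * cleared first, and only then does the rn_count decrement become
 * visible, thanks to the release fence between the two operations.
 */
void *
red_remove(struct rnode *rn, int slot)
{
        void *old;

        old = atomic_exchange_explicit(&rn->rn_child[slot], NULL,
            memory_order_relaxed);
        atomic_thread_fence(memory_order_release);
        atomic_fetch_sub_explicit(&rn->rn_count, 1, memory_order_relaxed);
        return (old);
}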
Discussed with: jeff
- Avoid using atomics to manipulate it at level 0, because it seems
unneeded and introduces a bug on big-endian architectures where only
the top half (2 bits) of the double-words is written (as sparc64,
for example, doesn't support atomics at 16 bits), leading to wrong
handling of rn_count.
Reported by: flo, andreast
Found by: marius
No answer by: jeff
Likely this file needs some more restructuring (and we should
make a lot of macros private to the radix implementation), but leave them
as they are for now because we may enrich the KPI much further.
tree.
Reclaim all the nodes related to the radix tree for a specified
vm_object when calling vm_object_terminate() via the newly added
interface vm_radix_reclaim_nodes().
The function is recursive, but we have a well-defined maximum depth,
thus the amount of necessary stack can be easily calculated.
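A minimal sketch of such a depth-bounded recursive reclaim (the node
layout, NCHILD and MAXLEVEL are assumptions for illustration; leaves are
assumed not to be owned by the trie):

#include <stdlib.h>

#define NCHILD    16
#define MAXLEVEL  16    /* 64-bit key, 4 key bits per level (assumed) */

struct rnode {
        int   rn_level;                 /* 0 means the children are leaves */
        void *rn_child[NCHILD];
};

/*
 * Free every interior node below and including "rn".  Leaf pages are
 * never followed or freed.  The recursion depth is bounded by MAXLEVEL,
 * so the worst-case stack use is easy to compute.
 */
void
radix_reclaim(struct rnode *rn)
{
        int slot;

        if (rn == NULL)
                return;
        if (rn->rn_level > 0)
                for (slot = 0; slot < NCHILD; slot++)
                        radix_reclaim((struct rnode *)rn->rn_child[slot]);
        free(rn);
}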
Reported by: alc
Discussed and reviewed by: jeff
first leaf page in a specified range. This permits us to make many
search & operate functions without much code duplication.
- Make a generic iterator for radix items (a sketch of such an
iteration follows the list below).
Black nodes support standard active pages and red nodes support cached
pages. Red nodes may be removed without the object lock but will not
collapse unused tree nodes. Red nodes may not be directly inserted;
instead, a new function is supplied to convert between black and red
(one possible encoding is also sketched after the list below).
- Handle cached pages and active pages in the same loop in vm_object_split,
vm_object_backing_scan, and vm_object_terminate.
- Retire the splay page handling as the ifdefs are too difficult to
maintain.
- Slightly optimize the vm_radix_lookupn() function.
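As referenced above, a minimal sketch of a range iteration built on a
lookup_ge-style primitive (the names, the opaque struct radix and the
stubbed lookup are assumptions for illustration, not the kernel code):

#include <stddef.h>
#include <stdint.h>

struct radix;                   /* opaque trie handle */

struct page {
        uint64_t pindex;
};

/* Placeholder for a lookup_ge-style primitive: smallest leaf with
 * pindex >= index, or NULL.  The real version walks the trie. */
static struct page *
radix_lookup_ge(struct radix *rt, uint64_t index)
{
        (void)rt; (void)index;
        return (NULL);
}

/*
 * Generic "search & operate" walk built on lookup_ge: visit every
 * resident page in [start, end) without duplicating the range walk in
 * each consumer (split, backing scan, terminate, ...).
 */
static void
radix_range_foreach(struct radix *rt, uint64_t start, uint64_t end,
    void (*op)(struct page *))
{
        struct page *m;

        while ((m = radix_lookup_ge(rt, start)) != NULL && m->pindex < end) {
                op(m);
                start = m->pindex + 1;
        }
}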
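Also as referenced above, one plausible way to tell red leaves from
black ones is to tag the low bit of an aligned leaf pointer; this is
purely an assumed encoding for illustration, not necessarily what the
actual code does:

#include <stdint.h>

#define RED_BIT   ((uintptr_t)1)        /* assumes leaves are aligned */

/* Mark a leaf as a cached ("red") page. */
void *
leaf_to_red(void *leaf)
{
        return ((void *)((uintptr_t)leaf | RED_BIT));
}

/* Convert back to an active ("black") page pointer. */
void *
leaf_to_black(void *leaf)
{
        return ((void *)((uintptr_t)leaf & ~RED_BIT));
}

int
leaf_is_red(void *leaf)
{
        return (((uintptr_t)leaf & RED_BIT) != 0);
}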
height and a pointer so that the update to the root is atomic. This
permits safe lookups in parallel with tree expansion. Shrinking the
space requirements is a small bonus.
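A minimal sketch of packing the height into the low bits of the root
pointer so that both change in a single atomic store (the names, the
number of height bits and the use of C11 atomics are assumptions for
illustration):

#include <stdatomic.h>
#include <stdint.h>

#define HEIGHT_BITS  4                  /* assumes nodes are 16-byte aligned */
#define HEIGHT_MASK  ((uintptr_t)((1 << HEIGHT_BITS) - 1))

static _Atomic uintptr_t rt_root;       /* node pointer | height */

/* Publish a new root and height together, in one atomic store. */
void
root_store(void *node, unsigned height)
{
        atomic_store_explicit(&rt_root,
            (uintptr_t)node | (height & HEIGHT_MASK),
            memory_order_release);
}

/* Lookups read a consistent (pointer, height) pair in one load. */
void *
root_load(unsigned *heightp)
{
        uintptr_t r;

        r = atomic_load_explicit(&rt_root, memory_order_acquire);
        *heightp = (unsigned)(r & HEIGHT_MASK);
        return ((void *)(r & ~HEIGHT_MASK));
}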
for the kernel_map/kmem_map recursion because it uses direct mapping
provided by amd64 to avoid object and map search and recursion.
Probably all the other architectures using UMA_MD_SMALL_ALLOC are also
fixed by this, but others remain, the most notable case being i386.
For it a solution still has to be determined. One way to do this would
be to have a reserved map just for radix nodes and mark all accesses to
its lock as witness-safe, but that would still be suboptimal due to
the large amount of virtual address space needed to cater for the whole
tree.
the vm object pages splay.
TODO:
- Handle negative keys differently in order to have smaller-depth
index nodes (negative keys coming from indirect blocks)
- Fix get_node() by adding support for a small reserve of objects
directly from UMA
- Implement lookup_le and re-enable VM_NRESERVELEVEL = 1
- Try to rework the superpage splay of idle pages and the cache splay
for every vm object in order to regain space in the vm_page structure
- Verify performance and improve it (likely by having consumers deal
with several ranges of pages manually?)
Obtained from: jeff, Mayur Shardul (GSoC 2009)