freebsd-skq

Author	SHA1	Message	Date
dim	ecbf461bca	MFC r259893: In sys/vm/vm_pageout.c, since vm_pageout_worker() takes a void * as argument, cast the incoming 0 argument to void *, to silence a warning from clang 3.4 ("expression which evaluates to zero treated as a null pointer constant of type 'void *' [-Wnon-literal-null-conversion]").	2013-12-28 02:07:29 +00:00
kib	758e3a9934	MFC r258039: Avoid overflow for the page counts. MFC r258365: Revert back to use int for the page counts. Rearrange the checks to correctly handle overflowing address arithmetic.	2013-12-17 09:21:56 +00:00
kib	37e02e5c7a	MFC r258367: Verify for zero-length requests and act as if it is always successful without performing any action on the address space.	2013-12-13 06:28:18 +00:00
kib	acce26c3d9	MFC r258366: Add assertions to cover all places in the wiring and unwiring code where MAP_ENTRY_IN_TRANSITION is set or cleared.	2013-12-13 06:25:08 +00:00
kib	f79abc87df	MFC r257899: If filesystem declares that it supports shared locking for writes, use shared vnode lock for VOP_PUTPAGES() as well.	2013-12-13 06:12:21 +00:00
rodrigc	84898ef06b	MFC r258737 In keg_dtor(), print out the keg name in the "Freed UMA keg was not empty" message printed to the console. This makes it easier to track down the source of certain memory leaks. Suggested by: adrian Approved by: re (gjb)	2013-12-04 07:46:53 +00:00
kib	faa83dc918	MFC r257680: Do not coalesce if the swap object belongs to tmpfs vnode. Approved by: re (glebius)	2013-11-12 08:01:58 +00:00
alc	83e71fe4f7	Tidy up the output of "sysctl vm.phys_free". Approved by: re (glebius) Sponsored by: EMC / Isilon Storage Division	2013-10-10 16:11:45 +00:00
alc	6e2676ddc1	Both the vm_map and vmspace zones are defined as "no free". So, there is no point in defining a fini function for these zones. Reviewed by: kib Approved by: re (glebius) Sponsored by: EMC / Isilon Storage Division	2013-09-22 17:48:10 +00:00
neel	44c4dbefdb	Merge the following changes from projects/bhyve_npt_pmap: - add fields to 'struct pmap' that are required to manage nested page tables. - add a parameter to 'vmspace_alloc()' that can be used to override the default pmap initialization routine 'pmap_pinit()'. These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0. Reviewed by: kib@, alc@ Approved by: re (glebius)	2013-09-20 17:06:49 +00:00
alc	88a4d0f31a	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division	2013-09-20 04:30:18 +00:00
jhb	d3ef75b6c7	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
kib	8ca067efb2	PG_SLAB no longer serves a useful purpose, since m->object is no longer abused to store pointer to slab. Remove it. Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (hrs)	2013-09-17 07:35:26 +00:00
kib	6796656333	Remove zero-copy sockets code. It only worked for anonymous memory, and the equivalent functionality is now provided by sendfile(2) over posix shared memory filedescriptor. Remove the cow member of struct vm_page, and rearrange the remaining members. While there, make hold_count unsigned. Requested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (delphij)	2013-09-16 06:25:54 +00:00
kib	889b9d0e0b	If the last page of the file is partially full and whole valid portion is invalidated, invalidate the whole page. Otherwise, partially valid page appears on a page queue, which is wrong. This could only happen for the last page, because only then buffer which triggered invalidation could not cover the whole page. Reported and tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij) MFC after: 2 weeks	2013-09-14 10:11:38 +00:00
jhb	3c31e1fb75	Fix an off-by-one error when populating mincore(2) entries for skipped entries. lastvecindex references the last valid byte, so the new bytes should come after it. Approved by: re (kib) MFC after: 1 week	2013-09-12 20:46:32 +00:00
jhb	04bb6e10cd	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
kib	56cc686058	Drain for the xbusy state for two places which potentially do pmap_remove_all(). Not doing the drain allows the pmap_enter() to proceed in parallel, making the pmap_remove_all() effects void. The race results in an invalidated page mapped wired by usermode. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (glebius)	2013-09-08 17:51:22 +00:00
kib	7ab18d4990	The vm_page_trysbusy() should not fail when shared busy counter or VPB_BIT_WAITERS flag were changed between reading of busy_lock and the cas. The vm_page_sbusy(), which is the only user of vm_page_trysbusy() in the tree, panics on the failure, which in these cases is transient and do not mean that the current page state prevents sbusying. Retry the operation inside vm_page_trysbusy() if cas failed, only return a failure when VPB_BIT_SHARED is cleared. Reported and tested by: pho Reviewed by: attilio Sponsored by: The FreeBSD Foundation	2013-09-05 12:54:40 +00:00
pjd	029a6f5d92	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belong to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
mckusick	57ee6d3c5d	Fix bug introduced in rewrite of keg_free_slab in -r251894. The consequence of the bug is that fini calls are not done when a slab is freed by a call-back from the page daemon. It went unnoticed for two months because fini is little used. I spotted the bug while reading the code to learn how it works so I could write it up for the next edition of the Design and Implementation of FreeBSD book. No MFC needed as this code exists only in HEAD. Reviewed by: kib, jeff Tested by: pho	2013-08-31 15:40:15 +00:00
alc	aa9a7bb9e6	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division	2013-08-29 15:49:05 +00:00
glebius	088bcbe3ed	Remove comment that is no longer relevant since r254182.	2013-08-26 14:14:25 +00:00
alc	1a535523cd	Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() can reclaim the last preexisting cached page in the object, resulting in a call to vdrop(). Detect this scenario so that the vnode's hold count is correctly maintained. Otherwise, we panic. Reported by: scottl Tested by: pho Discussed with: attilio, jeff, kib	2013-08-23 17:27:12 +00:00
kib	05a9dff802	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-22 18:12:24 +00:00
kib	ba12eedccd	Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-22 07:39:53 +00:00
jeff	bef38f5afd	- Eliminate the vm object lock from the active queue scan. It is not necessary since we do not free or cache the page from active anymore. Document the one possible race that is harmless. Sponsored by: EMC / Isilon Storage Division Discussed with: alc	2013-08-21 22:39:19 +00:00
alc	42d76a02b5	Addendum to r254141: Allow recursion on the free pages queues lock in vm_page_alloc_freelist(). Reported and tested by: sbruno Sponsored by: EMC / Isilon Storage Division	2013-08-21 15:31:43 +00:00
jeff	0b78e7c4d9	- Increase the active lru refresh interval to 10 minutes. This has been shown to negatively impact some workloads and the goal is only to eliminate worst case behaviors for very long periods of paging inactivity. Eventually we should determine a more complex scaling factor for this feature. - Rate limit low memory callback handlers to limit thrashing. Set the default to 10 seconds. Sponsored by: EMC / Isilon Storage Division	2013-08-19 23:54:24 +00:00
jeff	ed90d4ba3f	- Use an arbitrary but reasonably large import size for kva on architectures that don't support superpages. This keeps the number of spans and internal fragmentation lower. - When the user asks for alignment from vmem_xalloc adjust the imported size by 2*align to be certain we can satisfy the allocation. This comes at the expense of potential failures when the backend can't supply enough memory but could supply the requested size and alignment. Sponsored by: EMC / Isilon Storage Division	2013-08-19 23:02:39 +00:00
kib	3c951b7b9d	Remove the arbitrary binding of the pagedaemon threads to the domains, update the comment accordingly and make it more precise. Requested and reviewed by: jeff (previous version)	2013-08-17 07:10:01 +00:00
jhb	3bfcb89de4	Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings. - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD. - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead. - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE. - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent. Reviewed by: alc MFC after: 1 month	2013-08-16 21:13:55 +00:00
jeff	478dc3171b	- Fix bug in r254304. Use the ACTIVE pq count for the active list processing, not inactive. This was the result of a bad merge. Reported by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-15 22:29:49 +00:00
attilio	ae49aeaba6	On the recovery path for vm_page_alloc(), if a page had been requested wired, unwind back the wiring bits otherwise we can end up freeing a page that is considered wired. Sponsored by: EMC / Isilon storage division Reported by: alc	2013-08-15 11:01:25 +00:00
jeff	d330a11545	- Add a statically allocated memguard arena since it is needed very early on. - Pass the appropriate flags to vmem_xalloc() when allocating space for the arena from kmem_arena. Sponsored by: EMC / Isilon Storage Division	2013-08-13 22:40:43 +00:00
jeff	bc00d6df57	Improve pageout flow control to wakeup more frequently and do less work while maintaining better LRU of active pages. - Change v_free_target to include the quantity previously represented by v_cache_min so we don't need to add them together everywhere we use them. - Add a pageout_wakeup_thresh that sets the free page count trigger for waking the page daemon. Set this 10% above v_free_min so we wakeup before any phase transitions in vm users. - Adjust down v_free_target now that we're willing to accept more pagedaemon wakeups. This means we process fewer pages in one iteration as well, leading to shorter lock hold times and less overall disruption. - Eliminate vm_pageout_page_stats(). This was a minor variation on the PQ_ACTIVE segment of the normal pageout daemon. Instead we now process 1 / vm_pageout_update_period pages every second. This causes us to visit the whole active list every 60 seconds. Previously we would only maintain the active LRU when we were short on pages which would mean it could be woefully out of date. Reviewed by: alc (slight variant of this) Discussed with: alc, kib, jhb Sponsored by: EMC / Isilon Storage Division	2013-08-13 21:56:16 +00:00
attilio	f2a180739c	Correct the recovery logic in vm_page_alloc_contig: what is really needed on this code snippet is that all the pages that are already fully inserted get fully freed, while for the others the object removal itself might be skipped, hence the object might be set to NULL. Sponsored by: EMC / Isilon storage division Reported by: alc, kib Reviewed by: alc	2013-08-11 21:15:04 +00:00
kib	4675fcfce0	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
zont	340906f426	Remove unused definition for CTL_VM_NAMES. Suggested by: bde	2013-08-09 23:47:43 +00:00
jhb	8f3909e991	Revert the addition of VPO_BUSY and instead update vm_page_replace() to properly unbusy the page. Submitted by: alc	2013-08-09 21:14:55 +00:00
obrien	8b37b80e65	Add missing 'VPO_BUSY' from r254141 to fix kernel build break.	2013-08-09 16:43:50 +00:00
attilio	e9f37cac74	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
attilio	16c7563cf4	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concepts into a real, minimal, sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
kib	8de1718b60	Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
jeff	de4ecca213	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
markj	44ee260831	Fill in the description fields for M_FICT_PAGES. Reviewed by: kib MFC after: 3 days	2013-08-07 00:20:30 +00:00
attilio	899ab64514	Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
attilio	19b2ea9f81	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
zont	f6b004c36a	Unbreak sysctl ABI changes introduced in r253662 Requested by: bde	2013-07-29 18:48:51 +00:00
jeff	8076cabebb	Improve page LRU quality and simplify the logic. - Don't short-circuit aging tests for unmapped objects. This biases against unmapped file pages and transient mappings. - Always honor PGA_REFERENCED. We can now use this after soft busying to lazily restart the LRU. - Don't transition directly from active to cached bypassing the inactive queue. This frees recently used data much too early. - Rename actcount to act_delta to be more consistent with use and meaning. Reviewed by: kib, alc Sponsored by: EMC / Isilon Storage Division	2013-07-26 23:22:05 +00:00
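
The bit layout described in commit 029a6f5d92 (top two bits for the element count, next five bits as a one-hot array index) can be sketched roughly as follows. This is an illustration only, not the FreeBSD header: every SK_/sk_ name is hypothetical, and the shift of 57 assumes index 0 maps to the lowest of the five index bits, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 63-62 hold the element count, bits 61-57 the index. */
#define SK_IDX_SHIFT 57        /* assumption: index 0 is the lowest index bit */
#define SK_IDX_MASK  0x1fULL   /* five one-hot index bits */

/* Build a right from an array index and a bit, in the spirit of CAPRIGHT(). */
#define SK_RIGHT(idx, bit) ((1ULL << (SK_IDX_SHIFT + (idx))) | (bit))

/* Values taken from the commit message; only the names are changed. */
#define SK_LOOKUP SK_RIGHT(0, 0x0000000000000400ULL)
#define SK_FCHMOD SK_RIGHT(0, 0x0000000000002000ULL)
#define SK_PDKILL SK_RIGHT(1, 0x0000000000000800ULL)

/* Extract the five-bit one-hot index field from a right value. */
static inline unsigned sk_idx_bits(uint64_t right)
{
    return (unsigned)((right >> SK_IDX_SHIFT) & SK_IDX_MASK);
}

/* True when exactly one index bit is set, i.e. all rights share one element. */
static inline bool sk_single_element(uint64_t right)
{
    unsigned bits = sk_idx_bits(right);

    return bits != 0 && (bits & (bits - 1)) == 0;
}

/* Array index encoded in the right, or -1 if rights from several elements were mixed. */
static inline int sk_index(uint64_t right)
{
    unsigned bits = sk_idx_bits(right);
    int i;

    if (!sk_single_element(right))
        return -1;
    for (i = 0; (bits & 1u) == 0; i++)
        bits >>= 1;
    return i;
}

/* Checked variant: aborts on mixed elements, like the assertion the commit describes. */
static inline int sk_index_checked(uint64_t right)
{
    assert(sk_single_element(right));
    return sk_index(right);
}
```

With this encoding, an alias such as SK_FCHMOD | SK_LOOKUP still carries a single index bit, while SK_LOOKUP | SK_PDKILL carries two and is rejected. That is the property the commit message relies on to detect rights from different array elements being passed together.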


3365 Commits