freebsd-nq

Author	SHA1	Message	Date
Alan Cox	37244a84fd	Replace an unnecessary call to vm_page_activate() by an assertion that the page is already wired or queued. Prior to the elimination of PG_CACHED pages, vm_page_grab() might have returned a valid, previously PG_CACHED page, in which case enqueueing the page was necessary. Now, that can't happen. Moreover, activating the page is a dubious choice, since the page is not being accessed. Reviewed by: kib MFC after: 1 week	2017-10-08 16:54:42 +00:00
Alan Cox	41e5a22698	When an I/O error occurs on page out, there is no need to dirty the page, because it is already dirty. Instead, assert that the page is dirty. Reviewed by: kib, markj MFC after: 1 week	2017-10-01 17:04:26 +00:00
Alan Cox	cf060942db	Optimize vm_object_page_remove() by eliminating pointless calls to pmap_remove_all(). If the object to which a page belongs has no references, then that page cannot possibly be mapped. Reviewed by: kib MFC after: 1 week	2017-09-28 17:55:41 +00:00
John Baldwin	14c510c0cf	Add UMA_ALIGNOF(). This is a wrapper around _Alignof() that sets the alignment for a zone to the alignment required by a given type. This allows the compiler to determine the proper alignment rather than having the programmer try to guess. Discussed on: arch@ MFC after: 1 week Sponsored by: DARPA / AFRL	2017-09-27 23:15:33 +00:00
Alan Cox	43cc906f40	Change vm_page_try_to_free() to require a managed page. Essentially, vm_page_try_to_free() is testing conditions, like clean versus dirty, that only vary in managed pages. Suggested by: kib Reviewed by: markj X-MFC after: never	2017-09-24 23:35:01 +00:00
Alan Cox	494c6e43d3	Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all() can be avoided when the page's containing object has a reference count of zero. (If the object has a reference count of zero, then none of its pages can possibly be mapped.) Address nearby style issues in vm_page_try_to_free(), and change its return type to "bool". Reviewed by: kib, markj MFC after: 1 week	2017-09-24 16:50:10 +00:00
Konstantin Belousov	5bf949377e	For unlinked files, do not msync(2) or sync on the vnode deactivation. One consequence of the patch is that msyncing unlinked file mappings no longer reduces the amount of the dirty memory in the system, but I do not think that there are users of msync(2) that utilize it for such side-effect. Reported and tested by: tjil PR: 222356 Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12411	2017-09-19 16:46:37 +00:00
Konstantin Belousov	bba52ecadd	Batch freeing of the pages in vm_object_page_remove() under the same free queue mutex lock owning session, same as it was done for the object termination in r323561. Reported and tested by: mjg Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-15 16:07:09 +00:00
Mark Johnston	e04223bf94	Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits. MFC after: 1 week	2017-09-15 14:59:35 +00:00
Mark Johnston	2d54d4bb9f	Widen uk_pgoff, the slab header offset field. 16 bits is only wide enough for kegs with an item size of up to 64KB. At that size or larger, slab headers are typically offpage because the item size is a multiple of the page size, but there is no requirement that this be the case. We can widen the field without affecting the layout of struct uma_keg since the removal of uk_slabsize in r315077 left an adjacent hole. PR: 218911 MFC after: 2 weeks	2017-09-13 21:54:37 +00:00
Konstantin Belousov	e82e50e681	Remove inline specifier from vm_page_free_wakeup(), do not micro-manage compiler. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:30:09 +00:00
Konstantin Belousov	2fcd1ff68f	Do not relock the free queue mutex for each page; free the whole terminating object's page queue under a single mutex lock. First, all pages on the queue are prepared for free by calls to vm_page_free_prep(), and pages which should not be returned to the physical allocator (e.g. wired or fictitious) are simply removed from the queue. On the second pass, vm_page_free_phys_pglist() inserts all pages from the queue without relocking the mutex. The change improves object termination, e.g. on process exit, where large anonymous memory objects otherwise cause the free queue mutex to be relocked for each page. Moreover, if several such processes are exiting or execing in parallel, the mutex was highly contended during address space demolition. Diagnosed and tested by: mjg (previous version) Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:22:07 +00:00
Konstantin Belousov	540ac3b310	Split vm_page_free_toq() into two parts, preparation vm_page_free_prep() and insertion into the phys allocator free queues vm_page_free_phys(). Also provide a wrapper vm_page_free_phys_pglist() for batched free. Reviewed by: alc, markj Tested by: mjg (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:11:52 +00:00
Konstantin Belousov	b9e8fb647e	Use existing tag name for the vm_object's memq. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:03:59 +00:00
Mark Johnston	2934eb8a22	Fix a logic error in the item size calculation for internal UMA zones. Kegs for internal zones always keep the slab header in the slab itself. Therefore, when determining the allocation size, we need to take the slab header size into account. Reported and tested by: ae, rakuco Reviewed by: avg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12342	2017-09-13 15:44:54 +00:00
Mateusz Guzik	1c0b34417b	Move vmmeter atomic counters into dedicated cache lines. Prior to the change they were subject to extreme false sharing. In particular this change shaves about 3 seconds of real time off -j 80 buildkernel. Reviewed by: alc, markj Differential Revision: https://reviews.freebsd.org/D12281	2017-09-10 19:00:38 +00:00
Alan Cox	d027ed2e7a	To analyze the allocation of swap blocks by blist functions, add a method for analyzing the radix tree structures and reporting on the number, and sizes, of maximal intervals of free blocks. The report includes the number of maximal intervals, and also the number of them in each of several size ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size boundaries defined by Fibonacci numbers. The report is written in the test tool with the 's' command, or in a running kernel by sysctl. The analysis of the radix tree frequently computes the position of the lone bit set in a u_daddr_t, a computation that also appears in leaf allocation. That computation has been moved into a function of its own, and optimized for cases where an inlined machine instruction can replace the usual binary search. Submitted by: Doug Moore <dougm@rice.edu> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11906	2017-09-10 17:46:03 +00:00
Konstantin Belousov	93c5d3a46a	Add a vm_page_change_lock() helper, the common code to not relock page lock if both old and new pages use the same underlying lock. Convert existing places to use the helper instead of inlining it. Use the optimization in vm_object_page_remove(). Suggested and reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-09 17:35:19 +00:00
Mark Johnston	f93f7cf199	Speed up vm_page_array initialization. We currently initialize the vm_page array in three passes: one to zero the array, one to initialize the "order" field of each page (necessary when inserting them into the vm_phys buddy allocator one-by-one), and one to initialize the remaining non-zero fields and individually insert each page into the allocator. Merge the three passes into one following a suggestion from alc: initialize vm_page fields in a single pass, and use vm_phys_free_contig() to efficiently insert physical memory segments into the buddy allocator. This reduces the initialization time to a third or a quarter of what it was before on most systems that I tested. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D12248	2017-09-07 21:43:39 +00:00
Mateusz Guzik	fe933c1d88	Start annotating global _padalign locks with __exclusive_cache_line. While these locks are guaranteed to not share their respective cache lines, their current placement leaves unnecessary holes in lines which preceded them. For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour cachelines (previously separated by the lock) to be collapsed into 1. The annotation is only effective on architectures which have it implemented in their linker script (currently only amd64). Thus locks are not converted to their not-padaligned variants as to not affect the rest. MFC after: 1 week	2017-09-06 20:28:18 +00:00
Konstantin Belousov	85d88d8799	Do not leak empty swblk. In swp_pager_meta_build(), if the requested operation results in freeing the last swap pointer in the swblk, free the trie node. Other swap pager code does not expect to find completely empty swblk. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:18:53 +00:00
Konstantin Belousov	eed99cb81b	In swp_pager_meta_build(), handle a race with other thread allocating swapblk for our index while we dropped the object lock. Noted by: jeff Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:16:11 +00:00
Konstantin Belousov	35872e79b7	Adjust interface of swapon_check_swzone() to its actual usage. The function return value is not used. Its argument is always swap_total/PAGE_SIZE, so make it not take any arguments. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 10:17:00 +00:00
Konstantin Belousov	f08b30995a	Make the swap_pager_full variable static. r290920 removed the use of the variable from vm/vm_pageout.c. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 09:44:05 +00:00
Mark Johnston	aed9aaaa76	Synchronize page laundering with pmap_extract_and_hold(). Before r207410, the hold count of a page in a page queue was protected by the queue lock, and, before laundering a page, the page daemon removed managed writeable mappings of the page before releasing the queue lock. This ensured that other threads could not concurrently create transient writeable mappings using pmap_extract_and_hold() on a user map, as is done for example by vmapbuf(). With that revision, however, a race can allow the creation of such a mapping, meaning that the page might be modified as it is being laundered, potentially resulting in it being marked clean when its contents do not match those given to the pager. Close the race by using the page lock to synchronize the hold count check in vm_pageout_cluster() with the removal of writeable managed mappings. Reported by: alc Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12084	2017-08-28 22:10:15 +00:00
Alan Cox	ee620ea47d	Update a couple vm_object lock assertions in the swap pager to reflect the new use of the vm_object's lock to synchronize updates to a radix trie mapping per-vm object page indices to on-disk swap blocks. Fix a typo in a nearby comment. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 17:02:25 +00:00
Alan Cox	d5efa0a475	Switching from a global hash table to per-vm_object radix tries for mapping vm_object page indices to on-disk swap space (r322913) has changed the synchronization requirements for a couple swap pager functions. Whereas before a read lock on the vm object sufficed because of the global mutex on the hash table, a write lock on the vm object may now be required. In particular, calls to vm_pager_page_unswapped() now require a write lock on the vm_object. Consequently, vm_fault()'s fast path cannot call vm_pager_page_unswapped(). The swap space will have to be released at a later point. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 16:55:43 +00:00
Konstantin Belousov	f425ab8e50	Replace global swhash in swap pager with per-object trie to track swap blocks assigned to the object pages. - The global swhash_mtx is removed, trie is synchronized by the corresponding object lock. - The swp_pager_meta_free_all() function used during object termination is optimized by only looking at the trie instead of having to search whole hash for the swap blocks owned by the object. - On swap_pager_swapoff(), instead of iterating over the swhash, the global object list has to be inspected. There, we have to ensure that we do see valid trie content if we see that the object type is swap. Sizing of the swblk zone is same as for swblock zone, each swblk maps SWAP_META_PAGES pages. Proposed by: alc Reviewed by: alc, markj (previous version) Tested by: alc, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D11435	2017-08-25 23:13:21 +00:00
Ruslan Bukin	7bbdb843b6	Add OBJ_PG_DTOR flag to VM object. Setting this flag allows us to skip pages removal from VM object queue during object termination and to leave that for cdev_pg_dtor function. Move pages removal code to separate function vm_object_terminate_pages() as comments do not survive indentation. This will be required for Intel SGX support where we will have to remove pages from VM object manually. Reviewed by: kib, alc Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11688	2017-08-16 08:49:11 +00:00
Mark Johnston	33fff5d536	Add vm_page_alloc_after(). This is a variant of vm_page_alloc() which accepts an additional parameter: the page in the object with largest index that is smaller than the requested index. vm_page_alloc() finds this page using a lookup in the object's radix tree, but in some cases its identity is already known, allowing the lookup to be elided. Modify kmem_back() and vm_page_grab_pages() to use vm_page_alloc_after(). vm_page_alloc() is converted into a trivial wrapper of vm_page_alloc_after(). Suggested by: alc Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11984	2017-08-15 16:39:49 +00:00
Mark Johnston	9df950b35d	Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT. This will allow its use in sendfile_swapin(). Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11942	2017-08-11 16:29:22 +00:00
Mark Johnston	7e05ffa6e6	Micro-optimize kmem_unback(). We can remove some unnecessary object radix tree lookups by using the object memq to iterate over pages in the specified range. This does not, however, eliminate the lookup needed in vm_page_free_toq() to remove each tree entry. Reviewed by: alc, kib (previous revision) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11945	2017-08-11 03:09:11 +00:00
Mark Johnston	2c642ec1e7	Make vm_page_sunbusy() assert that the page is unlocked. Reviewed by: kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11946	2017-08-10 22:43:38 +00:00
Alan Cox	5471caf6f1	Introduce vm_page_grab_pages(), which is intended to replace loops calling vm_page_grab() on consecutive page indices. Besides simplifying the code in the caller, vm_page_grab_pages() allows for batching optimizations. For example, the current implementation replaces calls to vm_page_lookup() on consecutive page indices by cheaper calls to vm_page_next(). Reviewed by: kib, markj Tested by: pho (an earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11926	2017-08-09 04:23:04 +00:00
Konstantin Belousov	555b7bb4c8	Mark pages after EOF as clean after pageout. Suppose that a file on NFS has a partially filled last page, and this page is dirty. The NFS VOP_PAGEOUT() method only marks the page clean up to the block of the last written byte, leaving other blocks dirty. Also, any page which erroneously exists in the vnode vm_object past EOF is left marked as dirty. With the introduction of the buf-cache coherent pager, each pass of the syncer over the object with such a page results in the creation of a B_DELWRI buffer due to the VOP_WRITE() call. This buffer is noted on the next syncer pass, which results in, e.g., a visible manifestation of shutdown never finishing the vnode sync. Note that before the buf-cache coherency commit, a dirty page might be left never synced to the server if partial writes occurred. Fix this by clearing dirty bits after EOF. Only blocks of the partial page which are completely after EOF are marked clean, to avoid possible user data loss. Reported by: mav Reviewed by: alc, markj Tested by: mav, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11697	2017-07-26 20:07:05 +00:00
Alan Cox	90ea34bf97	Address a compilation warning on some architectures that was introduced by the previous change, r321386. Reported by: ian MFC after: 10 days X-MFC after: r321386	2017-07-23 19:35:14 +00:00
Alan Cox	8b5e1472d2	Utilize pmap_enter(..., psind=1) in vm_fault_soft_fast() on amd64. (The Differential Revision discusses the benefits of this change.) Add a function, vm_reserv_to_superpage(), that returns the superpage containing the specified base page. Reviewed by: kib, markj Tested by: pho MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 16:28:13 +00:00
Alan Cox	782e896088	Add support for pmap_enter(..., psind=1) to the amd64 pmap. In other words, add support for explicitly requesting that pmap_enter() create a 2MB page mapping. (Essentially, this feature allows the machine-independent layer to create superpage mappings preemptively, and not wait for automatic promotion to occur.) Export pmap_ps_enabled() to the machine-independent layer. Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or reclaim a PV entry when one is not available. Refactor pmap_enter_pde() into two functions, one by the same name, that is a general-purpose function for creating PDE PG_PS mappings, and another, pmap_enter_2mpage(), that is used to prefault 2MB read- and/or execute-only mappings for execve(2), mmap(2), and shmat(2). Submitted by: Yufeng Zhou <yz70@rice.edu> (an earlier version) Reviewed by: kib, markj Tested by: pho MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 06:33:58 +00:00
Alan Cox	1d3b9818e7	In vm_page_ps_test(), always check that the base pages within the specified superpage all belong to the same object. To date, that check has not been needed, but upcoming changes require it. (See the Differential Revision.) Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 05:54:56 +00:00
Konstantin Belousov	0ecee546c5	Do not allocate struct kinfo_vmobject on stack. Its size is 1184 bytes. Noted by: eugen Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-22 13:33:06 +00:00
Ruslan Bukin	c2c2be5795	Fix style: change spaces to tabs. Sponsored by: DARPA, AFRL	2017-07-21 14:14:47 +00:00
Konstantin Belousov	cd1241fbd0	Add pctrie_init() and vm_radix_init() to initialize generic pctrie and vm_radix trie. Existing vm_radix_init() function is renamed to vm_radix_zinit(). Inlines moved out of the _ headers. Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11661	2017-07-19 20:52:47 +00:00
Konstantin Belousov	eb5ea8788f	Disable stack growth when accessed by AIO daemons. The commit message for r321173 incorrectly stated that the change disables automatic stack growth from AIO daemon contexts, with the explanation that this prevents applying wrong resource limits. Fix this by actually disabling the growth. Noted by: alc Reviewed by: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-19 19:00:32 +00:00
Konstantin Belousov	9680bb9877	Remove unused function swap_pager_isswapped(). Noted by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-19 17:28:46 +00:00
Konstantin Belousov	f758aadd07	Convert the assertion that only the vmspace owner grows the stack into a check blocking growth on accesses from other processes. A debugger may access the stack grow area with ptrace(2). In this case, the real state of the process is to not have the stack grown, which provides more accurate inspection. The technical reason to avoid the growth is to avoid applying the wrong process's (the debugger's) stack limit. This change also has the consequence of turning aio worker accesses past the bottom of stacks into EFAULT; arguably the situation is a programmer's mistake. Reported by: jhb Discussed with: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-07-18 20:26:41 +00:00
Alan Cox	8830260128	Generalize vm_page_ps_is_valid() to support testing other predicates on the (super)page, renaming the function to vm_page_ps_test(). Reviewed by: kib, markj MFC after: 1 week	2017-07-14 02:15:48 +00:00
Konstantin Belousov	7683ad70d3	Fix loop termination in vm_map_find_min(). Reported by: antoine Tested by: Stefan Ehmann <shoesoft@gmx.net>, Jan Kokemueller <jan.kokemueller@gmail.com> PR: 220493 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-07-09 15:41:49 +00:00
Alan Cox	201f03b8e7	Modify vm_map_growstack() to protect itself from the possibility of the gap entry in the vm map being smaller than the sysctl-derived stack guard size. Otherwise, the value of max_grow can suffer from overflow, and the roundup(grow_amount, sgrowsiz) will not be properly capped, resulting in an assertion failure. In collaboration with: kib MFC after: 3 days	2017-07-01 23:39:49 +00:00
Alan Cox	8056df6e25	Clear the MAP_WIREFUTURE flag on the vm map in exec_new_vmspace() when it recycles the current vm space. Otherwise, an mlockall(MCL_FUTURE) could still be in effect on the process after an execve(2), which violates the specification for mlockall(2). It's pointless for vm_map_stack() to check the MEMLOCK limit. It will never be asked to wire the stack. Moreover, it doesn't even implement wiring of the stack. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11421	2017-06-30 15:49:36 +00:00
Konstantin Belousov	6a97a3f756	Treat the addr argument for mmap(2) request without MAP_FIXED flag as a hint. Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted as the absolute minimal address of the range where the mapping is created. The VA allocator only allocates in the range [addr, VM_MAXUSER_ADDRESS]. This is too restrictive: the mmap(2) call might unduly fail if there are no free addresses above addr but a lot of usable space below it. Lift this implementation limitation by allocating VA in two passes. First, try to allocate above addr, as before. If that fails, do the second pass with less restrictive constraints for the start of allocation by specifying minimal allocation address at the max bss end, if this limit is less than addr. One important case where this change makes a difference is the allocation of the stacks for new threads in libthr. Under some configuration conditions, libthr tries to hint the kernel to reuse the main thread stack grow area for the new stacks. This cannot work by design now after grow area is converted to stack, and there is no unallocated VA above the main stack. Interpreting requested stack base address as the hint provides compatibility with old libthr and with (mis-)configured current libthr. Reviewed by: alc Tested by: dim (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-28 04:02:36 +00:00


3695 Commits