3713 Commits

Konstantin Belousov
1c778d91b5 vmtotal: extend memory counters to accommodate current and future
hardware sizes.

32-bit counters already overflow on attainable virtual memory page
counts and would soon overflow on physical page counts as well.  Bump
the sizes to 64-bit types.  Bump __FreeBSD_version.

It is impossible to provide perfect backward ABI compat for this
change.  If a program requests an old structure, it can be detected by
size.  But if it queries the size first by passing a NULL old req
pointer, there is almost nothing we can do to detect the desired ABI.
As a partial solution, check the p_osrel of the querying process when
selecting the size to report.
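
As a hedged sketch of the compatibility logic described above (the
struct vmtotal_old type, the P_OSREL_VMTOTAL64 cut-off and the helper
functions are illustrative assumptions, not names from the commit):

    static int
    sysctl_vm_total(SYSCTL_HANDLER_ARGS)
    {
            struct vmtotal total;           /* new layout, 64-bit counters */
            struct vmtotal_old total_old;   /* hypothetical 32-bit layout */

            fill_vmtotal(&total);           /* hypothetical helper */
            if (req->oldlen == sizeof(total_old) ||
                (req->oldptr == NULL && req->td != NULL &&
                req->td->td_proc->p_osrel < P_OSREL_VMTOTAL64)) {
                    vmtotal_to_old(&total, &total_old); /* truncate counters */
                    return (SYSCTL_OUT(req, &total_old, sizeof(total_old)));
            }
            return (SYSCTL_OUT(req, &total, sizeof(total)));
    }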

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Differential revision:	https://reviews.freebsd.org/D13018
2017-11-15 13:41:03 +00:00
Konstantin Belousov
772c8b6749 Fix operator precedence.
Sponsored by:	The FreeBSD Foundation
2017-11-08 23:25:05 +00:00
Mark Johnston
e0b2fc3a51 Allow various page daemon parameters to be set from loader.conf.
MFC after:	1 week
2017-11-08 19:55:17 +00:00
Jeff Roberson
8d6fbbb867 Replace many instances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.
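
As a rough caller-side illustration (a sketch, not lifted from the
diff), the old retry loop around VM_WAIT collapses into a single
allocation call with a blocking flag:

    /* Old pattern: sleep explicitly and retry. */
    while ((m = vm_page_alloc(obj, pindex, VM_ALLOC_NORMAL)) == NULL)
            VM_WAIT;

    /* New pattern: let the allocator itself sleep for the page. */
    m = vm_page_alloc(obj, pindex, VM_ALLOC_NORMAL | VM_ALLOC_WAITOK);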

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
Mark Johnston
bd0e1beb98 Correct the type of foff.
No functional change intended.

Github PR:	124
Submitted by:	Wuyang Chung <wuyang.m.chung@outlook.com>
MFC after:	1 week
2017-11-08 01:53:03 +00:00
Alan Cox
3a757e5403 Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.

Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-24 17:14:53 +00:00
Edward Tomasz Napierala
be7d4ac586 Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12745
2017-10-22 10:35:29 +00:00
Konstantin Belousov
422fe502b3 Check that a page which is freed as zeroed indeed has all-zero content.
This catches some rare mysterious failures at the source.  The check
is only performed on architectures which implement direct map, and
only enabled with option DIAGNOSTIC, similar to other costly
consistency checks.
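
A minimal sketch of such a check, assuming a direct map and options
DIAGNOSTIC (the function name is hypothetical; the committed check
lives in the page allocation path):

    #ifdef DIAGNOSTIC
    static void
    vm_page_assert_zeroed(vm_page_t m)
    {
            uint64_t *p, *end;

            /* Feasible only where the page is in the direct map. */
            p = (uint64_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));
            end = p + PAGE_SIZE / sizeof(*p);
            for (; p < end; p++)
                    KASSERT(*p == 0,
                        ("page %p freed as zeroed is not zero", m));
    }
    #endif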

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-10-21 17:28:12 +00:00
Mark Johnston
eadbeae5e7 Free the right address range if kmem_back() fails in memguard_alloc().
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-20 21:13:19 +00:00
Konstantin Belousov
b3d4ab6645 Take the vm object lock in read mode in vnode_generic_putpages().
Only upgrade it to write mode if we need to clear dirty bits of the
partially valid page after EOF.
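
The locking pattern amounts to roughly the following fragment (a
sketch, not verbatim from vnode_generic_putpages(); "m" and "base"
stand for the partially valid page and the offset of EOF within it):

    VM_OBJECT_RLOCK(object);
    /* ... scan and write pages under the read lock ... */
    if (need_eof_clear) {           /* hypothetical condition flag */
            if (!VM_OBJECT_TRYUPGRADE(object)) {
                    VM_OBJECT_RUNLOCK(object);
                    VM_OBJECT_WLOCK(object);
                    /* state may have changed; re-check after relocking */
            }
            vm_page_clear_dirty(m, base, PAGE_SIZE - base);
    }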

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-10-20 18:40:29 +00:00
Konstantin Belousov
ac04195ba6 Move swapout code into vm/vm_swapout.c.
There is no NO_SWAPPING #ifdef left in the code.

Requested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12663
2017-10-20 09:10:49 +00:00
Konstantin Belousov
05877a8595 Do not overwrite clean blocks on pageout.
If the filesystem block size is less than the page size, it is
possible that the page-out run contains partially clean pages.  E.g.,
a chunk of the page might have been bdwrite()-ed, or some thread might
have performed bwrite() on a buffer which references a chunk of the
paged-out page.  As a result, the assertion added in r319975, which
checked that all pages in the run are dirty, does not hold on such
filesystems.

One solution is to remove the assert, but that is undesirable because
we would then overwrite valid on-disk content.  I cannot provide a
scenario where such a write would corrupt the file data, but I do not
like it on principle.  The proper solution, in my opinion, is to write
only the parts of the pages that are still marked dirty.  The patch
implements this: it skips clean blocks and writes only the dirty block
runs.

Note that, due to clustering, writing one page might clean other
pages in the run, so the next write range must be calculated only
after the current range has been written out.

Moreover, due to possible invalidation, and because the object lock
is dropped and reacquired before the checks, the whole page-out run
may appear to consist of only clean pages.  For this reason, it is
impossible to assert that there is some work for the pageout method
to do (i.e. that there is at least one dirty page in the run).  But
such clearing can only occur due to invalidation, and not due to a
parallel write, because we own the vnode lock exclusively.
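
For illustration, the dirty block runs within a page can be found by
scanning the page's dirty bitmask, one bit per DEV_BSIZE block (a
sketch with a made-up helper name, not the committed code):

    static void
    next_dirty_run(vm_page_t m, int off, int *startp, int *endp)
    {
            int i, nbits;

            nbits = PAGE_SIZE / DEV_BSIZE;
            for (i = off / DEV_BSIZE; i < nbits; i++)
                    if ((m->dirty & (1 << i)) != 0)
                            break;
            *startp = i * DEV_BSIZE;
            for (; i < nbits; i++)
                    if ((m->dirty & (1 << i)) == 0)
                            break;
            *endp = i * DEV_BSIZE;
    }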

Reported by:	fsu
In collaboration with:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12668
2017-10-20 08:32:37 +00:00
Konstantin Belousov
4313989360 In vm_page_free_phys_pglist(), do not take vm_page_queue_free_mtx if
there is nothing to do.

Suggested by:	mjg
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-20 08:25:49 +00:00
Alan Cox
4074d642d2 Batch atomic updates to the number of active, inactive, and laundry
pages by vm_object_terminate_pages().  For example, for a "buildworld"
workload, this batching reduces vm_object_terminate_pages()'s average
execution time by 12%.  (The total savings were about 11.7 billion
processor cycles.)
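
The batching idea, as a simplified sketch (counter names are
illustrative; this is not the committed code):

    vm_page_t p, p_next;
    int act = 0, inact = 0, laundry = 0;

    TAILQ_FOREACH_SAFE(p, &object->memq, listq, p_next) {
            switch (p->queue) {
            case PQ_ACTIVE:
                    act++;
                    break;
            case PQ_INACTIVE:
                    inact++;
                    break;
            case PQ_LAUNDRY:
                    laundry++;
                    break;
            }
            /* the page itself is freed here in the real function */
    }
    if (act != 0)
            atomic_subtract_int(&vm_cnt.v_active_count, act);
    if (inact != 0)
            atomic_subtract_int(&vm_cnt.v_inactive_count, inact);
    if (laundry != 0)
            atomic_subtract_int(&vm_cnt.v_laundry_count, laundry);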

Reviewed by:	kib
MFC after:	1 week
2017-10-19 04:13:47 +00:00
Konstantin Belousov
1fffcd755d Do not report a reduction of the swap zone when none occurred.
After r324600 we see the actual reservation.

Reported by:	jkim
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-18 07:27:43 +00:00
Mateusz Guzik
1dbf52e7d9 Reduce traffic on vm_cnt.v_free_count
The variable is modified under the highly contended page free queue
lock.  It unnecessarily shares a cache line with purely read-only
fields and is re-read after the lock is dropped in the page allocation
code, making the hold time longer.

Pad the variable just like the others and store the value observed
while the lock is held instead of re-reading it.

Provides a modest 1%-ish speed up in concurrent page faults.
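
A sketch of the "store the value as found" part (the threshold name is
hypothetical): remember the counter value observed while the free
queue mutex is held instead of re-reading it after the unlock.

    u_int free_count;

    mtx_lock(&vm_page_queue_free_mtx);
    vm_cnt.v_free_count--;
    free_count = vm_cnt.v_free_count;       /* snapshot under the lock */
    mtx_unlock(&vm_page_queue_free_mtx);
    if (free_count < free_wakeup_thresh)    /* hypothetical threshold */
            pagedaemon_wakeup();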

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D12665
2017-10-13 21:54:34 +00:00
Konstantin Belousov
53faf5a7d4 Evaluate the real size of the sblk_zone.
Submitted by:	ota@j.email.ne.jp
PR:	221356
Reviewed by:	alc, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12660
2017-10-13 16:23:05 +00:00
Ed Maste
6e309d75d2 ANSIfy vm_kern.c
PR:		222673
Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
2017-10-13 13:53:19 +00:00
Alan Cox
37244a84fd Replace an unnecessary call to vm_page_activate() by an assertion that
the page is already wired or queued.  Prior to the elimination of PG_CACHED
pages, vm_page_grab() might have returned a valid, previously PG_CACHED
page, in which case enqueueing the page was necessary.  Now, that can't
happen.  Moreover, activating the page is a dubious choice, since the page
is not being accessed.

Reviewed by:	kib
MFC after:	1 week
2017-10-08 16:54:42 +00:00
Alan Cox
41e5a22698 When an I/O error occurs on page out, there is no need to dirty the page,
because it is already dirty.  Instead, assert that the page is dirty.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-01 17:04:26 +00:00
Alan Cox
cf060942db Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all().  If the object to which a page belongs has no
references, then that page cannot possibly be mapped.

Reviewed by:	kib
MFC after:	1 week
2017-09-28 17:55:41 +00:00
John Baldwin
14c510c0cf Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type.  This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
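
Since UMA alignment arguments are masks (e.g. UMA_ALIGN_PTR is
sizeof(void *) - 1), the wrapper is roughly the following, shown here
with a hypothetical zone as a usage example:

    #define UMA_ALIGNOF(type)       (_Alignof(type) - 1)

    uma_zone_t zone;

    zone = uma_zcreate("foo", sizeof(struct foo), NULL, NULL, NULL,
        NULL, UMA_ALIGNOF(struct foo), 0);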

Discussed on:	arch@
MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:15:33 +00:00
Alan Cox
43cc906f40 Change vm_page_try_to_free() to require a managed page. Essentially,
vm_page_try_to_free() is testing conditions, like clean versus dirty,
that only vary in managed pages.

Suggested by:	kib
Reviewed by:	markj
X-MFC after:	never
2017-09-24 23:35:01 +00:00
Alan Cox
494c6e43d3 Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all()
can be avoided when the page's containing object has a reference count of
zero.  (If the object has a reference count of zero, then none of its pages
can possibly be mapped.)

Address nearby style issues in vm_page_try_to_free(), and change its
return type to "bool".

Reviewed by:	kib, markj
MFC after:	1 week
2017-09-24 16:50:10 +00:00
Konstantin Belousov
5bf949377e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of dirty memory in the system, but I do
not think that there are users of msync(2) that rely on this side
effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
Konstantin Belousov
bba52ecadd Batch the freeing of pages in vm_object_page_remove() under a single
free queue mutex acquisition, as was done for object termination in
r323561.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-15 16:07:09 +00:00
Mark Johnston
e04223bf94 Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
Mark Johnston
2d54d4bb9f Widen uk_pgoff, the slab header offset field.
16 bits is only wide enough for kegs with an item size of up to 64KB.
At that size or larger, slab headers are typically offpage because the
item size is a multiple of the page size, but there is no requirement
that this be the case.

We can widen the field without affecting the layout of struct uma_keg
since the removal of uk_slabsize in r315077 left an adjacent hole.

PR:		218911
MFC after:	2 weeks
2017-09-13 21:54:37 +00:00
Konstantin Belousov
e82e50e681 Remove the inline specifier from vm_page_free_wakeup(); do not
micro-manage the compiler.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:30:09 +00:00
Konstantin Belousov
2fcd1ff68f Do not relock the free queue mutex for each page; free the whole
terminating object's page queue under a single mutex acquisition.

First, all pages on the queue are prepared for freeing by calls to
vm_page_free_prep(), and pages which should not be returned to the
physical allocator (e.g. wired or fictitious) are simply removed from
the queue.  On the second pass, vm_page_free_phys_pglist() inserts all
pages from the queue without relocking the mutex.
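
Simplified, the two passes look roughly like this (page_free_prep() is
a stand-in for vm_page_free_prep() with the exact call shape omitted;
it prepares the page but leaves freeable pages on the queue):

    vm_page_t p, p_next;

    TAILQ_FOREACH_SAFE(p, &object->memq, listq, p_next) {
            vm_page_lock(p);
            if (!page_free_prep(p))         /* wired or fictitious */
                    TAILQ_REMOVE(&object->memq, p, listq);
            vm_page_unlock(p);
    }
    vm_page_free_phys_pglist(&object->memq); /* one mutex acquisition */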

The change improves object termination, e.g. on process exit, where
large anonymous memory objects otherwise cause the free queue mutex to
be relocked for each page.  Moreover, if several such processes are
exiting or execing in parallel, the mutex was highly contended during
address space demolition.

Diagnosed and tested by:	mjg (previous version)
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:22:07 +00:00
Konstantin Belousov
540ac3b310 Split vm_page_free_toq() into two parts: preparation,
vm_page_free_prep(), and insertion into the physical allocator free
queues, vm_page_free_phys().  Also provide a wrapper,
vm_page_free_phys_pglist(), for batched frees.

Reviewed by:	alc, markj
Tested by:	mjg (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:11:52 +00:00
Konstantin Belousov
b9e8fb647e Use the existing tag name for the vm_object's memq.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:03:59 +00:00
Mark Johnston
2934eb8a22 Fix a logic error in the item size calculation for internal UMA zones.
Kegs for internal zones always keep the slab header in the slab itself.
Therefore, when determining the allocation size, we need to take the
slab header size into account.
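
The sizing rule described above, as an illustrative fragment (not the
committed diff):

    size_t avail;

    avail = UMA_SLAB_SIZE;
    if (keg->uk_flags & UMA_ZFLAG_INTERNAL)
            avail -= sizeof(struct uma_slab);       /* header is in-slab */
    keg->uk_ipers = avail / keg->uk_rsize;          /* items per slab */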

Reported and tested by:	ae, rakuco
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12342
2017-09-13 15:44:54 +00:00
Mateusz Guzik
1c0b34417b Move vmmeter atomic counters into dedicated cache lines
Prior to the change they were subject to extreme false sharing.
In particular, this change shaves about 3 seconds of real time off a
-j 80 buildkernel.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D12281
2017-09-10 19:00:38 +00:00
Alan Cox
d027ed2e7a To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number and
sizes of maximal intervals of free blocks.  The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers.  The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.

The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.
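
A sketch of that helper (the name is illustrative): with exactly one
bit set, a find-first-set instruction yields the position directly,
and the binary search remains as the fallback.

    static inline int
    lone_bit_index(u_daddr_t mask)
    {
    #ifdef HAVE_INLINE_FFSLL
            return (ffsll(mask) - 1);       /* single machine instruction */
    #else
            int hi, lo, mid;

            lo = 0;
            hi = sizeof(mask) * NBBY - 1;
            while (lo < hi) {
                    mid = (lo + hi) >> 1;
                    if ((mask >> mid) & 1)
                            return (mid);
                    if (mask > ((u_daddr_t)1 << mid))
                            lo = mid + 1;
                    else
                            hi = mid - 1;
            }
            return (lo);
    #endif
    }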

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11906
2017-09-10 17:46:03 +00:00
Konstantin Belousov
93c5d3a46a Add a vm_page_change_lock() helper, common code that avoids relocking
the page lock when both the old and new pages use the same underlying
lock.  Convert existing places to use the helper instead of inlining
it.  Use the optimization in vm_object_page_remove().
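
The helper's idea, as a simplified sketch:

    static void
    vm_page_change_lock(vm_page_t m, struct mtx **mtxp)
    {
            struct mtx *mtx1;

            mtx1 = vm_page_lockptr(m);
            if (*mtxp == mtx1)
                    return;                 /* same underlying lock */
            if (*mtxp != NULL)
                    mtx_unlock(*mtxp);
            *mtxp = mtx1;
            mtx_lock(mtx1);
    }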

Suggested and reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-09 17:35:19 +00:00
Mark Johnston
f93f7cf199 Speed up vm_page_array initialization.
We currently initialize the vm_page array in three passes: one to zero
the array, one to initialize the "order" field of each page (necessary
when inserting them into the vm_phys buddy allocator one-by-one), and
one to initialize the remaining non-zero fields and individually insert
each page into the allocator.

Merge the three passes into one following a suggestion from alc:
initialize vm_page fields in a single pass, and use vm_phys_free_contig()
to efficiently insert physical memory segments into the buddy allocator.
This reduces the initialization time to a third or a quarter of what it
was before on most systems that I tested.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D12248
2017-09-07 21:43:39 +00:00
Mateusz Guzik
fe933c1d88 Start annotating global _padalign locks with __exclusive_cache_line
While these locks are guaranteed not to share their respective cache
lines, their current placement leaves unnecessary holes in the lines
that precede them.

For instance, the annotation of vm_page_queue_free_mtx allows two
neighbouring cache lines (previously separated by the lock) to be
collapsed into one.

The annotation is only effective on architectures which have it
implemented in their linker script (currently only amd64).  Thus the
locks are not converted to their non-padaligned variants, so as not to
affect the rest.
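
For a global lock the annotation looks roughly like this (placement of
the attribute is illustrative):

    struct mtx_padalign __exclusive_cache_line vm_page_queue_free_mtx;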

MFC after:	1 week
2017-09-06 20:28:18 +00:00
Konstantin Belousov
85d88d8799 Do not leak empty swblk.
In swp_pager_meta_build(), if the requested operation results in
freeing the last swap pointer in the swblk, free the trie node.  Other
swap pager code does not expect to find a completely empty swblk.
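
The cleanup amounts to roughly the following fragment (simplified from
the control flow in swp_pager_meta_build(); "sb" is the swblk node):

    sb->d[pindex % SWAP_META_PAGES] = SWAPBLK_NONE;
    for (i = 0; i < SWAP_META_PAGES; i++)
            if (sb->d[i] != SWAPBLK_NONE)
                    break;
    if (i == SWAP_META_PAGES) {
            SWAP_PCTRIE_REMOVE(&object->un_pager.swp.swp_blks, sb->p);
            uma_zfree(swblk_zone, sb);
    }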

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:18:53 +00:00
Konstantin Belousov
eed99cb81b In swp_pager_meta_build(), handle a race with another thread
allocating a swapblk for our index while we dropped the object lock.

Noted by:	jeff
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:16:11 +00:00
Konstantin Belousov
35872e79b7 Adjust the interface of swapon_check_swzone() to its actual usage.
The function's return value is not used.  Its argument is always
swap_total/PAGE_SIZE, so make it take no arguments.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 10:17:00 +00:00
Konstantin Belousov
f08b30995a Make the swap_pager_full variable static.
r290920 removed the use of the variable from vm/vm_pageout.c.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 09:44:05 +00:00
Mark Johnston
aed9aaaa76 Synchronize page laundering with pmap_extract_and_hold().
Before r207410, the hold count of a page in a page queue was protected
by the queue lock, and, before laundering a page, the page daemon
removed managed writeable mappings of the page before releasing the
queue lock. This ensured that other threads could not concurrently
create transient writeable mappings using pmap_extract_and_hold() on a
user map, as is done for example by vmapbuf(). With that revision,
however, a race can allow the creation of such a mapping, meaning that
the page might be modified as it is being laundered, potentially
resulting in it being marked clean when its contents do not match
those given to the pager. Close the race by using the page lock to
synchronize the hold count check in vm_pageout_cluster() with the
removal of writeable managed mappings.

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12084
2017-08-28 22:10:15 +00:00
Alan Cox
ee620ea47d Update a couple vm_object lock assertions in the swap pager to reflect the
new use of the vm_object's lock to synchronize updates to a radix trie
mapping per-vm object page indices to on-disk swap blocks.

Fix a typo in a nearby comment.

Reviewed by:	kib, markj
X-MFC with:	r322913
Differential Revision:	https://reviews.freebsd.org/D12134
2017-08-28 17:02:25 +00:00
Alan Cox
d5efa0a475 Switching from a global hash table to per-vm_object radix tries for mapping
vm_object page indices to on-disk swap space (r322913) has changed the
synchronization requirements for a couple swap pager functions.  Whereas
before a read lock on the vm object sufficed because of the global mutex
on the hash table, a write lock on the vm object may now be required.  In
particular, calls to vm_pager_page_unswapped() now require a write lock on
the vm_object.  Consequently, vm_fault()'s fast path cannot call
vm_pager_page_unswapped().  The swap space will have to be released at a
later point.

Reviewed by:	kib, markj
X-MFC with:	r322913
Differential Revision:	https://reviews.freebsd.org/D12134
2017-08-28 16:55:43 +00:00
Konstantin Belousov
f425ab8e50 Replace global swhash in swap pager with per-object trie to track swap
blocks assigned to the object pages.

- The global swhash_mtx is removed; the trie is synchronized by the
  corresponding object lock.
- The swp_pager_meta_free_all() function used during object
  termination is optimized by only looking at the trie instead of
  having to search the whole hash for the swap blocks owned by the
  object.
- On swap_pager_swapoff(), instead of iterating over the swhash, the
  global object list has to be inspected.  There, we have to ensure
  that we see valid trie content if we see that the object type is
  swap.
Sizing of the swblk zone is the same as for the swblock zone; each
swblk maps SWAP_META_PAGES pages.
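
The per-object bookkeeping introduced here is, roughly, a small node
keyed by the first page index it covers:

    struct swblk {
            vm_pindex_t     p;                      /* base page index */
            daddr_t         d[SWAP_META_PAGES];     /* SWAPBLK_NONE if unused */
    };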

Proposed by:	alc
Reviewed by:	alc, markj (previous version)
Tested by:	alc, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D11435
2017-08-25 23:13:21 +00:00
Ruslan Bukin
7bbdb843b6 Add OBJ_PG_DTOR flag to VM object.
Setting this flag allows us to skip page removal from the VM object
queue during object termination and to leave that to the cdev_pg_dtor
function.

Move the page removal code to a separate function,
vm_object_terminate_pages(), as comments do not survive indentation.

This will be required for Intel SGX support, where we will have to
remove pages from the VM object manually.

Reviewed by:	kib, alc
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11688
2017-08-16 08:49:11 +00:00
Mark Johnston
33fff5d536 Add vm_page_alloc_after().
This is a variant of vm_page_alloc() which accepts an additional parameter:
the page in the object with the largest index smaller than the requested
index.  vm_page_alloc() finds this page using a lookup in the object's radix
tree, but in some cases its identity is already known, allowing the lookup
to be elided.

Modify kmem_back() and vm_page_grab_pages() to use vm_page_alloc_after().
vm_page_alloc() is converted into a trivial wrapper of
vm_page_alloc_after().
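
A sketch of a caller that already knows the predecessor page, so the
per-page radix lookup inside vm_page_alloc() can be skipped (loop body
simplified):

    mpred = vm_radix_lookup_le(&object->rtree, pindex);
    for (i = 0; i < npages; i++) {
            m = vm_page_alloc_after(object, pindex + i, VM_ALLOC_NORMAL,
                mpred);
            if (m == NULL)
                    break;
            mpred = m;      /* the new page precedes the next request */
    }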

Suggested by:	alc
Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11984
2017-08-15 16:39:49 +00:00
Mark Johnston
9df950b35d Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT.
This will allow its use in sendfile_swapin().

Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:29:22 +00:00
Mark Johnston
7e05ffa6e6 Micro-optimize kmem_unback().
We can remove some unnecessary object radix tree lookups by using the
object memq to iterate over pages in the specified range. This does not,
however, eliminate the lookup needed in vm_page_free_toq() to remove each
tree entry.
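
The iteration change, as a simplified fragment: one lookup finds the
first page, and the object memq provides the rest.

    m = vm_page_lookup(object, atop(offset));
    for (; offset < end; offset += PAGE_SIZE, m = next) {
            next = TAILQ_NEXT(m, listq);
            vm_page_unwire(m, PQ_NONE);
            vm_page_free(m);
    }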

Reviewed by:	alc, kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11945
2017-08-11 03:09:11 +00:00