freebsd-skq

Author	SHA1	Message	Date
markj	f761f23093	Allow for fictitious physical pages in vm_page_scan_contig(). Some drm2 drivers will set PG_FICTITIOUS in physical pages in order to satisfy the OBJT_MGTDEVICE object interface, so a scan may encounter fictitous pages. For now, allow for this possibility; such pages will be skipped later in the scan since they are wired. Reported by: avg Reviewed by: kib MFC after: 1 week	2017-11-21 13:17:40 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
pfg	9da7bdde06	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
kib	ea07ce4ed0	vmtotal: extend memory counters to accomodate for current and future hardware sizes. 32bit counters already overflow on approachable virtual memory page counts, and soon would overflow on the physical pages counts as well. Bump sizes to 64bit types. Bump __FreeBSD_version. It is impossible to provide perfect backward ABI compat for this change. If a program requests an old structure, it can be detected by size. But if it queries the size first by passing NULL old req pointer, there is almost nothing we can do to detect the desired ABI. As a partial solution, check p_osrel of the quering process when selecting the size to report. Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com> Differential revision: https://reviews.freebsd.org/D13018	2017-11-15 13:41:03 +00:00
kib	c05cd25802	Fix operator priority. Sponsored by: The FreeBSD Foundation	2017-11-08 23:25:05 +00:00
markj	32541ba3eb	Allow various page daemon parameters to be set from loader.conf. MFC after: 1 week	2017-11-08 19:55:17 +00:00
jeff	3c355d849c	Replace manyinstances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
markj	59bc046828	Correct the type of foff. No functional change intended. Github PR: 124 Submitted by: Wuyang Chung <wuyang.m.chung@outlook.com> MFC after: 1 week	2017-11-08 01:53:03 +00:00
alc	1728bf4480	Micro-optimize the handling of fictitious pages in vm_page_free_prep(). A fictitious page is always wired, so there is no point in trying to remove one from the page queues. Completely remove one inaccurate comment from vm_page_free_prep() and correct another. Reviewed by: kib, markj MFC after: 1 week	2017-10-24 17:14:53 +00:00
trasz	b1c2085f71	Add OID for the vm.overcommit sysctl. This makes it possible to remove one call to sysctl(2) from jemalloc startup code. (That also requires changes to jemalloc, but I plan to push those to upstream first.) Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D12745	2017-10-22 10:35:29 +00:00
kib	09c8e869a8	Check that the page which is freed as zeroed, indeed has all-zero content. This catches some rare mysterious failures at the source. The check is only performed on architectures which implement direct map, and only enabled with option DIAGNOSTIC, similar to other costly consistency checks. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-10-21 17:28:12 +00:00
markj	042c6a8f6c	Free the right address range if kmem_back() fails in memguard_alloc(). MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-10-20 21:13:19 +00:00
kib	3b0b7e2626	Take the vm object lock in read mode in vnode_generic_putpages(). Only upgrade it to write mode if we need to clear dirty bits of the partially valid page after EOF. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2017-10-20 18:40:29 +00:00
kib	20d037559d	Move swapout code into vm/vm_swapout.c. There is no NO_SWAPPING #ifdef left in the code. Requested by: alc Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12663	2017-10-20 09:10:49 +00:00
kib	73e703b999	Do not overwrite clean blocks on pageout. If filesystem block size is less than the page size, it is possible that the page-out run contains partially clean pages. E.g., the chunk of the page might be bdwrite()-ed, or some thread performed bwrite() on a buffer which references a chunk of the paged out page. As result, the assertion added in r319975, which checked that all pages in the run are dirty, does not hold on such filesystems. One solution is to remove the assert, but it is undesirable, because we do overwrite the valid on-disk content. I cannot provide a scenario where such write would corrupt the file data, but I do not like it on principle. Another, in my opinion proper, solution is to only write parts of the pages still marked dirty. The patch implements this, it skips clean blocks and only writes the dirty block runs. Note that due to clustering, write one page might clean other pages in the run, so the next write range must be calculated only after the current range is written out. More, due to a possible invalidation, and the fact that the object lock is dropped and reacquired before the checks, it is possible that the whole page-out pages run appears to consist of only clean pages. For this reason, it is impossible to assert that there is some work for the pageout method to do (i.e. assert that there is at least one dirty page in the run). But such clearing can only occur due to invalidation, and not due to a parallel write, because we own the vnode lock exclusive. Reported by: fsu In collaboration with: pho Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12668	2017-10-20 08:32:37 +00:00
kib	9a4db25ceb	In vm_page_free_phys_pglist(), do not take vm_page_queue_free_mtx if there is nothing to do. Suggested by: mjg Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-20 08:25:49 +00:00
alc	6dd81762df	Batch atomic updates to the number of active, inactive, and laundry pages by vm_object_terminate_pages(). For example, for a "buildworld" workload, this batching reduces vm_object_terminate_pages()'s average execution time by 12%. (The total savings were about 11.7 billion processor cycles.) Reviewed by: kib MFC after: 1 week	2017-10-19 04:13:47 +00:00
kib	b90468ae70	Do not report reduction of swap zone if it was not. After r324600 we see the actual reservation. Reported by: jkim Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-18 07:27:43 +00:00
mjg	03b10745ce	Reduce traffic on vm_cnt.v_free_count The variable is modified with the highly contended page free queue lock. It unnecessarily shares a cacheline with purely read-only fields and is re-read after the lock is dropped in the page allocation code making the hold time longer. Pad the variable just like the others and store the value as found with the lock held instead of re-reading. Provides a modest 1%-ish speed up in concurrent page faults. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D12665	2017-10-13 21:54:34 +00:00
kib	7f82035277	Evaluate the real size of the sblk_zone. Submitted by: ota@j.email.ne.jp PR: 221356 Reviewed by: alc, markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D12660	2017-10-13 16:23:05 +00:00
emaste	951751c6c6	ANSIfy vm_kern.c PR: 222673 Submitted by: ota@j.email.ne.jp MFC after: 1 week	2017-10-13 13:53:19 +00:00
alc	1615de1f68	Replace an unnecessary call to vm_page_activate() by an assertion that the page is already wired or queued. Prior to the elimination of PG_CACHED pages, vm_page_grab() might have returned a valid, previously PG_CACHED page, in which case enqueueing the page was necessary. Now, that can't happen. Moreover, activating the page is a dubious choice, since the page is not being accessed. Reviewed by: kib MFC after: 1 week	2017-10-08 16:54:42 +00:00
alc	b88a843faf	When an I/O error occurs on page out, there is no need to dirty the page, because it is already dirty. Instead, assert that the page is dirty. Reviewed by: kib, markj MFC after: 1 week	2017-10-01 17:04:26 +00:00
alc	7ad59282da	Optimize vm_object_page_remove() by eliminating pointless calls to pmap_remove_all(). If the object to which a page belongs has no references, then that page cannot possibly be mapped. Reviewed by: kib MFC after: 1 week	2017-09-28 17:55:41 +00:00
jhb	13b1e2684d	Add UMA_ALIGNOF(). This is a wrapper around _Alignof() that sets the alignment for a zone to the alignment required by a given type. This allows the compiler to determine the proper alignment rather than having the programmer try to guess. Discussed on: arch@ MFC after: 1 week Sponsored by: DARPA / AFRL	2017-09-27 23:15:33 +00:00
alc	ec44341cad	Change vm_page_try_to_free() to require a managed page. Essentially, vm_page_try_to_free() is testing conditions, like clean versus dirty, that only vary in managed pages. Suggested by: kib Reviewed by: markj X-MFC after: never	2017-09-24 23:35:01 +00:00
alc	07fc017373	Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all() can be avoided when the page's containing object has a reference count of zero. (If the object has a reference count of zero, then none of its pages can possibly be mapped.) Address nearby style issues in vm_page_try_to_free(), and change its return type to "bool". Reviewed by: kib, markj MFC after: 1 week	2017-09-24 16:50:10 +00:00
kib	23d65de60e	For unlinked files, do not msync(2) or sync on the vnode deactivation. One consequence of the patch is that msyncing unlinked file mappings no longer reduces the amount of the dirty memory in the system, but I do not think that there are users of msync(2) that utilize it for such side-effect. Reported and tested by: tjil PR: 222356 Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12411	2017-09-19 16:46:37 +00:00
kib	4c4cbd798b	Batch freeing of the pages in vm_object_page_remove() under the same free queue mutex lock owning session, same as it was done for the object termination in r323561. Reported and tested by: mjg Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-15 16:07:09 +00:00
markj	d0e4a8e28e	Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits. MFC after: 1 week	2017-09-15 14:59:35 +00:00
markj	7d8d6899fc	Widen uk_pgoff, the slab header offset field. 16 bits is only wide enough for kegs with an item size of up to 64KB. At that size or larger, slab headers are typically offpage because the item size is a multiple of the page size, but there is no requirement that this be the case. We can widen the field without affecting the layout of struct uma_keg since the removal of uk_slabsize in r315077 left an adjacent hole. PR: 218911 MFC after: 2 weeks	2017-09-13 21:54:37 +00:00
kib	32b600317c	Remove inline specifier from vm_page_free_wakeup(), do not micro-manage compiler. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:30:09 +00:00
kib	9c0adbf36e	Do not relock free queue mutex for each page, free whole terminating object' page queue under the single mutex lock. First, all pages on the queue are prepared for free by calls to vm_page_free_prep(), and pages which should not be returned to the physical allocator (e.g. wired or fictitious) are simply removed from the queue. On the second pass, vm_page_free_phys_pglist() inserts all pages from the queue without relocking the mutex. The change improves the object termination, e.g. on the process exit where large anonymous memory objects otherwise cause relocks the free queue mutex for each page. More, if several such processes are exiting or execing in parallel, the mutex was highly contended on the address space demolition. Diagnosed and tested by: mjg (previous version) Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:22:07 +00:00
kib	b938027f01	Split vm_page_free_toq() into two parts, preparation vm_page_free_prep() and insertion into the phys allocator free queues vm_page_free_phys(). Also provide a wrapper vm_page_free_phys_pglist() for batched free. Reviewed by: alc, markj Tested by: mjg (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:11:52 +00:00
kib	7e40933eb7	Use existing tag name for the vm_object' memq. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-13 19:03:59 +00:00
markj	60fea4eba0	Fix a logic error in the item size calculation for internal UMA zones. Kegs for internal zones always keep the slab header in the slab itself. Therefore, when determining the allocation size, we need to take the slab header size into account. Reported and tested by: ae, rakuco Reviewed by: avg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12342	2017-09-13 15:44:54 +00:00
mjg	7495dfa6d7	Move vmmeter atomic counters into dedicated cache lines Prior to the change they were subject to extreme false sharing. In particular this change shaves about 3 seconds real time of -j 80 buildkernel. Reviewed by: alc, markj Differential Revision: https://reviews.freebsd.org/D12281	2017-09-10 19:00:38 +00:00
alc	e7430120e9	To analyze the allocation of swap blocks by blist functions, add a method for analyzing the radix tree structures and reporting on the number, and sizes, of maximal intervals of free blocks. The report includes the number of maximal intervals, and also the number of them in each of several size ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size boundaries defined by Fibonacci numbers. The report is written in the test tool with the 's' command, or in a running kernel by sysctl. The analysis of the radix tree frequently computes the position of the lone bit set in a u_daddr_t, a computation that also appears in leaf allocation. That computation has been moved into a function of its own, and optimized for cases where an inlined machine instruction can replace the usual binary search. Submitted by: Doug Moore <dougm@rice.edu> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11906	2017-09-10 17:46:03 +00:00
kib	d27f2b11c2	Add a vm_page_change_lock() helper, the common code to not relock page lock if both old and new pages use the same underlying lock. Convert existing places to use the helper instead of inlining it. Use the optimization in vm_object_page_remove(). Suggested and reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-09 17:35:19 +00:00
markj	1db0a22947	Speed up vm_page_array initialization. We currently initialize the vm_page array in three passes: one to zero the array, one to initialize the "order" field of each page (necessary when inserting them into the vm_phys buddy allocator one-by-one), and one to initialize the remaining non-zero fields and individually insert each page into the allocator. Merge the three passes into one following a suggestion from alc: initialize vm_page fields in a single pass, and use vm_phys_free_contig() to efficiently insert physical memory segments into the buddy allocator. This reduces the initialization time to a third or a quarter of what it was before on most systems that I tested. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D12248	2017-09-07 21:43:39 +00:00
mjg	4cc87bd651	Start annotating global _padalign locks with __exclusive_cache_line While these locks are guarnteed to not share their respective cache lines, their current placement leaves unnecessary holes in lines which preceeded them. For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour cachelines (previously separate by the lock) to be collapsed into 1. The annotation is only effective on architectures which have it implemented in their linker script (currently only amd64). Thus locks are not converted to their not-padaligned variants as to not affect the rest. MFC after: 1 week	2017-09-06 20:28:18 +00:00
kib	6fc3cacfc6	Do not leak empty swblk. In swp_pager_meta_build(), if the requested operation results in freeing the last swap pointer in the swblk, free the trie node. Other swap pager code does not expect to find completely empty swblk. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:18:53 +00:00
kib	100162ab54	In swp_pager_meta_build(), handle a race with other thread allocating swapblk for our index while we dropped the object lock. Noted by: jeff Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:16:11 +00:00
kib	13aa7b3b56	Adjust interface of swapon_check_swzone() to its actual usage. The function return value is not used. Its argument is always swap_total/PAGE_SIZE, so make it not take any arguments. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 10:17:00 +00:00
kib	9fa4cbc3e4	Make the swap_pager_full variable static. r290920 removed the use of the variable from vm/vm_pageout.c. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 09:44:05 +00:00
markj	f3e8cdafb5	Synchronize page laundering with pmap_extract_and_hold(). Before r207410, the hold count of a page in a page queue was protected by the queue lock, and, before laundering a page, the page daemon removed managed writeable mappings of the page before releasing the queue lock. This ensured that other threads could not concurrently create transient writeable mappings using pmap_extract_and_hold() on a user map, as is done for example by vmapbuf(). With that revision, however, a race can allow the creation of such a mapping, meaning that the page might be modified as it is being laundered, potentially resulting in it being marked clean when its contents do not match those given to the pager. Close the race by using the page lock to synchronize the hold count check in vm_pageout_cluster() with the removal of writeable managed mappings. Reported by: alc Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12084	2017-08-28 22:10:15 +00:00
alc	a492a3f631	Update a couple vm_object lock assertions in the swap pager to reflect the new use of the vm_object's lock to synchronize updates to a radix trie mapping per-vm object page indices to on-disk swap blocks. Fix a typo in a nearby comment. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 17:02:25 +00:00
alc	4ab94b03c3	Switching from a global hash table to per-vm_object radix tries for mapping vm_object page indices to on-disk swap space (r322913) has changed the synchronization requirements for a couple swap pager functions. Whereas before a read lock on the vm object sufficed because of the global mutex on the hash table, a write lock on the vm object may now be required. In particular, calls to vm_pager_page_unswapped() now require a write lock on the vm_object. Consequently, vm_fault()'s fast path cannot call vm_pager_page_unswapped(). The swap space will have to be released at a later point. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 16:55:43 +00:00
kib	5ed2d3f017	Replace global swhash in swap pager with per-object trie to track swap blocks assigned to the object pages. - The global swhash_mtx is removed, trie is synchronized by the corresponding object lock. - The swp_pager_meta_free_all() function used during object termination is optimized by only looking at the trie instead of having to search whole hash for the swap blocks owned by the object. - On swap_pager_swapoff(), instead of iterating over the swhash, global object list have to be inspected. There, we have to ensure that we do see valid trie content if we see that the object type is swap. Sizing of the swblk zone is same as for swblock zone, each swblk maps SWAP_META_PAGES pages. Proposed by: alc Reviewed by: alc, markj (previous version) Tested by: alc, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D11435	2017-08-25 23:13:21 +00:00
br	bc66f23e16	Add OBJ_PG_DTOR flag to VM object. Setting this flag allows us to skip pages removal from VM object queue during object termination and to leave that for cdev_pg_dtor function. Move pages removal code to separate function vm_object_terminate_pages() as comments does not survive indentation. This will be required for Intel SGX support where we will have to remove pages from VM object manually. Reviewed by: kib, alc Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11688	2017-08-16 08:49:11 +00:00

1 2 3 4 5 ...

3924 Commits