freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	a823302783	Allow to create swap zone larger than v_page_count / 2. If user configured the maxswapzone tunable, just take the literal value for the initial zone sizing attempt. Before, it was only possible to reduce the zone by the tunable. While there, correct the message which was not correct when zone creation rounded the size up. Reported by: jmg Reviewed by: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D18381	2018-12-01 16:50:12 +00:00
Alan Cox	541a117532	Use swp_pager_isondev() throughout. Submitted by: ota@j.email.ne.jp Change swp_pager_isondev()'s return type to bool. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D16712	2018-11-19 17:17:23 +00:00
Mark Johnston	150d384e5c	Fix a use-after-free in swp_pager_meta_free(). This was introduced in r326329 and explains the crashes mentioned in the commit log message for r339934. In particular, on INVARIANTS kernels, UMA trashing causes the loop to exit early, leaving swap blocks behind when they should have been freed. After r336984 this became more problematic since new anonymous mappings were more likely to reuse swapped-out subranges of existing VM objects, so faults would trigger pageins of freed memory rather than returning zeroed pages. Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17897	2018-11-07 23:28:11 +00:00
Matt Macy	e8bb589d56	eliminate locking surrounding ui_vmsize and swap reserve by using atomics Change swap_reserve and swap_total to be in units of pages so that swap reservations can be done using only atomics instead of using a single global mutex for swap_reserve and a single mutex for all processes running under the same uid for uid accounting. Results in mmap speed up and a 70% increase in brk calls / second. Reviewed by: alc@, markj@, kib@ Approved by: re (delphij@) Differential Revision: https://reviews.freebsd.org/D16273	2018-10-05 05:50:56 +00:00
Alan Cox	f5fbe90de4	Passing UMA_ZONE_NOFREE to uma_zcreate() for swpctrie_zone and swblk_zone is redundant, because uma_zone_reserve_kva() is performed on both zones and it sets this same flag on the zone. (Moreover, the implementation of the swap pager does not itself require these zones to be UMA_ZONE_NOFREE.) Reviewed by: kib, markj Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17296	2018-09-24 16:49:02 +00:00
Alan Cox	78f1deeffe	Defer and aggregate swap_pager_meta_build frees. Before swp_pager_meta_build replaces an old swapblk with an new one, it frees the old one. To allow such freeing of blocks to be aggregated, have swp_pager_meta_build return the old swap block, and make the caller responsible for freeing it. Define a pair of short static functions, swp_pager_init_freerange and swp_pager_update_freerange, to do the initialization and updating of blk addresses and counters used in aggregating blocks to be freed. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: kib, markj (an earlier version) Tested by: pho MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13707	2018-08-08 02:30:34 +00:00
Konstantin Belousov	b7b8a09658	Handle the race between fork/vm_object_split() and faults. If fault started before vmspace_fork() locked the map, and then during fork, vm_map_copy_entry()->vm_object_split() is executed, it is possible that the fault instantiate the page into the original object when the page was already copied into the new object (see vm_map_split() for the orig/new objects terminology). This can happen if split found a busy page (e.g. from the fault) and slept dropping the objects lock, which allows the swap pager to instantiate read-behind pages for the fault. Then the restart of the scan can see a page in the scanned range, where it was already copied to the upper object. Fix it by instantiating the read-ahead pages before swap_pager_getpages() method drops the lock to allocate pbuf. The object scan would see the whole range prefilled with the busy pages and not proceed the range. Note that vm_fault rechecks the map generation count after the object unlock, so that it restarts the handling if raced with split, and re-lookups the right page from the upper object. In collaboration with: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-06-14 19:41:02 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Mark Johnston	3f060b60b1	Use the conventional name for an array of pages. No functional change intended. Discussed with: kib MFC after: 3 days	2018-02-16 15:38:22 +00:00
Jeff Roberson	e958ad4cf3	Make v_wire_count a per-cpu counter(9) counter. This eliminates a significant source of cache line contention from vm_page_alloc(). Use accessors and vm_page_unwire_noq() so that the mechanism can be easily changed in the future. Reviewed by: markj Discussed with: kib, glebius Tested by: pho (earlier version) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14273	2018-02-12 22:53:00 +00:00
Jeff Roberson	e2068d0bcd	Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000	2018-02-06 22:10:07 +00:00
Alan Cox	4abca9bb05	Previously, swap_pager_copy() freed swap blocks one at at time, via swp_pager_meta_ctl(), with no opportunity to recognize freeing of consecutive blocks and free fewer block ranges. To open that opportunity, this change removes the SWM_FREE option from swp_pager_meta_ctl(), and compels the caller to do the freeing when a valid block address is returned. In swap_pager_copy(), these frees are aggregated, so that a sequence of them can be done at one time. The only other caller to swp_pager_meta_ctl() that passed SWM_FREE, swp_pager_unswapped(), is also modified to handle its single free explicitly. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: kib (an earlier version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13290	2017-12-31 04:01:47 +00:00
Alan Cox	230869e051	When the swap pager allocates space on disk, it requests contiguous blocks in a single call to blist_alloc(). However, when it frees that space, it previously called blist_free() on each block, one at a time. With this change, the swap pager identifies ranges of contiguous blocks to be freed, and calls blist_free() once per range. In one extreme case, that is described in the review, the time to perform an munmap(2) was reduced by 55%. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12397	2017-11-28 17:46:03 +00:00
Pedro F. Giffuni	df57947f08	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
Jeff Roberson	8d6fbbb867	Replace manyinstances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
Edward Tomasz Napierala	be7d4ac586	Add OID for the vm.overcommit sysctl. This makes it possible to remove one call to sysctl(2) from jemalloc startup code. (That also requires changes to jemalloc, but I plan to push those to upstream first.) Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D12745	2017-10-22 10:35:29 +00:00
Konstantin Belousov	1fffcd755d	Do not report reduction of swap zone if it was not. After r324600 we see the actual reservation. Reported by: jkim Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-18 07:27:43 +00:00
Konstantin Belousov	53faf5a7d4	Evaluate the real size of the sblk_zone. Submitted by: ota@j.email.ne.jp PR: 221356 Reviewed by: alc, markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D12660	2017-10-13 16:23:05 +00:00
Alan Cox	37244a84fd	Replace an unnecessary call to vm_page_activate() by an assertion that the page is already wired or queued. Prior to the elimination of PG_CACHED pages, vm_page_grab() might have returned a valid, previously PG_CACHED page, in which case enqueueing the page was necessary. Now, that can't happen. Moreover, activating the page is a dubious choice, since the page is not being accessed. Reviewed by: kib MFC after: 1 week	2017-10-08 16:54:42 +00:00
Alan Cox	41e5a22698	When an I/O error occurs on page out, there is no need to dirty the page, because it is already dirty. Instead, assert that the page is dirty. Reviewed by: kib, markj MFC after: 1 week	2017-10-01 17:04:26 +00:00
Alan Cox	d027ed2e7a	To analyze the allocation of swap blocks by blist functions, add a method for analyzing the radix tree structures and reporting on the number, and sizes, of maximal intervals of free blocks. The report includes the number of maximal intervals, and also the number of them in each of several size ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size boundaries defined by Fibonacci numbers. The report is written in the test tool with the 's' command, or in a running kernel by sysctl. The analysis of the radix tree frequently computes the position of the lone bit set in a u_daddr_t, a computation that also appears in leaf allocation. That computation has been moved into a function of its own, and optimized for cases where an inlined machine instruction can replace the usual binary search. Submitted by: Doug Moore <dougm@rice.edu> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11906	2017-09-10 17:46:03 +00:00
Konstantin Belousov	85d88d8799	Do not leak empty swblk. In swp_pager_meta_build(), if the requested operation results in freeing the last swap pointer in the swblk, free the trie node. Other swap pager code does not expect to find completely empty swblk. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:18:53 +00:00
Konstantin Belousov	eed99cb81b	In swp_pager_meta_build(), handle a race with other thread allocating swapblk for our index while we dropped the object lock. Noted by: jeff Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 16:16:11 +00:00
Konstantin Belousov	35872e79b7	Adjust interface of swapon_check_swzone() to its actual usage. The function return value is not used. Its argument is always swap_total/PAGE_SIZE, so make it not take any arguments. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 10:17:00 +00:00
Konstantin Belousov	f08b30995a	Make the swap_pager_full variable static. r290920 removed the use of the variable from vm/vm_pageout.c. Submitted by: ota@j.email.ne.jp PR: 221356 MFC after: 1 week	2017-08-30 09:44:05 +00:00
Alan Cox	ee620ea47d	Update a couple vm_object lock assertions in the swap pager to reflect the new use of the vm_object's lock to synchronize updates to a radix trie mapping per-vm object page indices to on-disk swap blocks. Fix a typo in a nearby comment. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 17:02:25 +00:00
Konstantin Belousov	f425ab8e50	Replace global swhash in swap pager with per-object trie to track swap blocks assigned to the object pages. - The global swhash_mtx is removed, trie is synchronized by the corresponding object lock. - The swp_pager_meta_free_all() function used during object termination is optimized by only looking at the trie instead of having to search whole hash for the swap blocks owned by the object. - On swap_pager_swapoff(), instead of iterating over the swhash, global object list have to be inspected. There, we have to ensure that we do see valid trie content if we see that the object type is swap. Sizing of the swblk zone is same as for swblock zone, each swblk maps SWAP_META_PAGES pages. Proposed by: alc Reviewed by: alc, markj (previous version) Tested by: alc, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D11435	2017-08-25 23:13:21 +00:00
Konstantin Belousov	9680bb9877	Remove unused function swap_pager_isswapped(). Noted by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-19 17:28:46 +00:00
Alan Cox	e22415906d	Increase the pageout cluster size to 32 pages. Decouple the pageout cluster size from the size of the hash table entry used by the swap pager for mapping (object, pindex) to a block on the swap device(s), and keep the size of a hash table entry at its current size. Eliminate a pointless macro. Reviewed by: kib, markj (an earlier version) MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D11305	2017-06-24 17:10:33 +00:00
Alan Cox	3a5d839ebc	Eliminate an unused macro. MFC after: 3 days	2017-06-21 03:55:45 +00:00
Alan Cox	87b0ab69a9	Pages that are passed to swap_pager_putpages() should already be fully dirty. Assert that they are fully dirty rather than redundantly calling vm_page_dirty() on them. Reviewed by: kib, markj MFC after: 1 week X-MFC after: r319932	2017-06-17 03:05:25 +00:00
Alan Cox	761097c85e	Starting in r118390, swaponsomething() began to reserve the blocks at the beginning of a swap area for a disk label. However, neither r118390 nor r118544, which increased the reservation from one to two blocks, correctly accounted for these blocks when updating the variable "swap_pager_avail". This change corrects that error. Reviewed by: kib MFC after: 5 days	2017-06-06 16:52:07 +00:00
Alan Cox	03bdd65f18	When the function blist_fill() was added to the kernel in r107913, the swap pager used a different scheme for striping the allocation of swap space across multiple devices. And, although blist_fill() was intended to support fill operations with large counts, the old striping scheme never performed a fill larger than the stripe size. Consequently, the misplacement of a sanity check in blst_meta_fill() went undetected. Now, moving forward in time to r118390, a new scheme for striping was introduced that maintained a blist allocator per device, but as noted in r318995, swapoff_one() was not fully and correctly converted to the new scheme. This change completes what was started in r318995 by fixing the underlying bug in blst_meta_fill() that stops swapoff_one() from simply performing a single blist_fill() operation. Reviewed by: kib MFC after: 5 days Differential Revision: https://reviews.freebsd.org/D11043	2017-06-06 03:32:17 +00:00
Alan Cox	064650c180	Halve the memory being internally allocated by the blist allocator. In short, half of the memory that is allocated to implement the radix tree is wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int when we changed "daddr_t" to be a 64-bit (signed) int. (See r96849 and r96851.) Reviewed by: kib, markj Tested by: pho MFC after: 5 days Differential Revision: https://reviews.freebsd.org/D11028	2017-06-05 17:14:16 +00:00
Alan Cox	07c348ea7b	After r118390, the variable "dmmax" was neither the correct strip size nor the correct maximum block size. Moreover, after r318995, it serves no purpose except to provide information to user space through a read- sysctl. This change eliminates the variable "dmmax" but retains the sysctl. It also corrects the value returned by the sysctl. Reviewed by: kib, markj MFC after: 3 days	2017-05-27 21:46:00 +00:00
Alan Cox	fe71561af2	In r118390, the swap pager's approach to striping swap allocation over multiple devices was changed. However, swapoff_one() was not fully and correctly converted. In particular, with r118390's introduction of a per- device blist, the maximum swap block size, "dmmax", became irrelevant to swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of- range operations on the per-device blist that were silently ignored by blist_fill(). This change corrects both of these problems with swapoff_one(), which will allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously, swapoff_one() would panic inside of blist_fill() if you increased MAX_PAGEOUT_CLUSTER. Reviewed by: kib, markj MFC after: 3 days	2017-05-27 16:40:00 +00:00
Konstantin Belousov	6992112349	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439	2017-05-23 09:29:05 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Mark Johnston	b1fd102ee7	Add a page queue for holding dirty anonymous unswappable pages. On systems without a configured swap device, an attempt to launder pages from a swap object will always fail and result in the page being reactivated. This means that the page daemon will continuously scan pages that can never be evicted. With this change, anonymous pages are instead moved to PQ_UNSWAPPABLE after a failed laundering attempt when no swap devices are configured. PQ_UNSWAPPABLE is not scanned unless a swap device is configured, so unreferenced unswappable pages are excluded from the page daemon's workload. Reviewed by: alc	2017-01-03 00:05:44 +00:00
Konstantin Belousov	2e56b64fa4	Fix argument type and microoptimize swp_pager_meta_free(). The count argument natural type if vm_pindex_t, but due to the loop organization, it has to be signed type to detect the termination condition. Replace this logic by using distinguished counter for the processed pages, and terminate loop when the counter exceeds the argument. Completely process one swblock for all relevant indexes instead of doing relookup in hash when incrementing page index on the loop step. Do not drop hash mutex around iterations. Noted and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-24 09:57:31 +00:00
Konstantin Belousov	77d6fd97ef	Improve vm_object_scan_all_shadowed() to also check swap backing objects. As noted in the removed comment, it is possible and not prohibitively costly to look up the swap blocks for the given page index. Implement a swap_pager_find_least() function to do that, and use it to iterate simultaneously over both backing object page queue and swap allocations when looking for shadowed pages. Testing shows that number of new succesful scans, enabled by this addition, is small but non-zero. When worked out, the change both further reduces the depth of the shadow object chain, and frees unused but allocated swap and memory. Suggested and reviewed by: alc Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-18 20:56:14 +00:00
Konstantin Belousov	71057cd207	In swp_pager_meta_free_all(), fix type of the index variable. Style. Noted and reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-16 23:33:37 +00:00
Alan Cox	bba39b9ae3	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Alan Cox	ebcddc7217	Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty pages, specificially, dirty pages that have passed once through the inactive queue. A new, dedicated thread is responsible for both deciding when to launder pages and actually laundering them. The new policy uses the relative sizes of the inactive and laundry queues to determine whether to launder pages at a given point in time. In general, this leads to more intelligent swapping behavior, since the laundry thread will avoid pageouts when the marginal benefit of doing so is low. Previously, without a dedicated queue for dirty pages, the page daemon didn't have the information to determine whether pageout provides any benefit to the system. Thus, the previous policy often resulted in small but steadily increasing amounts of swap usage when the system is under memory pressure, even when the inactive queue consisted mostly of clean pages. This change addresses that issue, and also paves the way for some future virtual memory system improvements by removing the last source of object-cached clean pages, i.e., PG_CACHE pages. The new laundry thread sleeps while waiting for a request from the page daemon thread(s). A request is raised by setting the variable vm_laundry_request and waking the laundry thread. We request launderings for two reasons: to try and balance the inactive and laundry queue sizes ("background laundering"), and to quickly make up for a shortage of free pages and clean inactive pages ("shortfall laundering"). When background laundering is requested, the laundry thread computes the number of page daemon wakeups that have taken place since the last laundering. If this number is large enough relative to the ratio of the laundry and (global) inactive queue sizes, we will launder vm_background_launder_target pages at vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back to sleep without doing any work. When scanning the laundry queue during background laundering, reactivated pages are counted towards the laundry thread's target. In contrast, shortfall laundering is requested when an inactive queue scan fails to meet its target. In this case, the laundry thread attempts to launder enough pages to meet v_free_target within 0.5s, which is the inactive queue scan period. A laundry request can be latched while another is currently being serviced. In particular, a shortfall request will immediately preempt a background laundering. This change also redefines the meaning of vm_cnt.v_reactivated and removes the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning of vm_cnt.v_reactivated now better reflects its name. It represents the number of inactive or laundry pages that are returned to the active queue on account of a reference. In collaboration with: markj Reviewed by: kib Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8302	2016-11-09 18:48:37 +00:00
Mark Johnston	dd9cb6da0b	Respect the caller's hints when performing swap readahead. The pager getpages interface allows the caller to bound the number of readahead and readbehind pages, and vm_fault_hold() makes use of this feature. These bounds were ignored after r305056, causing the swap pager to potentially page in more than the specified number of pages. Reported and reviewed by: alc X-MFC with: r305056	2016-09-04 00:25:49 +00:00
Konstantin Belousov	9815066425	Make swapoff reliable. The swap_pager_swapoff() function uses trylock for the object lock before pagein, which means that either i/o to md(4) over swap, or intensive page faults over swap pager objects might prevent swapoff() from making any progress. Then the retry < 100 check fails and machine panics. If trylock fails, acquire the object lock in the blockable way and restart the hash bucket walk. Keep retries logic for now. Reported and tested by: pho Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7688	2016-08-31 14:49:58 +00:00
Mark Johnston	915d1b71cd	Restore swap pager readahead after r292373. The removal of vm_fault_additional_pages() meant that a hard fault on a swap-backed page would result in only that page being read in. This change implements readahead and readbehind for the swap pager in swap_pager_getpages(). swap_pager_haspage() is modified to return the largest contiguous non-resident range of pages containing the requested range. Reviewed by: alc, kib Tested by: pho MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7677	2016-08-30 05:56:21 +00:00
Konstantin Belousov	0c657d22eb	Explain why swapgeom_close_ev() is delegated. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 07:11:19 +00:00
Konstantin Belousov	88ad2d7b47	Do not delegate a work to geom event thread which can be done inline. In particular, swapongeom_ev() needed event thread context when swap pager configuration was performed under Giant and geom asserted that Giant is not owned. Now both of the reason went away. On the other hand, note that swpageom_release() is called from the bio_done context, and possible close cannot be performed inline. Also fix some minor issues. The swapgeom() function does not use the td argument, remove it. Recheck that the vnode passed is still VCHR and not reclaimed after the lock. Reviewed by: mav Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-28 15:57:01 +00:00

1 2 3 4 5 ...

432 Commits