freebsd-skq

Author	SHA1	Message	Date
Jeff Roberson	a67d540832	Use atomics in more cases for object references. We now can completely omit the object lock if we are above a certain threshold. Hold only a single vnode reference when the vnode object has any ref > 0. This allows us to only lock the object and vnode on 0-1 and 1-0 transitions. Differential Revision: https://reviews.freebsd.org/D22452	2019-11-27 00:39:23 +00:00
Jeff Roberson	beb8beef81	Refactor uma_zalloc_arg(). It is a mess of gotos and code which doesn't make sense after many partial refactors. Attempt to make a smaller cache footprint for the fast path. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D22470	2019-11-26 22:17:02 +00:00
Ryan Libby	6a14746c01	vm_object_collapse_scan_wait: drop locks before reacquiring Regression from r352174. In the vm_page_rename() failure case we forgot to unlock the vm object locks before sleeping and reacquiring them. Reviewed by: jeff Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22542	2019-11-25 07:38:31 +00:00
Jeff Roberson	4d987866e6	Move anonymous object copying for fork into its own routine and so that we can avoid locking non-anonymous objects. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22472	2019-11-25 07:13:05 +00:00
Doug Moore	2767c9f36a	Where 'current' is used to index over vm_map entries, use 'entry'. Where 'entry' is used to identify the starting point for iteration, use 'first_entry'. These are the naming conventions used in most of the vm_map.c code. Where VM_MAP_ENTRY_FOREACH can be used, do so. Squeeze a few lines to fit in 80 columns. Where lines are being modified for these reasons, look to remove style(9) violations. Reviewed by: alc, markj Differential Revision: https://reviews.freebsd.org/D22458	2019-11-25 02:19:47 +00:00
Konstantin Belousov	3236244936	Ignore object->handle for OBJ_ANON objects. Note that the change in vm_object_collapse() is arguably a correctness fix. We must not collapse into content-identity carrying objects. Reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D22467	2019-11-24 19:18:12 +00:00
Konstantin Belousov	b631c36f0d	Record part of the owner struct thread pointer into busy_lock. Record as much bits from curthread into busy_lock as fits. Low bits for struct thread * representation are zero due to struct and zone alignment, and they leave space for busy flags (perhaps except statically allocated thread0). Upper bits are not very interesting for assert, and in most practical situations recorded value should allow to manually identify the owner with certainity. Assert that unbusy is performed by the owner, except few places where unbusy is done in io completion handler. For this case, add _unchecked variants of asserts and unbusy primitives. Reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D22298	2019-11-24 19:12:23 +00:00
Mark Johnston	9c770a27ce	Simplify vm_pageout_init_domain() and add a "big picture" comment. Stop subtracting 1024/200 from vmd_page_count/200. I cannot see how such precise accounting can make a difference on modern systems. Add some explanation of what the page daemon does and how it handles memory shortages. Reviewed by: dougm Discussed with: jeff, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22396	2019-11-22 16:31:43 +00:00
Mark Johnston	8fc2550837	Reclaim memory from UMA if the page daemon is struggling. Use the UMA reclaim thread to asynchronously drain all caches if there is a severe shortage in a domain. Otherwise we only trigger UMA reclamation every 10s even when the system has completely run out of memory. Stop entirely draining the caches when one domain falls below its min threshold. In some workloads it is normal for one NUMA domain to end up being nearly depleted by kernel memory allocations, for example for the ZFS ARC. The domainset iterators skip domains below the vmd_min_free theshold on the first iteration, so we should allow that mechanism to limit further depletion of the domain's free pages before taking the extreme step of calling uma_reclaim(UMA_RECLAIM_DRAIN_CPU). Discussed with: jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22395	2019-11-22 16:31:30 +00:00
Mark Johnston	bf0d60af92	Update the checks in vm_page_zone_import(). - Remove the cnt == 1 check. UMA passes cnt == 1 when it has disabled per-CPU caching. In this case we might as well just allocate a single page and return it to the caller, since the caller is going to do exactly that anyway if the UMA cache allocation attempt fails. - Don't replenish caches if the domain is severely short on free pages. With large buckets we may otherwise quickly exacerbate a situation where the page daemon is failing to keep up. - Don't replenish caches if the calling thread belongs to the page daemon, which should avoid creating extra memory pressure when it is trying to free memory. Virtually all such allocations while occur in the context of laundering, where the laundry thread must allocate slabs for various swap and I/O-related UMA zones. Reviewed by: kib Discussed with: alc, jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22394	2019-11-22 16:31:10 +00:00
Mark Johnston	003cf08ba9	Revise the page cache size policy. In r353734 the use of the page caches was limited to systems with a relatively large amount of RAM per CPU. This was to mitigate some issues reported with the system not able to keep up with memory pressure in cases where it had been able to do so prior to the addition of the direct free pool cache. This change re-enables those caches. The change modifies uma_zone_set_maxcache(), which was introduced specifically for the page cache zones. Rather than using it to limit only the full bucket cache, have it also set uz_count_max to provide an upper bound on the per-CPU cache size that is consistent with the number of items requested. Remove its return value since it has no use. Enable the page cache zones unconditionally, and limit them to 0.1% of the domain's pages. The limit can be overridden by the vm.pgcache_zone_max tunable as before. Change the item size parameter passed to uma_zcache_create() to the correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page cache buckets to be adaptively sized, like the rest of UMA's caches. This also causes the initial bucket size to be small, so only systems which benefit from large caches will get them. Reviewed by: gallatin, jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22393	2019-11-22 16:30:47 +00:00
Mark Johnston	b378d29687	Fix locking in vm_reserv_reclaim_contig(). We were not properly handling the case where the trylock of the reservaton fails, in which case we could leak reservation lock. Introduce a marker reservation to implement precise scanning in vm_reserv_reclaim_contig(). Before, a race could result in early termination of the scan in rare situations. Use the marker's lock to serialize scans of the partpop queue so that a global marker structure can be used. Modify vm_reserv_reclaim_inactive() to handle the presence of a marker while minimizing the hold time of domain-global locks. Reviewed by: alc, jeff, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22392	2019-11-22 16:28:52 +00:00
Andrew Turner	09a65f9ff5	As with r354905 use uint16_t to store aflags on the stack and as function arguments as the aflags size in vm_page_t has increased. Sponsored by: DARPA, AFRL	2019-11-20 18:00:43 +00:00
Andrew Turner	ad216bc10d	Use atomic_load_16 to load aflags as it's a uint16_t after r354820. Sponsored by: DARPA, AFRL	2019-11-20 17:49:58 +00:00
Doug Moore	83704cc236	Instead of looking up a predecessor or successor to the current map entry, when that entry has been seen already, keep the already-looked-up value in a variable and use that instead of looking it up again. Approved by: alc, markj (earlier version), kib (earlier version) Differential Revision: https://reviews.freebsd.org/D22348	2019-11-20 16:06:48 +00:00
Jeff Roberson	71353f7a2f	When we set OFFPAGE to limit fragmentation we should also set VTOSLAB so that we avoid the hashtables. The hashtable is now only required if a zone is created with OFFPAGE specified initially, not internally. This flag signals to UMA that it can't touch the allocated memory and so can't store a slab pointer in the containing page. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22453	2019-11-20 01:57:33 +00:00
Jeff Roberson	51b867e56b	Only keep anonymous objects on shadow lists. This eliminates locking of globally visible objects when they are part of a backing chain. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22423	2019-11-20 00:31:14 +00:00
Jeff Roberson	7f935055d3	Remove unnecessary object locking from the vnode pager. Recent changes to busy/valid/dirty locking make these acquires redundant. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22186	2019-11-19 23:30:09 +00:00
Jeff Roberson	639676877b	Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates reudundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119	2019-11-19 23:19:43 +00:00
Doug Moore	8ecbf14b74	Drop the extra argument from swp_pager_meta_ctl and have it do lookup only. Rename it swp_pager_meta_lookup. Stop checking for obj->type == swap there and assert it instead. Make the caller responsible for the obj->type check. Move the meta_ctl 'pop' functionality to swap_pager_unswapped, the only place that uses it, and assume obj->type == swap there too. Assisted by: ota_j.email.ne.jp Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22437	2019-11-19 08:06:31 +00:00
Mark Johnston	fe6d5344c2	Group per-domain reservation data in the same structure. We currently have the per-domain partially populated reservation queues and the per-domain queue locks. Define a new per-domain padded structure to contain both of them. This puts the queue fields and lock in the same cache line and avoids the false sharing within the old queue array. Also fix field packing in the reservation structure. In many places we assume that a domain index fits in 8 bits, so we can do the same there as well. This reduces the size of the structure by 8 bytes. Update some comments while here. No functional change intended. Reviewed by: dougm, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22391	2019-11-18 18:25:51 +00:00
Mark Johnston	3a2ba9974d	Widen the vm_page aflags field to 16 bits. We are now out of aflags bits, whereas the "flags" field only makes use of five of its sixteen bits, so narrow "flags" to eight bits. I have no intention of adding a new aflag in the near future, but would like to combine the aflags, queue and act_count fields into a single atomically updated word. This will allow vm_page_pqstate_cmpset() to become much simpler and is a step towards eliminating the use of the page lock array in updating per-page queue state. The change modifies the layout of struct vm_page, so bump __FreeBSD_version. Reviewed by: alc, dougm, jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22397	2019-11-18 18:22:41 +00:00
Doug Moore	abdab7b633	Add a helper function for testing a swap block and freeing it if empty. Submitted by: ota_j.email.ne.jp Approved by: alc, kib, dougm Differential Revision: https://reviews.freebsd.org/D22402	2019-11-17 18:38:37 +00:00
Konstantin Belousov	156e865494	Add elf image flag to disable stack gap. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22379	2019-11-17 14:54:07 +00:00
Doug Moore	bdb90e7613	The loop in vm_map_protect that verifies that all transition map entries are stabilized, repeatedly verifies the same entry. Check each entry in turn. Reviewed by: kib (code only), alc Tested by: pho MFC after: 7 days Differential Revision: https://reviews.freebsd.org/D22405	2019-11-17 06:50:36 +00:00
Doug Moore	7cdcf86360	Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers around entry->{next,prev} when those are used for ordered list traversal, and use those wrapper functions everywhere. Where the next field is used for maintaining a stack of deferred operations, #define defer_next to make that different usage clearer, and then use the 'right' pointer instead of 'next' for that purpose. Approved by: markj Tested by: pho (as part of a larger patch) Differential Revision: https://reviews.freebsd.org/D22347	2019-11-13 15:56:07 +00:00
Doug Moore	467057fcd9	swap_pager_meta_free() frees allocated blocks in a way that exploits the sparsity of allocated blocks in a range, without issuing an "are you there?" query for every block in the range. swap_pager_copy() is not so smart. Modify the implementation of swap_pager_meta_free() slightly so that swap_pager_copy() can use that smarter implementation too. Based on an observation of: Yoshihiro Ota (ota_j.email.ne.jp) Reviewed by: kib,alc Tested by: pho Differential Revision: https://reviews.freebsd.org/D22280	2019-11-11 16:59:49 +00:00
Konstantin Belousov	08034d1006	Include cache zones into zone_foreach() where appropriate. The r354367 is reverted since it is subsumed by this, more complete, approach. Suggested by: markj Reviewed by: alc. glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22242	2019-11-10 09:25:19 +00:00
Doug Moore	461587dc9b	For vm_map, #defining DIAGNOSTIC to turn on full assertion-based consistency checking slows performance dramatically. This change reduces the number of assertions checked by completely walking the vm_map tree only when the write-lock is released, and only then if the number of modifications to the tree since the last walk exceeds the number of tree nodes. Reviewed by: alc, kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22163	2019-11-09 17:08:27 +00:00
Mark Johnston	c95f8ed885	Drop Giant before sleeping on a busy page. Before the page busy code was converted to make direct use of sleepqueues, this was handled by _sleep(). Reported by: glebius Reviewed by: kib Sponsored by: The FreeBSD Foundation	2019-11-07 18:26:29 +00:00
Mark Johnston	be801aaaef	Fix a race in release_page(). Since r354156 we may call release_page() without the page's object lock held, specifically following the page copy during a CoW fault. release_page() must therefore unbusy the page only after scheduling the requeue, to avoid racing with a free of the page. Previously, the object lock prevented this race from occurring. Add some assertions that were helpful in tracking this down. Reported by: pho, syzkaller Tested by: pho Reviewed by: alc, jeff, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22234	2019-11-06 16:59:16 +00:00
Konstantin Belousov	432fc36da1	Switch cache zones from early counters to real implementation. Early counter mock can be only used on BSP for amd64, when APs try to update it that causes random memory corruption. N.B. This is a temporary patch to plug the corruption for now, while a proper solution for handling cache zones in zone_foreach() is being developed. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation, Mellanox Technologies	2019-11-05 21:38:48 +00:00
Konstantin Belousov	afa7e88a18	vm_page_wire_mapped: explain why failure does not affect correctness. Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D22196	2019-10-30 17:33:17 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Jeff Roberson	51df53213c	Use atomics and a shared object lock to protect the object reference count. Certain consumers still need to guarantee a stable reference so we can not switch entirely to atomics yet. Exclusive lock holders can still modify and examine the refcount without using the ref api. Reviewed by: kib Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21598	2019-10-29 20:58:46 +00:00
Jeff Roberson	4b3e066539	Drop the object lock earlier in fault and don't relock it after pmap_enter(). Recent changes in object and page locking have enabled more lock pushdown. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22036	2019-10-29 20:46:25 +00:00
Gleb Smirnoff	42a621624d	Add couple more assertions to vm_pager_assert_in(). The bogus page is not allowed at ends of the request, and all non-bogus pages must be consecutive. Reviewed by: kib	2019-10-25 16:59:54 +00:00
Andrew Gallatin	0dc59d7632	Add a tunable to set the pgcache zone's maxcache When it is set to 0 (the default), a heavy Netflix-style web workload suffers from heavy lock contention on the vm page free queue called from vm_page_zone_{import,release}() as the buckets are frequently drained. When setting the maxcache, this contention goes away. We should eventually try to autotune this, as well as make this zone eligable for uma_reclaim(). Reviewed by: alc, markj Not Objected to by: jeff Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22112	2019-10-24 18:39:05 +00:00
Mark Johnston	be2c561003	Modify release_page() to handle a missing fault page. r353890 introduced a case where we may call release_page() with fs.m == NULL, since the fault handler may now lock the vnode prior to allocating a page for a page-in. Reported by: jhb Reviewed by: kib MFC with: r353890 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22120	2019-10-23 20:39:21 +00:00
Mark Johnston	2f81c92e55	Check for bogus_page in vnode_pager_generic_getpages_done(). We now assert that a page is busy when updating its validity-tracking state, but bogus_page is not busied during a getpages operation. Reported by: syzkaller Reviewed by: alc, kib Discussed with: jeff MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22124	2019-10-23 18:00:22 +00:00
Mark Johnston	e6f1a58082	Verify identity after checking for WAITFAIL in vm_page_busy_acquire(). A caller that does not guarantee that a page's identity won't change while sleeping for a busy lock must specify either NOWAIT or WAITFAIL. Reported by: syzkaller Reviewed by: alc, kib Discussed with: jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22124	2019-10-23 17:58:19 +00:00
Konstantin Belousov	16b0c09225	Assert that vm_fault_lock_vnode() returns locked saved vnode. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22113	2019-10-23 07:36:26 +00:00
Konstantin Belousov	5b87ecc643	Assert that vnode_pager_setsize() is called with the vnode exclusively locked except for filesystems that set the MNTK_VMSETSIZE_BUG, Set the flag for ZFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:21:24 +00:00
Konstantin Belousov	208b81bb05	Add VV_VMSIZEVNLOCK flag. The flag specifies that vm_fault() handler should check the vnode' vm_object size under the vnode lock. It is converted into the object' OBJ_SIZEVNLOCK flag in vnode_pager_alloc(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:09:25 +00:00
Konstantin Belousov	0ddd3082a4	vm_fault(): extract code to lock the vnode into a helper vn_fault_lock_vnode(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 15:59:16 +00:00
Mark Johnston	1de9724e55	Avoid reloading bucket pointers in uma_vm_zone_stats(). The correctness of per-CPU cache accounting in that function is dependent on reading per-CPU pointers exactly once. Ensure that the compiler does not emit multiple loads of those pointers. Reported and tested by: pho Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22081	2019-10-22 14:20:06 +00:00
Mark Johnston	d307bdcc2c	Further constrain the use of per-CPU caches for free pages. In low memory conditions a significant number of pages may end up stuck in the caches, and currently these caches cannot be reaped, leading to spurious memory allocation failures and OOM kills. So: - Take into account the fact that we may cache up to two full buckets of pages per CPU, not just one. - Increase the amount of RAM required per CPU to enable the caches. This is a temporary measure until the page cache management policy is improved. PR: 241048 Reported and tested by: Kevin Oberman <rkoberman@gmail.com> Reviewed by: alc, kib Discussed with: jeff MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22040	2019-10-18 17:36:42 +00:00
Mark Johnston	f822c9e287	Apply mapping protections to preloaded kernel modules on amd64. With an upcoming change the amd64 kernel will map preloaded files RW instead of RWX, so the kernel linker must adjust protections appropriately using pmap_change_prot(). Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21860	2019-10-18 13:56:45 +00:00
Konstantin Belousov	303fa05a1f	swapon_check_swzone(): use already calculated static variables. Submitted by: ota@j.email.ne.jp MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22065	2019-10-17 13:49:47 +00:00
Mark Johnston	01cef4caa7	Remove page locking from pmap_mincore(). After r352110 the page lock no longer protects a page's identity, so there is no purpose in locking the page in pmap_mincore(). Instead, if vm.mincore_mapped is set to the non-default value of 0, re-lookup the page after acquiring its object lock, which holds the page's identity stable. The change removes the last callers of vm_page_pa_tryrelock(), so remove it. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21823	2019-10-16 22:03:27 +00:00

1 2 3 4 5 ...

4197 Commits