freebsd-skq

Author	SHA1	Message	Date
markj	094736f08f	Provide separate accounting for user-wired pages. Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to a global limit on the number of wired pages, so if large swaths of physical memory were wired by the kernel, as happens with the ZFS ARC among other things, the limit could be exceeded, causing user wirings to fail. The change adds a new counter, v_user_wire_count, which counts the number of virtual pages wired by user processes via mlock(2) and mlockall(2). Only user-wired pages are subject to the system-wide limit which helps provide some safety against deadlocks. In particular, while sources of kernel wirings typically support some backpressure mechanism, there is no way to reclaim user-wired pages short of killing the wiring process. The limit is exported as vm.max_user_wired, renamed from vm.max_wired, and changed from u_int to u_long. The choice to count virtual user-wired pages rather than physical pages was done for simplicity. There are mechanisms that can cause user-wired mappings to be destroyed while maintaining a wiring of the backing physical page; these make it difficult to accurately track user wirings at the physical page layer. The change also closes some holes which allowed user wirings to succeed even when they would cause the system limit to be exceeded. For instance, mmap() may now fail with ENOMEM in a process that has called mlockall(MCL_FUTURE) if the new mapping would cause the user wiring limit to be exceeded. Note that bhyve -S is subject to the user wiring limit, which defaults to 1/3 of physical RAM. Users that wish to exceed the limit must tune vm.max_user_wired. Reviewed by: kib, ngie (mlock() test changes) Tested by: pho (earlier version) MFC after: 45 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19908	2019-05-13 16:38:48 +00:00
dougm	24c307c3c0	A new parameter to blist_alloc specifies an upper bound on the size of the allocation request, so that the blocks allocated are from the next set of free blocks big enough to satisfy the minimum requirements of the request, and the number of blocks allocated are as many as possible, up to the specified maximum. The implementation of swp_pager_getswapspace uses this parameter to ask for a number of blocks between the new halved request size and the previous failed request size. Thus a request for 32 blocks may fail, but instead of getting only 16 blocks instead, the caller asks for 16 to 31 next, and might get 19 or 27, which is closer to what they originally wanted. I expect this to lead to bigger block allocations and less block fragmentation, at least in some cases. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20001	2019-05-11 16:15:13 +00:00
dougm	1b3da1aa6d	Callers of swp_pager_getswapspace get either as many blocks as they requested, or none, and in the latter case it is up to them to pick a smaller request to make - which they always do by halving the failed request. This change to swp_pager_getswapspace leaves the task of downsizing the request to the function and not its caller. It still does so by halving the original request. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20228	2019-05-11 10:16:43 +00:00
kib	9d98ef7bc1	Noted by: alc Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 6 days	2019-05-06 08:46:11 +00:00
kib	2dc0d9edaa	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
kib	ce1a272ee3	Do not collapse objects with OBJ_NOSPLIT backing swap object. NOSPLIT swap objects are not anonymous, they are used by tmpfs regular files and POSIX shared memory. For such objects, collapse is not permitted. Reported by: mjg Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:06:19 +00:00
dougm	8bebf2c329	fls() should find the most significant bit of an int faster than a linear search can, so use it to avoid a linear search in isqrt. Approved by: kib (mentor), markj (mentor) Differential Revision: https://reviews.freebsd.org/D20102	2019-05-03 02:55:54 +00:00
kib	f79fcaf038	Fix another race between vm_map_protect() and vm_map_wire(). vm_map_wire() increments entry->wire_count, after that it drops the map lock both for faulting in the entry's pages, and for marking next entry in the requested region as IN_TRANSITION. Only after all entries are faulted in, MAP_ENTRY_USER_WIRE flag is set. This makes it possible for vm_map_protect() to run while another entry's MAP_ENTRY_IN_TRANSITION flag is handled, and vm_map_busy() lock does not prevent it. In particular, if the call to vm_map_protect() adds VM_PROT_WRITE to CoW entry, it would fail to call vm_fault_copy_entry(). There are at least two consequences of the race: the top object in the shadow chain is not populated with writeable pages, and second, the entry eventually gets contradictory flags MAP_ENTRY_NEEDS_COPY | MAP_ENTRY_USER_WIRED with VM_PROT_WRITE set. Handle it by waiting for all MAP_ENTRY_IN_TRANSITION flags to go away in vm_map_protect(), which does not drop map lock afterwards. Note that vm_map_busy_wait() is left as is. Reported and tested by: pho (previous version) Reviewed by: Doug Moore <dougm@rice.edu>, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20091	2019-05-01 13:15:06 +00:00
markj	056f72107f	Disable vm map consistency checking by default on INVARIANTS kernels. The checks are too expensive for a general-purpose kernel. Enable the checks when DIAGNOSTIC is defined and provide a sysctl to enable the checks in a non-DIAGNOSTIC INVARIANTS kernel. Reviewed by: kib Discussed with: Doug Moore <dougm@rice.edu> MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19999	2019-04-22 11:23:35 +00:00
tychon	e660248c13	for a cache-only zone the destructor tries to destroy a non-existent keg Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19835	2019-04-12 12:46:25 +00:00
kib	3a48424552	Fix mis-merge. Amusingly, it is nop. Noted by: trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-rev: r345702	2019-04-05 16:12:35 +00:00
kib	9df1f56292	Eliminate adj_free field from vm_map_entry. Drop the adj_free field from vm_map_entry_t. Refine the max_free field so that p->max_free is the size of the largest gap with one endpoint in the subtree rooted at p. Change vm_map_findspace so that, first, the address-based splay is restricted to tree nodes with large-enough max_free value, to avoid searching for the right starting point in a subtree where all the gaps are too small. Second, when the address search leads to a tree search for the first large-enough gap, that gap is the subject of a splay-search that brings the gap to the top of the tree, so that an immediate insertion will take constant time. Break up the splay code into separate components, one for searching and breaking up the tree and another for reassembling it. Use these components, and not splay itself, for linking and unlinking. Drop the after-where parameter to link, as it is computed as a side-effect of the splay search. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: markj Tested by: pho MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D17794	2019-03-29 16:53:46 +00:00
trasz	75a305f873	Improve error reporting when the swap pager runs out of memory. Reviewed by: kib MFC after: 2 weeks Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D19699	2019-03-26 19:11:15 +00:00
kib	b37c7d4a72	ASLR: check for max_addr after applying randomization, not before. Otherwise resulting address from vm_map_find() might not satisfy the upper limit. For instance, it could affect MAP_32BIT flag from 64bit processes. Found by: Doug Moore <dougm@rice.edu> Reviewed by: alc, Doug Moore <dougm@rice.edu> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19688	2019-03-23 16:36:18 +00:00
markj	1ab80ddad8	Disallow preemptive creation of wired superpage mappings. There are some unusual cases where a process may cause an mlock()ed range of memory to be unmapped. If the application subsequently faults on that region, the handler may attempt to create a superpage mapping backed by the resident, wired pages. However, the pmap code responsible for creating such a mapping (pmap_enter_pde() on i386 and amd64) does not ensure that a leaf page table page is available if the superpage is later demoted; the demotion operation must therefore perform a non-blocking page allocation and must unmap the entire superpage if the allocation fails. The pmap layer ensures that this can never happen for wired mappings, and so the case described above breaks that invariant. For now, simply ensure that the MI fault handler never attempts to create a wired superpage except via promotion. Reviewed by: kib Reported by: syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19670	2019-03-21 19:52:50 +00:00
kib	609c32a75e	vm_fault_copy_entry: accept invalid source pages. Either msync(MS_INVALIDATE) or the object unlock during vnode truncation can expose invalid pages backing wired entries. Accept them, but do not install them into destination pmap. We must create copied pages in the copy case, because e.g. vm_object_unwire() expects that the entry is fully backed. Reported by: syzkaller, via emaste Reported by: syzbot+514d40ce757a3f8b15bc@syzkaller.appspotmail.com Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19615	2019-03-20 13:07:57 +00:00
markj	6a69e0551f	Implement minidump support for RISC-V. Submitted by: Mitchell Horne <mhorne063@gmail.com> Differential Revision: https://reviews.freebsd.org/D18320	2019-03-06 00:01:06 +00:00
mjg	a0cdab129c	vm: remove seq.h inclusion made obsolete by NUMA rewrite Sponsored by: The FreeBSD Foundation	2019-02-27 22:42:29 +00:00
jah	6e7cd8fbe0	Fix incorrect assertion in vnode_pager_generic_getpages() Reviewed by: kib, glebius MFC after: 1 week	2019-02-26 04:50:46 +00:00
markj	435f0b6365	Improve vmem tuning for platforms without a direct map. On platforms without a direct map (i.e., platforms without UMA_MD_SMALL_ALLOC defined), the boundary tag allocator reserves a number of tags for use when allocating a new slab of boundary tags, as such platforms require free boundary tags in order to allocate boundary tags. r327899 increased the number of boundary tags required for a KVA allocation in the worst case, and the aforementioned reservation was not updated accordingly. In some cases, this could lead to a system hang. Fix the problem by increasing this reservation. Also reduce KVA_QUANTUM on systems lacking superpage support. The previous import quantum (4MB with a 4KB page size) was quite large for systems with limited KVA, and fragmentation in kernel_arena could cause kernel memory allocation failures even with a substantial amount of free KVA. Reported and tested by: jhibbits Reviewed by: alc, kib No objections: jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19337	2019-02-25 19:22:13 +00:00
markj	00ee058700	Clear pointers to indicate that the respective locks are released. This fixes a problem in r344231: vm_pageout_launder() may scan two queues when swap is disabled. Reported by: pho MFC with: r344231	2019-02-21 15:44:32 +00:00
kib	4adce57d6f	Add kernel support for Intel userspace protection keys feature on Skylake Xeons. See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the RDPKRU and WRPKRU instructions. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D18893	2019-02-20 09:51:13 +00:00
markj	88d649090e	Remove a redundant flag variable. Use the object pointer itself to determine whether the object is locked. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19215	2019-02-17 16:35:19 +00:00
glebius	d889424078	For 32-bit machines rollback the default number of vnode pager pbufs back to the level before r343030. For 64-bit machines reduce it slightly, too. Together with r343030 I bumped the limit up to the value we use at Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a good default. Provide a loader tunable to change vnode pager pbufs count. Document it.	2019-02-15 23:36:22 +00:00
kib	4bb576720e	Make anon clustering more compatible. Make the clustering enabling knob more fine-grained by providing a setting where the allocation with hint is not clustered. This is aimed to be somewhat more compatible with e.g. go 1.4 which expects that hinted mmap without MAP_FIXED does not change the allocation address. Now the vm.cluster_anon can be set to 1 to only cluster when no hints, and to 2 to always cluster. Default value is 1. Requested by: peter Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19194	2019-02-14 15:45:53 +00:00
markj	9d5cba36c5	Implement transparent 2MB superpage promotion for RISC-V. This includes support for pmap_enter(..., psind=1) as described in the commit log message for r321378. The changes are largely modelled after amd64. arm64 has more stringent requirements around superpage creation to avoid the possibility of TLB conflict aborts, and these requirements do not apply to RISC-V, which like amd64 permits simultaneous caching of 4KB and 2MB translations for a given page. RISC-V's PTE format includes only two software bits, and as these are already consumed we do not have an analogue for amd64's PG_PROMOTED. Instead, pmap_remove_l2() always invalidates the entire 2MB address range. pmap_ts_referenced() is modified to clear PTE_A, now that we support both hardware- and software-managed reference and dirty bits. Also fix pmap_fault_fixup() so that it does not set PTE_A or PTE_D on kernel mappings. Reviewed by: kib (earlier version) Discussed with: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18863 Differential Revision: https://reviews.freebsd.org/D18864 Differential Revision: https://reviews.freebsd.org/D18865 Differential Revision: https://reviews.freebsd.org/D18866 Differential Revision: https://reviews.freebsd.org/D18867 Differential Revision: https://reviews.freebsd.org/D18868	2019-02-13 17:19:37 +00:00
pfg	793037178f	UMA: unsign some variables related to allocation in hash_alloc(). As a followup to r343673, unsign some variables related to allocation since the hashsize cannot be negative. This gives a bit more space to handle bigger allocations and avoid some implicit casting. While here also unsign uh_hashmask, it makes little sense to keep that signed. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19148	2019-02-12 04:33:05 +00:00
kib	e8307d185a	struct xswdev on amd64 requires compat32 shims after ino64. i386 is the only architecture where uint64_t does not specify 8-bytes alignment, which makes struct xswdev layout not compatible between 64bit and i386. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-10 19:01:05 +00:00
kib	08849e56ba	Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminishing over time as exploit authors work out simple ASLR bypass techniques, it eliminates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintenance burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitative change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386. This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603	2019-02-10 17:19:45 +00:00
kib	d0cb5e667f	i386: honor kern.elf32.read_exec for ommap(2) and break(2), as already done on amd64. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-09 03:56:48 +00:00
kib	32c9348f1f	Normalize the declaration of i386_read_exec variable. It is currently re-declared in sys/sysent.h which is a wrong place for MD variable. Which causes redeclaration error with gcc when sys/sysent.h and machine/md_var.h are included both. Remove it from sys/sysent.h and instead include machine/md_var.h when needed, under #ifdef for both i386 and amd64. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-09 03:51:51 +00:00
glebius	38bb4995d7	Now that there is only one way to allocate a slab, remove uz_slab method. Discussed with: jeff	2019-02-07 03:55:05 +00:00
glebius	e502a5bda0	Report cache zones in UMA stats sysctl, that 'vmstat -z' uses. This should have been part of r251826.	2019-02-07 03:32:45 +00:00
kib	7baa661f36	contigmalloc: handle M_EXEC. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19092	2019-02-07 02:00:23 +00:00
markj	93cad7dba5	Allow vm_page_free_prep() to dequeue pages without the page lock. This is a step towards being able to free pages without the page lock held. The approach is simply to add an implementation of vm_page_dequeue_deferred() which does not assert that the page lock is held. Formally, the page lock is required to set PGA_DEQUEUE, but in the case of vm_page_free_prep() we get the same mutual exclusion for free by virtue of the fact that no other references to the page may exist. No functional change intended. Reviewed by: kib (previous version) MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19065	2019-02-03 18:43:20 +00:00
markj	ae1284109d	Fix a race in vm_page_dequeue_deferred(). To detect the case where the page is already marked for a deferred dequeue, we must read the "queue" and "aflags" fields in a precise order. Otherwise, a race with a concurrent vm_page_dequeue_complete() could leave the page with PGA_DEQUEUE set despite it already having been dequeued. Fix the problem by using vm_page_queue() to check the queue state, which correctly handles the race. Reviewed by: kib Tested by: pho MFC after: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19039	2019-02-03 18:38:58 +00:00
mav	7aad216459	Fix integer math overflow in UMA hash_alloc(). 512GB of ZFS ABD ARC means abd_chunk zone of 128M 4KB items. To manage them UMA tries to allocate 2GB hash table, whose size does not fit into the int variable, causing later allocation failure, which makes ARC shrink back below the 512GB, not letting it use more RAM. With this change I easily reached >700GB ARC size on 768GB RAM machine. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-02-02 04:11:59 +00:00
glebius	058f928a27	In zone_alloc_bucket() max argument was calculated based on uz_count. Then bucket_alloc() also selects bucket size based on uz_count. However, since zone lock is dropped, uz_count may reduce. In this case max may be greater than ub_entries and that would result in writing beyond the end of the allocation. Reported by: pho	2019-01-31 17:52:48 +00:00
markj	bbf6b587b5	Correct uma_prealloc()'s use of domainset iterators after r339925. The iterator should be reinitialized after every successful slab allocation. A request to advance the iterator is interpreted as an allocation failure, so a sufficiently large preallocation would cause the iterator to believe that all domains were exhausted, resulting in a sleep with the keg lock held. [1] Also, keg_alloc_slab() should pass the unmodified wait flag to the item initialization routine, which may use it to perform allocations from other zones. Reported and tested by: slavah Diagnosed by: kib [1] Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-01-23 18:58:15 +00:00
kib	db5030058f	MI VM: Make it possible to set size of superpage at boot instead of compile time. In order to allow single kernel to use PAE pagetables on i386 if hardware supports it, and fall back to classic two-level paging structures if not, superpage code should be able to adapt to either 2M or 4M superpage size. There I make MI VM structures large enough to track the biggest possible superpage, by allowing architecture to define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX constants. Corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols can be defined as runtime values and must be less than the _MAX constants. If architecture does not define _MAXs, it is assumed that _MAX == normal constant. Reviewed by: markj Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18853	2019-01-18 13:35:06 +00:00
glebius	39991209c1	Do not reserve KVA for paging bufs in vm_ksubmap_init(), since now they allocate it in pbuf_init(). This should have been done together with r343030.	2019-01-16 20:14:16 +00:00
kib	f34dbcf7d0	Implement shmat(2) flag SHM_REMAP. Based on the description in Linux man page. Reviewed by: markj, ngie (previous version) Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18837	2019-01-16 05:15:57 +00:00
glebius	7de9bd7cd4	Whitespace.	2019-01-16 04:02:08 +00:00
glebius	37fd65a0e1	Fix compilation failures on different arches that have vm_machdep.c not aware of counter_u64_t by including counter.h into uma_int.h. I'm not happy about this inclusion, but it fixes compilation ASAP.	2019-01-15 19:33:47 +00:00
glebius	25c8562625	style(9): break long line.	2019-01-15 18:50:11 +00:00
glebius	7546fbe224	Remove harmless leftover from code that cycles over zone's kegs. Just use + instead of +=. There is no functional change.	2019-01-15 18:49:31 +00:00
glebius	8c4ff6ac75	Only do uz_items accounting for zones that have a limit set in uz_max_items. This reduces amount of locking required for these zones. Also, for cache only zones (UMA_ZFLAG_CACHE) accounting uz_items wasn't correct at all, since they may allocate items directly from their backing store and then free them via UMA underflowing uz_items. Tested by: pho	2019-01-15 18:32:26 +00:00
glebius	60d5d98bc3	Make uz_allocs, uz_frees and uz_fails counter(9). This removes some atomic updates and reduces amount of data protected by zone lock. During startup point these fields to EARLY_COUNTER. After startup allocate them for all early zones. Tested by: pho	2019-01-15 18:24:34 +00:00
glebius	dd582940fe	Fix compilation on 32-bit.	2019-01-15 03:43:46 +00:00
glebius	7ee1aa34d4	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
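
Illustration for 8bebf2c329 (fls() in isqrt): the message says fls() locates the most significant bit faster than a linear search, so isqrt can use it to pick its starting point. The following is a minimal sketch of that idea, not the code from the commit. sketch_fls() and sketch_isqrt() are hypothetical names; sketch_fls() stands in for fls() and is built on the GCC/Clang builtin __builtin_clz so that it compiles on systems without the BSD function.

```c
#include <limits.h>

/*
 * Hypothetical stand-in for fls(): 1-based index of the most
 * significant set bit, or 0 when x is 0. Assumes GCC or Clang.
 */
static int
sketch_fls(unsigned int x)
{
	return (x == 0 ? 0 : (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x));
}

/*
 * Integer square root: the largest r such that r * r <= n.
 * The starting bit is the highest power of four not exceeding n,
 * found directly from the most significant bit rather than by
 * shifting one step at a time.
 */
static unsigned int
sketch_isqrt(unsigned int n)
{
	unsigned int bit, root, trial;

	if (n == 0)
		return (0);
	root = 0;
	/* Round the top bit position down to an even number. */
	bit = 1u << ((sketch_fls(n) - 1) & ~1);
	while (bit != 0) {
		trial = root + bit;
		if (n >= trial) {
			n -= trial;
			root = (root >> 1) + bit;
		} else
			root >>= 1;
		bit >>= 2;
	}
	return (root);
}
```

With this sketch, sketch_isqrt(17) evaluates to 4 and sketch_isqrt(UINT_MAX) to 65535 on a platform with 32-bit int.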

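Illustration for 24c307c3c0 and 1b3da1aa6d (swap space requests): the messages describe a retry policy where a failed request for n blocks is followed by a request for between n/2 and n-1 blocks. The sketch below models only that policy. struct toy_blist, toy_alloc() and toy_getswapspace() are hypothetical; the real blist is a radix tree and the real functions have different signatures. Here free space is just an array of free run lengths.

```c
#include <stddef.h>

/* Toy model: each element is the length of one free run of blocks. */
struct toy_blist {
	int	*runs;
	size_t	nruns;
};

/*
 * Take blocks from the first free run holding at least min blocks,
 * taking as many as possible up to max. Returns the number of blocks
 * taken, or 0 if no run is large enough.
 */
static int
toy_alloc(struct toy_blist *bl, int min, int max)
{
	size_t i;
	int got;

	for (i = 0; i < bl->nruns; i++) {
		if (bl->runs[i] >= min) {
			got = bl->runs[i] < max ? bl->runs[i] : max;
			bl->runs[i] -= got;
			return (got);
		}
	}
	return (0);
}

/*
 * Ask for exactly want blocks first. After each failure, the new
 * upper bound is one less than the size that just failed and the new
 * lower bound is half of it, e.g. 32, then 16..31, then 8..15.
 */
static int
toy_getswapspace(struct toy_blist *bl, int want)
{
	int got, max, min;

	min = max = want;
	while (min >= 1) {
		got = toy_alloc(bl, min, max);
		if (got != 0)
			return (got);
		max = min - 1;
		min /= 2;
	}
	return (0);
}
```

With free runs of 19 and 27 blocks, a request for 32 in this model returns 19, and a second request for 32 returns 27, matching the "might get 19 or 27" example in the commit message.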

4225 Commits