freebsd-nq

Author	SHA1	Message	Date
Alan Cox	087a613247	Exploit r288122 to address a cosmetic issue. Since the pages allocated by noobj_alloc() don't belong to a vm object, they can't be paged out. Since they can't be paged out, they are never enqueued in a paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages are being enqueued in the inactive queue. As of r288122, we can avoid giving this false impression by passing PQ_NONE. Submitted by: kmacy Differential Revision: https://reviews.freebsd.org/D1674	2015-09-26 17:45:10 +00:00
Alan Cox	15aaea7892	Change vm_page_unwire() such that it (1) accepts PQ_NONE as the specified queue and (2) returns a Boolean indicating whether the page's wire count transitioned to zero. Exploit this change in vfs_vmio_release() to avoid pointlessly enqueueing a page that is about to be freed. (An earlier version of this change was developed by attilio@ and kmacy@. Any errors in this version are my own.) Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-09-22 18:16:52 +00:00
Alan Cox	d9347bca9a	Correct a non-fatal error in vm_pageout_worker(). vm_pageout_worker() should not assume that vm_pages_needed will remain set while it sleeps. Other threads can clear vm_pages_needed by performing a sufficient number of vm_page_free() calls, e.g., process termination. The effect of this error was that vm_pageout_worker() would free and/or launder pages when, in fact, there was no shortage of free pages. Rewrite a nearby comment to describe all of the possible cases and not just the most common case. The problem being that the comment made the most common case seem like the only case. Reviewed by: kib MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-09-20 19:20:03 +00:00
Alan Cox	c9af644e5c	Eliminate (many) unnecessary calls to pmap_remove_all(). Pages from objects with a reference count of zero can't possibly be mapped, so there is never a need for vm_page_set_invalid() to call pmap_remove_all() on them. Reviewed by: kib MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-09-17 22:28:38 +00:00
Mark Johnston	d73ce4c698	Remove the v_cache_min and v_cache_max sysctls. They are unused and have no effect. Reviewed by: alc Sponsored by: EMC / Isilon Storage Division	2015-09-11 03:00:20 +00:00
Konstantin Belousov	b8db977617	Remove a check which caused spurious SIGSEGV on usermode access to the mapped address without valid pte installed, when parallel wiring of the entry happen. The entry must be copy on write. If entry is COW but was already copied, and parallel wiring set MAP_ENTRY_IN_TRANSITION, vm_fault() would sleep waiting for the MAP_ENTRY_IN_TRANSITION flag to clear. After that, the fault handler is restarted and vm_map_lookup() or vm_map_lookup_locked() trip over the check. Note that this is race, if the address is accessed after the wiring is done, the entry does not fault at all. There is no reason in the current kernel to disallow write access to the COW wired entry if the entry permissions allow it. Initially this was done in r24666, since that kernel did not supported proper copy-on-write for wired text, which was fixed in r199869. The r251901 revision re-introduced the r24666 fix for the current VM. Note that write access must clear MAP_ENTRY_NEEDS_COPY entry flag by performing COW. In reverse, when MAP_ENTRY_NEEDS_COPY is set in vmspace_fork(), the MAP_ENTRY_USER_WIRED flag is cleared. Put the assert stating the invariant, instead of returning the error. Reported and debugging help by: peter Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-09 06:19:33 +00:00
Warner Losh	9e3e3fe5b3	The swap pager is compatible with direct dispatch. It does its own locking and doesn't sleep. Flag the consumer we create as such. In addition, decrement the in flight index when we have an out of memory error after having incremented it previously. This would have prevented swapoff from working if the swap pager ever hit a resource shortage trying to swap out something (the swap in path always waits for a bio, so won't have this issue). Simplify the close logic by abandoning the use of private and initializing the index to 1 and dropping that reference when we previously set private. Also, set sw_id only while sw_dev_mtx is held. This should only affect swapping to a vnode, as opposed to a geom whose close always sets it to NULL with sw_dev_mtx held. Differential Review: https://reviews.freebsd.org/D3547	2015-09-08 17:47:56 +00:00
Alan Cox	27a9fb2fc2	To simplify upcoming changes to the inactive queue scan, change the code so that there is only one place where pages are freed and only one place where pages are moved to the tail of the queue. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-09-08 04:18:57 +00:00
Alan Cox	960810ccea	Eliminate pointless requeueing of pages from terminated objects. These pages will have left the inactive queue before the page daemon performs its next scan. Also, ignore references to pages from terminated objects. This allows the clean pages to be freed a little sooner. Move some comments to their proper place, i.e., next to the code that they describe, and update other nearby comments. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-09-05 17:34:49 +00:00
Mateusz Guzik	19c591bfe2	Don't trash memory from UMA_ZONE_NOFREE zones. Objects obtained from such zones are supposed to retain type stability, which was violated by aforementioned trashing. This is a follow-up to r284861. Discussed with: kib	2015-09-02 23:09:01 +00:00
Alan Cox	a3aeedabb4	Handle held pages earlier in the inactive queue scan. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-09-01 06:21:12 +00:00
Mark Johnston	c25fabea97	Remove weighted page handling from vm_page_advise(). This was added in r51337 as part of the implementation of madvise(MADV_DONTNEED). Its objective was to ensure that the page daemon would eventually reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue. Now that the pagedaemon performs steady scanning of the active page queue, this weighted handling is unnecessary. Instead, always "cache" clean pages by moving them to the head of the inactive page queue. This simplifies the implementation of vm_page_advise() and eliminates the fragmentation that resulted from the distribution of pages among multiple queues. Suggested by: alc Reviewed by: alc Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3401	2015-08-28 00:44:17 +00:00
Alan Cox	40aa80a7c2	In vm_pageout_scan(), simplify the logic for determining if a page can be paged out and apply some nearby style fixes. In collaboration with: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation, EMC / Isilon Storage Division	2015-08-27 20:38:45 +00:00
Alan Cox	eb5d39694e	Testing whether a page is dirty does not require the page lock. Moreover, it may involve a pmap operation that iterates over the page's PV list, so unnecessarily holding the page lock is undesirable. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-08-25 01:01:25 +00:00
Mark Murray	e866d8f05b	Make the UMA harvesting go away completely if not wanted. Default to "not wanted". Provide and document the RANDOM_ENABLE_UMA option. Change RANDOM_FAST to RANDOM_UMA to clarify the harvesting. Remove RANDOM_DEBUG option, replace with SDT probes. These will be of use to folks measuring the harvesting effect when deciding whether to use RANDOM_ENABLE_UMA. Requested by: scottl and others. Approved by: so (/dev/random blanket) Differential Revision: https://reviews.freebsd.org/D3197	2015-08-22 12:59:05 +00:00
Alan Cox	77923df2c1	Eliminate pointless assignments to rtvals[] in swap_pager_putpages(). Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-08-21 17:00:39 +00:00
Ryan Stone	a6bf3a9ef6	Prevent ticks rollover from preventing vm_lowmem event Currently vm_pageout_scan() uses a ticks-based scheme to rate-limit the number of times that the vm_lowmem event will happen. However if no events happen for long enough for ticks to roll over, this leaves us in a long window in which vm_lowmem events will not happen. Replace the use of ticks with time_t to prevent rollover from ever being an issue. Reviewed by: ian MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3439	2015-08-20 20:28:51 +00:00
Andrew Turner	52afd687c3	Add the kernel support for minidumps on arm64. Obtained from: ABT Systems Ltd Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3318	2015-08-20 12:49:56 +00:00
Alan Cox	f9b11500c2	As another piece of PG_CACHE page elimination, remove an LRU-defeating call to vm_page_try_to_cache() from vm_pageout_flush(). Other changes, most recently r286814, have made this call unnecessary. Reviewed by: kib Discussed with: jeff Tested by: pho Sponsored by: EMC / Isilon Storage Division	2015-08-16 17:07:53 +00:00
Konstantin Belousov	edc8222303	Make kstack_pages a tunable on arm, x86, and powepc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week	2015-08-10 17:18:21 +00:00
Zbigniew Bodek	9ba30bcb42	Avoid sign extension of value passed to kva_alloc from uma_zone_reserve_kva Fixes "panic: vm_radix_reserve_kva: unable to reserve KVA" caused by sign extention of "pages * UMA_SLAB_SIZE" value passed to kva_alloc() which takes unsigned long argument. In the erroneus case that triggered this bug, the number of pages to allocate in uma_zone_reserve_kva() was 0x8ebe6, that gave the total number of bytes to allocate equal to 0x8ebe6000 (int). This was then sign extended in kva_alloc() to 0xffffffff8ebe6000 (unsigned long). Reviewed by: alc, kib Submitted by: Zbigniew Bodek <zbb@semihalf.com> Obtained from: Semihalf Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3346	2015-08-10 17:16:49 +00:00
Alan Cox	e0a63baae4	Introduce a sysctl for reporting the number of fully populated reservations.	2015-08-06 21:27:50 +00:00
Jason A. Harmening	0a3e154709	Properly sort the function declarations added in r286296 Submitted by: alc Approved by: kib (mentor)	2015-08-05 10:48:32 +00:00
Jason A. Harmening	713841afb2	Add two new pmap functions: vm_offset_t pmap_quick_enter_page(vm_page_t m) void pmap_quick_remove_page(vm_offset_t kva) These will create and destroy a temporary, CPU-local KVA mapping of a specified page. Guarantees: --Will not sleep and will not fail. --Safe to call under a non-sleepable lock or from an ithread Restrictions: --Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms --Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page. --MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm. NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing. Reviewed by: kib Approved by: kib (mentor) Differential Revision: http://reviews.freebsd.org/D3013	2015-08-04 19:46:13 +00:00
Alan Cox	d8015db3b5	Refinements to r281079's sequential access optimization: Prefetched pages, which constitute the majority of the pages that are processed by vm_fault_dontneed(), are already near the tail of the inactive queue. Only the pages at faulting virtual addresses are actually moved by vm_page_advise(..., MADV_DONTNEED). However, vm_page_advise(..., MADV_DONTNEED) is simultaneously too aggressive and passive for the moved pages. It makes most of these pages too easily reclaimable, and at the same time it leaves enough pages in the active queue to trigger pageouts by the page daemon. Instead, with this change, the pages at faulting virtual addresses are moved to the tail of the inactive queue, where they are relatively close to the pages prefetched by the same page fault. Discussed with: jeff Sponsored by: EMC / Isilon Storage Division	2015-08-03 20:30:27 +00:00
Konstantin Belousov	6a875bf929	Do not pretend that vm_fault(9) supports unwiring the address. Rename the VM_FAULT_CHANGE_WIRING flag to VM_FAULT_WIRE. Assert that the flag is only passed when faulting on the wired map entry. Remove the vm_page_unwire() call, which should be never reachable. Since VM_FAULT_WIRE flag implies wired map entry, the TRYPAGER() macro is reduced to the testing of the fs.object having a default pager. Inline the check. Suggested and reviewed by: alc Tested by: pho (previous version) MFC after: 1 week	2015-07-30 18:28:34 +00:00
Jeff Roberson	98082691bb	- Make 'struct buf *buf' private to vfs_bio.c. Having a global variable 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-07-29 02:26:57 +00:00
Konstantin Belousov	6195b24a79	Revert r173708's modifications to vm_object_page_remove(). Assume that a vnode is mapped shared and mlocked(), and then the vnode is truncated, or truncated and then again extended past the mapping point EOF. Truncation removes the pages past the truncation point, and if pages are later created at this range, they are not properly mapped into the mlocked region, and their wiring count is wrong. The revert leaves the invalidated but wired pages on the object queue, which means that the pages are found by vm_object_unwire() when the mapped range is munlock()ed, and reused by the buffer cache when the vnode is extended again. The changes in r173708 were required since then vm_map_unwire() looked at the page tables to find the page to unwire. This is no longer needed with the vm_object_unwire() introduction, which follows the objects shadow chain. Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which is now redundand, we do not remove wired pages. Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com> Suggested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-25 18:29:06 +00:00
Jeff Roberson	fade8dd714	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division	2015-07-23 19:13:41 +00:00
Adrian Chadd	6520495abc	Add an initial NUMA affinity/policy configuration for threads and processes. This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11. * Introduce a simple set of VM policy and iterator types. * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled. * add syscalls to control changing thread and process defaults. * add a global NUMA VM domain policy. * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy. * processes inherit policies from their parent processes, threads inherit policies from their parent threads. * add a simple tool (numactl) to query and modify default thread/process policities. * add documentation for the new syscalls, for numa and for numactl. * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods. This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you. Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development! Tested: * MIPS (regression testing; non-NUMA) * i386 (regression testing; non-NUMA GENERIC) * amd64 (regression testing; non-NUMA GENERIC) * westmere, 2 socket (thankyou norse!) * sandy bridge, 2 socket (thankyou dell!) * ivy bridge, 2 socket (thankyou norse!) * westmere-EX, 4 socket / 1TB RAM (thankyou norse!) * haswell, 2 socket (thankyou norse!) * haswell v3, 2 socket (thankyou dell) * haswell v3, 2x18 core (thankyou scott long / netflix!) * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.) * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly. Verified: * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test. Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated. Notes: * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools. * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate. Sponsored by: Norse Corp, Inc.; Dell	2015-07-11 15:21:37 +00:00
Alan Cox	22cf98d1f3	The intention of r254304 was to scan the active queue continuously. However, I've observed the active queue scan stopping when there are frequent free page shortages and the inactive queue is steadily refilled by other mechanisms, such as the sequential access heuristic in vm_fault() or madvise(2). To remedy this problem, record the time of the last active queue scan, and always scan a number of pages proportional to the time since the last scan, regardless of whether that last scan was a timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass > 0") scan. Also, on a timeout-triggered scan, allow a full scan of the active queue when the system is short of inactive pages. Reviewed by: kib MFC after: 6 weeks Sponsored by: EMC / Isilon Storage Division	2015-07-08 17:45:59 +00:00
Mark Johnston	010ba3842c	Add a local variable initialization needed in the OBJT_DEFAULT case. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D2992	2015-07-05 22:26:19 +00:00
Mateusz Guzik	cd336bad26	vm: don't lock proc around accesses to vm_{t,d}addr and RLIMIT_DATA in sys_mmap vm_{t,d}addr are constant and we can use thread's copy of resource limits	2015-07-02 18:30:12 +00:00
Konstantin Belousov	be930a2021	Account for the main process stack being one page below the highest user address when ABI uses shared page. Note that the change is no-op for correctness, since shared page does not fault. The mapping for the shared page is installed at the address space creation, the page is unmanaged and its pte/pv entry cannot be reclaimed. Submitted by: Oliver Pinter Review: https://reviews.freebsd.org/D2954 MFC after: 1 week	2015-07-02 15:22:13 +00:00
Mark Murray	d1b06863fb	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00
John-Mark Gurney	afc6dc3669	If INVARIANTS is specified, add ctor/dtor to junk memory if they are unspecified... Submitted by: Suresh Gumpula at Netapp Differential Revision: https://reviews.freebsd.org/D2725	2015-06-25 20:44:46 +00:00
Alan Cox	aa04413540	Avoid pmap_is_modified() on pages that can't be mapped. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-06-21 01:22:35 +00:00
Gleb Smirnoff	093ebe1d28	o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async(). o Provide an extensive set of assertions for input array of pages. o Remove now duplicate assertions from different pagers. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-06-17 22:44:27 +00:00
Konstantin Belousov	776f729c86	Invalid pages do not need neither update of the activation count nor they coould be dirty. Move the handling if the invalid pages in the inactive scan earlier. Remove some code duplication in the scan by introducing the 'drop_page' label, which centralizes the object and the page unlock. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-14 20:23:41 +00:00
Alan Cox	78afdce6af	As the next step in eliminating PG_CACHE pages, free rather than cache pages in vm_pageout_scan(). The reactivation rate of cache pages created by vm_pageout_scan() is extremely low; typically no more than 0.5% to 2.25% of the pages are ever reactivated. At the same time, caching pages is more expensive than freeing them. For example, in a test with PostgreSQL, this change reduced the amount of time spent in the inactive queue scan by 1/6. Differential Revision: https://reviews.freebsd.org/D2805 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-06-14 05:23:39 +00:00
Gleb Smirnoff	093c7f396d	Make KPI of vm_pager_get_pages() more strict: if a pager changes a page in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-06-12 11:32:20 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Alan Cox	5cd7a4f76c	Correct a type error in kmem_unback(). Previously, kmem_unback() did not correctly handle deallocation requests of two or more gigabytes in size. Eventually, this would lead to a panic elsewhere in the kernel, such as "vm_radix_insert: key <vm_pindex_t> is already present". Reported by: Ilias Marinos MFC after: 1 week	2015-06-10 05:17:14 +00:00
Alan Cox	966272ca33	Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages. Differential Revision: https://reviews.freebsd.org/D2712 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-06-08 04:59:32 +00:00
John Baldwin	7077c42623	Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio	2015-06-04 19:41:15 +00:00
Eric van Gyzen	63e4c6cdf9	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
Alan Cox	c4e49ba477	Document vm_page_alloc_contig()'s support for the VM_ALLOC_NODUMP option. MFC after: 3 days	2015-05-30 23:37:47 +00:00
John Baldwin	ff87ae350e	Export a list of VM objects in the system via a sysctl. The list can be examined via 'vmstat -o'. It can be used to determine which files are using physical pages of memory and how much each is using. Differential Revision: https://reviews.freebsd.org/D2277 Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Norse Corp, Inc. (forward porting to HEAD/10)	2015-05-27 18:11:05 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Konstantin Belousov	2c20bd8b99	Do grammar fix in the comment to record the right commit message for r283162. Fix a cosmetic issue with vm_page_alloc() calling vm_page_free_toq() with the page not completely satisfying vm_page_free() assertions. The page is not owned by the object, since insertion failed. But besides m->object reset to NULL, we should also set VPO_UNMANAGED flag for consistency. Reported by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-20 23:15:56 +00:00

1 2 3 4 5 ...

3398 Commits