freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	987ff18184	Consistently handle negative or wrapping offsets in the mmap(2) syscalls. For regular files and posix shared memory, POSIX requires that [offset, offset + size) range is legitimate. At the maping time, check that offset is not negative. Allowing negative offsets might expose the data that filesystem put into vm_object for internal use, esp. due to OFF_TO_IDX() signess treatment. Fault handler verifies that the mapped range is valid, assuming that mmap(2) checked that arithmetic gives no undefined results. For device mappings, leave the semantic of negative offsets to the driver. Correct object page index calculation to not erronously propagate sign. In either case, disallow overflow of offset + size. Update mmap(2) man page to explain the requirement of the range validity, and behaviour when the range becomes invalid after mapping. Reported and tested by: royger (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-12 21:05:44 +00:00
Konstantin Belousov	1c2ad3e962	Change type of the prot parameter for kern_vm_mmap() from vm_prot_t to int. This makes the code to pass whole word of the mmap(2) syscall argument prot to the syscall helper kern_vm_mmap(), which can validate all bits. The change provides temporal fix for sys/vm/mmap_test mmap__bad_arguments, which was broken after r313352. PR: 216976 Reported and tested by: ngie Sponsored by: The FreeBSD Foundation	2017-02-11 20:27:39 +00:00
Edward Tomasz Napierala	69cdfcef2e	Add kern_vm_mmap2(), kern_vm_mprotect(), kern_vm_msync(), kern_vm_munlock(), kern_vm_munmap(), and kern_vm_madvise(), and use them in various compats instead of their sys_*() counterparts. Reviewed by: ed, dchagin, kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9378	2017-02-06 20:57:12 +00:00
Konstantin Belousov	5fca242374	Style, use tab after #define. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-02-04 19:16:19 +00:00
Alan Cox	8a99f1cc59	Over the years, the code and comments in vm_page_startup() have diverged in one respect. When determining how many page structures to allocate, contrary to what the comments say, the code does not account for the overhead of a page structure per page of physical memory. This revision changes the code to match the comments. Reviewed by: kib, markj MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9081	2017-02-04 05:23:10 +00:00
Edward Tomasz Napierala	a6b15641d6	Ifdef out the unused vm_rr_selectdomain(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-02 17:44:55 +00:00
Mark Johnston	aa3650ea36	Avoid page lookups in the top-level object in vm_object_madvise(). We can iterate over consecutive resident pages in the top-level object using the object's page list rather than by performing lookups in the object radix tree. This extends one of the optimizations in r312208 to the case where a shadow chain is present. Suggested by: alc Reviewed by: alc, kib (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9282	2017-01-30 18:51:43 +00:00
Mateusz Guzik	736ff8c396	hwpmc: partially depessimize munmap handling if the module is not loaded HWPMC_HOOKS is enabled in GENERIC and triggers some work avoidable in the common (module not loaded) case. In particular this avoids permission checks + lock downgrade singlethreaded and in cases were an executable mapping is found the pmc sx lock is no longer bounced. Note this is a band aid. MFC after: 1 week	2017-01-24 22:00:16 +00:00
Mark Johnston	c2655a40a7	Avoid unnecessary page lookups in vm_object_madvise(). vm_object_madvise() is frequently used to apply advice to a contiguous set of pages in an object with no backing object. Optimize this case by skipping non-resident subranges in constant time, and by iterating over resident pages using the object memq, thus avoiding radix tree lookups on each page index in the specified range. While here, move MADV_WILLNEED handling to vm_page_advise(), and rename the "advise" parameter to vm_object_madvise() to "advice." Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9098	2017-01-15 03:50:08 +00:00
Gleb Smirnoff	4f56243aad	Fix the contiguity once more.	2017-01-12 20:26:02 +00:00
Mark Johnston	8d65cba217	Remove a redundant use of min(). Reported by: rpokala X-MFC With: r311346	2017-01-05 03:13:45 +00:00
Mark Johnston	ec492b13f1	Add a small allocator for exec_map entries. Upon each execve, we allocate a KVA range for use in copying data to the new image. Pages must be faulted into the range, and when the range is freed, the backing pages are freed and their mappings are destroyed. This is a lot of needless overhead, and the exec_map management becomes a bottleneck when many CPUs are executing execve concurrently. Moreover, the number of available ranges is fixed at 16, which is insufficient on large systems and potentially excessive on 32-bit systems. The new allocator reduces overhead by making exec_map allocations persistent. When a range is freed, pages backing the range are marked clean and made easy to reclaim. With this change, the exec_map is sized based on the number of CPUs. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8921	2017-01-05 01:44:12 +00:00
Gleb Smirnoff	1e0c121f3a	Fix assertion that checks that pages are consecutive to properly handle bogus_page insertion(s).	2017-01-04 22:31:09 +00:00
Gleb Smirnoff	bfc8c24c73	Move bogus_page declaration to vm_page.h and initialization to vm_page.c. Reviewed by: kib	2017-01-04 22:27:19 +00:00
Mark Johnston	b1fd102ee7	Add a page queue for holding dirty anonymous unswappable pages. On systems without a configured swap device, an attempt to launder pages from a swap object will always fail and result in the page being reactivated. This means that the page daemon will continuously scan pages that can never be evicted. With this change, anonymous pages are instead moved to PQ_UNSWAPPABLE after a failed laundering attempt when no swap devices are configured. PQ_UNSWAPPABLE is not scanned unless a swap device is configured, so unreferenced unswappable pages are excluded from the page daemon's workload. Reviewed by: alc	2017-01-03 00:05:44 +00:00
Justin Hibbits	b5345ef10e	Print flags in hex instead of decimal. Hex is easier to grok for flags, and consistent with other prints.	2017-01-02 16:50:52 +00:00
Konstantin Belousov	1569205f0a	Style fixes for vm_map_insert(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-01 18:49:46 +00:00
Konstantin Belousov	03302b1380	Ansify vm/vm_pager.c. Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-31 19:30:22 +00:00
Mateusz Guzik	6ff51a3685	Use vrefact in vnode_pager_alloc.	2016-12-31 10:37:56 +00:00
Konstantin Belousov	7a432b84e8	Fix two similar bugs in the populate vm_fault() code. If pager' populate method succeeded, but other thread raced with us and modified vm_map, we must unbusy all pages busied by the pager, before we retry the whole fault handling. If pager instantiated more pages than fit into the current map entry, we must unbusy the pages which are clipped. Also do some refactoring, clarify comments and use more clear local variable names. Reported and tested by: kargl, subbsd@gmail.com (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-30 18:55:33 +00:00
Konstantin Belousov	0c8bd6a7d8	Assert that the pages found on the object queue by vm_page_next() and vm_page_prev() have correct ownership. In collaboration with: alc Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2016-12-30 17:37:06 +00:00
Konstantin Belousov	9a4ee196dd	Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-30 13:04:43 +00:00
Mateusz Guzik	0b3b55a0f2	Remove cpu_spinwait after seq_consistent. It does not add any benefit as the read routine will do it as necessary.	2016-12-30 06:26:17 +00:00
Alan Cox	920da7e4d2	Relax the object type restrictions on vm_page_alloc_contig(). Specifically, add support for object types that were previously prohibited because they could contain PG_CACHED pages. Roughly halve the number of radix trie operations performed by vm_page_alloc_contig() using the same approach that is employed by vm_page_alloc(). Also, eliminate the radix trie lookup performed with the free page queues lock held. Tidy up the handling of radix trie insert failures in vm_page_alloc() and vm_page_alloc_contig(). Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8878	2016-12-28 18:32:13 +00:00
Konstantin Belousov	8b590e9506	Remove redundancy in vmtotal(). There are two instances of inlined unlocks + continue in vmtotal() switch statements, which are ordinary expressed with break from the switch case and code after the switch. Also, the combination of continue and break statement is redundand. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-26 19:29:04 +00:00
Konstantin Belousov	2e56b64fa4	Fix argument type and microoptimize swp_pager_meta_free(). The count argument natural type if vm_pindex_t, but due to the loop organization, it has to be signed type to detect the termination condition. Replace this logic by using distinguished counter for the processed pages, and terminate loop when the counter exceeds the argument. Completely process one swblock for all relevant indexes instead of doing relookup in hash when incrementing page index on the loop step. Do not drop hash mutex around iterations. Noted and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-24 09:57:31 +00:00
Konstantin Belousov	77d6fd97ef	Improve vm_object_scan_all_shadowed() to also check swap backing objects. As noted in the removed comment, it is possible and not prohibitively costly to look up the swap blocks for the given page index. Implement a swap_pager_find_least() function to do that, and use it to iterate simultaneously over both backing object page queue and swap allocations when looking for shadowed pages. Testing shows that number of new succesful scans, enabled by this addition, is small but non-zero. When worked out, the change both further reduces the depth of the shadow object chain, and frees unused but allocated swap and memory. Suggested and reviewed by: alc Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-18 20:56:14 +00:00
Konstantin Belousov	71057cd207	In swp_pager_meta_free_all(), fix type of the index variable. Style. Noted and reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-16 23:33:37 +00:00
Konstantin Belousov	a1e9a3bba3	Provide introductory description of the default pager. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-14 23:36:32 +00:00
Konstantin Belousov	a41ece0840	Remove locking around accounting initialization of the default object. The object is not yet fully constructed and must not be available to other threads. This makes default_pager_alloc() almost identical to swap_pager_alloc_init(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-14 23:34:25 +00:00
Alan Cox	3d026d871f	Tidy up. Mostly, remove or replace stale comments. Most of the comments in this file actually described the operation of the swap pager, not the default pager. Given that this is the wrong place to discuss the implementation of the swap pager, it shouldn't come as a surprise that as the swap pager evolved these comments became increasingly stale. In addition, apply some style fixes, like modernizing a few remaining old- style function definitions. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D8781	2016-12-14 17:28:55 +00:00
John Baldwin	a9546a6b17	Use db_lookup_proc() in the DDB 'show procvm' command. This allows processes to be identified by PID as well as a pointer address. MFC after: 2 weeks Sponsored by: DARPA / AFRL	2016-12-13 19:22:43 +00:00
Alan Cox	3453bca864	Eliminate every mention of PG_CACHED pages from the comments in the machine- independent layer of the virtual memory system. Update some of the nearby comments to eliminate redundancy and improve clarity. In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per The Chicago Manual of Style. Update the comment in vm/vm_page.h defining the four types of page queues to reflect the elimination of PG_CACHED pages and the introduction of the laundry queue. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8752	2016-12-12 17:47:09 +00:00
Gleb Smirnoff	255003da42	Allow bogus_page to be passed to pager(s).	2016-12-09 21:21:24 +00:00
Mark Johnston	90458813cd	Conditionalize PG_CACHE sysctls on COMPAT_FREEBSD11. Reviewed by: glebius, imp, jhb Differential Revision: https://reviews.freebsd.org/D8736	2016-12-09 18:55:27 +00:00
Konstantin Belousov	ed01d9894e	Implement the populate() pager method for phys pager. It allows to provide configurable agressive prefaulting and useful hints to page daemon about memory allocations, on faults for pages managed by phys pager. In fact, this implementation is superior to the MAP_SHARED_PHYS hack from my Postgresql paper, while giving similar benefits of reducing the page faults numbers on SysV shared memory mappings. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2016-12-08 11:35:53 +00:00
Konstantin Belousov	c42b43a054	Add a new populate() pager method and extend device pager ops vector with cdev_pg_populate() to provide device drivers access to it. It gives drivers fine control of the pages ownership and allows drivers to implement arbitrary prefault policies. The populate method is called on a page fault and is supposed to populate the vm object with the page at the fault location and some amount of pages around it, at pager's discretion. VM provides the pager with the hints about current range of the object mapping, to avoid instantiation of immediately unused pages, if pager decides so. Also, VM passes the fault type and map entry protection to the pager, allowing it to force the optimal required ownership of the mapped pages. Installed pages must contiguously fill the returned region, be fully valid and exclusively busied. Of course, the pages must be compatible with the object' type. After populate() successfully returned, VM fault handler installs as many instantiated pages into the process page tables as it sees reasonable, while still obeying the correct semantic for COW and vm map locking. The method is opt-in, pager sets OBJ_POPULATE flag to indicate that the method can be called. If pager' vm objects can be shadowed, pager must implement the traditional getpages() method in addition to the populate(). Populate() might fall back to the getpages() on per-call basis as well, by returning VM_PAGER_BAD error code. For now for device pagers, the populate() method is only allowed to be used by the managed device pagers, but the limitation is only made because there is no unmanaged fault handlers which could use it right now. KPI designed together with, and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2016-12-08 11:26:11 +00:00
Konstantin Belousov	dc5401d240	Move map_generation snapshot value into struct faultstate. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-08 10:29:41 +00:00
Konstantin Belousov	272cc3c4d0	Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-08 10:28:51 +00:00
Alan Cox	e94965d82e	Previously, vm_radix_remove() would panic if the radix trie didn't contain a vm_page_t at the specified index. However, with this change, vm_radix_remove() no longer panics. Instead, it returns NULL if there is no vm_page_t at the specified index. Otherwise, it returns the vm_page_t. The motivation for this change is that it simplifies the use of radix tries in the amd64, arm64, and i386 pmap implementations. Instead of performing a lookup before every remove, the pmap can simply perform the remove. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D8708	2016-12-08 04:29:29 +00:00
Mark Johnston	43482f897b	Use the official spelling for NULL arguments to typed sysctl handlers. Reported by: bde	2016-12-07 01:15:10 +00:00
Mark Johnston	77edd8fa00	Provide dummy sysctls for v_cache_count and v_tcached. Some utilities (notably top(1)) exit if any of their input sysctls don't exist, and the removal of the above-mentioned PG_CACHE-related sysctls makes it difficult to run such utilities on different versions of the kernel without recompiling. Requested by: bde	2016-12-06 22:52:45 +00:00
Alan Cox	8804a2b030	Eliminate a stale comment; vm_radix_prealloc() was replaced in r254141. MFC after: 3 days	2016-12-02 16:29:30 +00:00
Alan Cox	563a19d546	During vm_page_cache()'s call to vm_radix_insert(), if vm_page_alloc() was called to allocate a new page of radix trie nodes, there could be a call to vm_radix_remove() on the same trie (of PG_CACHED pages) as the in-progress vm_radix_insert(). With the removal of PG_CACHED pages, we can simplify vm_radix_insert() and vm_radix_remove() by removing the flags on the root of the trie that were used to detect this case and the code for restarting vm_radix_insert() when it happened. Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8664	2016-12-01 17:26:37 +00:00
Alan Cox	ba67369628	Recursion on the free page queue mutex occurred when UMA needed to allocate a new page of radix trie nodes to complete a vm_radix_insert() operation that was requested by vm_page_cache(). Specifically, vm_page_cache() already held the free page queue lock when UMA tried to acquire it through a call to vm_page_alloc(). This code path no longer exists, so there is no longer any reason to allow recursion on the free page queue mutex. Improve nearby comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8628	2016-11-27 01:42:53 +00:00
Mark Johnston	99e6e1930c	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
Alan Cox	bba39b9ae3	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
Gleb Smirnoff	e48b82bd83	- If caller specifies readbehind and readahead that together with count doesn't fit into a buf, then trim readbehind and readahead evenly. If rbehind was limited by the previous BMAP, then roundup its trim to block size. - Add KASSERT to check that b_blkno has proper offset from original blkno returned by BMAP. [1] - Add KASSERT to check that pages in buf are consecutive. Reviewed by: kib Submitted by: kib [1]	2016-11-17 20:32:32 +00:00
Konstantin Belousov	41ddec83c1	Move the fast fault path into the separate function. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-16 16:34:17 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00

1 2 3 4 5 ...

3593 Commits