freebsd-skq

Author	SHA1	Message	Date
markj	d1498f4e3a	Add support for pmap_enter(psind = 1) to the arm64 pmap. See the commit log messages for r321378 and r336288 for descriptions of this functionality. Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D16303	2018-07-20 16:37:04 +00:00
alc	47fecb79d6	Revert r329254. The underlying cause for the copy-on-write problem in multithreaded programs that was addressed by r329254 was in the implementation of pmap_enter() on some architectures, notably, amd64. kib@, markj@ and I have audited all of the pmap_enter() implementations, and fixed the broken ones, specifically, amd64 (r335784, r335971), i386 (r336092), mips (r336248), and riscv (r336294). To be clear, the reason to address the problem within pmap_enter() and revert r329254 is not just a matter of principle. An effect of r329254 was that a copy-on-write fault actually entailed two page faults, not one, even for single-threaded programs. Now, in the expected case for either single- or multithreaded programs, we are back to a single page fault to complete a copy-on-write operation. (In extremely rare circumstances, a multithreaded program could suffer two page faults.) Reviewed by: kib, markj Tested by: truckman MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D16301	2018-07-19 17:01:10 +00:00
alc	dd64d030ae	Add support for pmap_enter(..., psind=1) to the i386 pmap. In other words, add support for explicitly requesting that pmap_enter() create a 2 or 4 MB page mapping. (Essentially, this feature allows the machine-independent layer to create superpage mappings preemptively, and not wait for automatic promotion to occur.) Export pmap_ps_enabled() to the machine-independent layer. Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or reclaim a PV entry when one is not available. Refactor pmap_enter_pde() into two functions, one by the same name, that is a general-purpose function for creating PDE PG_PS mappings, and another, pmap_enter_4mpage(), that is used to prefault 2 or 4 MB read- and/or execute-only mappings for execve(2), mmap(2), and shmat(2). Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D16246	2018-07-14 17:20:27 +00:00
markj	0b6c109065	Typo. PR: 228533 Submitted by: Jakub Piecuch <j.piecuch96@gmail.com> MFC after: 1 week	2018-05-30 16:48:48 +00:00
alc	8befc10d11	Addendum to r334233. In vm_fault_populate(), since the page lock is held, we must use vm_page_xunbusy_maybelocked() rather than vm_page_xunbusy() to unbusy the page. Reviewed by: kib X-MFC with: r334233	2018-05-28 16:23:39 +00:00
alc	a560ecfcc3	Eliminate duplicate assertions. We assert at the start of vm_fault_hold() that the map entry is wired if the caller passes the flag VM_FAULT_WIRE. Eliminate the same assertion, but spelled differently, at the end of vm_fault_hold() and vm_fault_populate(). Repeat the assertion only if the map is unlocked and the map lookup must be repeated. Reviewed by: kib MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D15582	2018-05-28 04:38:10 +00:00
alc	f2b4bbab1c	Use pmap_enter(..., psind=1) in vm_fault_populate() on amd64. While superpage mappings were already being created by automatic promotion in vm_fault_populate(), this change reduces the cost of creating those mappings. Essentially, one pmap_enter(..., psind=1) call takes the place of 512 pmap_enter(..., psind=0) calls, and that one pmap_enter(..., psind=1) call eliminates the allocation of a page table page. Reviewed by: kib MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D15572	2018-05-26 02:59:34 +00:00
alc	6e9601f6ba	Eliminate an unused parameter from vm_fault_populate(). Reviewed by: kib MFC after: 10 days	2018-05-24 20:43:41 +00:00
kib	65d851d412	Eliminate some vm object relocks in vm fault. For the vm_fault_prefault() call from vm_fault_soft_fast(), extend the scope of the object rlock to avoid re-taking it inside vm_fault_prefault(). It causes pmap_enter_quick() sometimes called with shadow object lock as well as the page lock, but this looks innocent. Noted and measured by: mjg Reviewed by: alc, markj (as part of the larger patch) Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15122	2018-04-29 12:43:08 +00:00
kib	fa78a984aa	Allow to specify for vm_fault_quick_hold_pages() that nofault mode should be honored. We must not sleep or acquire any MI VM locks if TDP_NOFAULTING is specified. On the other hand, there were some callers in the tree which set TDP_NOFAULTING for larger scope than needed, I fixed the code which I wrote, but I suspect that linuxkpi and out of tree drm drivers might abuse this still. So only enable the mode for vm_fault_quick_hold_pages() where vm_fault_hold() is not called when specifically asked by user. I decided to use vm_prot_t flag to not change KPI. Since number of flags in vm_prot_t is limited, I reused the same flag which was already consumed for vm_map_lookup(). Reported and tested by: pho (as part of the larger patch) Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14825	2018-03-26 16:31:12 +00:00
markj	2172a46042	Avoid dequeuing the fault page during a soft fault. Such pages are re-enqueued at the end of the fault handler, preserving LRU. Rather than performing two separate operations per fault, simply requeue the page at the end of the fault (or bump its activation count if it resides in PQ_ACTIVE, avoiding the page queue lock entirely). This elides some page lock and page queue lock operations in common cases, e.g., CoW faults. Note that we must still dequeue the source page for "optimized" CoW faults since the page may not remain enqueued while it is moved to another object. Reviewed by: alc, kib Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:49:30 +00:00
markj	3394e82adc	Have vm_page_{deactivate,launder}() requeue already-queued pages. In many cases the page is not enqueued so the change will have no effect. However, the change is needed to support an optimization in the fault handler and in some cases (sendfile, the buffer cache) it was being emulated by the caller anyway. Reviewed by: alc Tested by: pho MFC after: 2 weeks X-Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:40:56 +00:00
kib	ee3d0fb8ef	vm_wait() rework. Make vm_wait() take the vm_object argument which specifies the domain set to wait for the min condition pass. If there is no object associated with the wait, use curthread' policy domainset. The mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by the new helper vm_wait_doms(), which directly takes the bitmask of the domains to wait for passing min condition. Eliminate pagedaemon_wait(). vm_domain_clear() handles the same operations. Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls are enough. Eliminate several control state variables from vm_domain, unneeded after the vm_wait() conversion. Scetched and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation, Mellanox Technologies Differential revision: https://reviews.freebsd.org/D14384	2018-02-20 10:13:13 +00:00
kib	c9f8f3e9be	Ensure memory consistency on COW. From the submitter description: The process is forked transitioning a map entry to COW Thread A writes to a page on the map entry, faults, updates the pmap to writable at a new phys addr, and starts TLB invalidations... Thread B acquires a lock, writes to a location on the new phys addr, and releases the lock Thread C acquires the lock, reads from the location on the old phys addr... Thread A ...continues the TLB invalidations which are completed Thread C ...reads from the location on the new phys addr, and releases the lock In this example Thread B and C [lock, use and unlock] properly and neither own the lock at the same time. Thread A was writing somewhere else on the page and so never had/needed the lock. Thread C sees a location that is only ever read\|modified under a lock change beneath it while it is the lock owner. To fix this, perform the two-stage update of the copied PTE. First, the PTE is updated with the address of the new physical page with copied content, but in read-only mode. The pmap locking and the page busy state during PTE update and TLB invalidation IPIs ensure that any writer to the page cannot upgrade the PTE to the writable state until all CPUs updated their TLB to not cache old mapping. Then, after the busy state of the page is lifted, the faults for write can proceed and do not violate the consistency of the reads. The change is done in vm_fault because most architectures do need IPIs to invalidate remote TLBs. More, I think that hardware guarantees of atomicity of the remote TLB invalidation are not enough to prevent the inconsistent reads of non-atomic reads, like multi-word accesses protected by a lock. So instead of modifying each pmap invalidation code, I did it there. Discovered and analyzed by: Elliott.Rabe@dell.com Reviewed by: markj PR: 225584 (appeared to have the same cause) Tested by: Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14347	2018-02-14 00:31:45 +00:00
kib	f2d6aac90a	Do not call pmap_enter() with invalid protection mode. If the map entry elookup was performed due to the mapping changes, we need to ensure that there is still some access permission bit requested which is compatible with the current vm_map_entry mode. If not, restart the handler from scratch instead of trying to save the current progress. Also adjust fault_type to not include cleared permission bits. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14347	2018-02-14 00:25:18 +00:00
jeff	94c7af8ca2	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
pfg	155122ce53	SPDX: Consider code from Carnegie-Mellon University. Interesting cases, most likely from CMU Mach sources.	2017-11-30 15:48:35 +00:00
pfg	9da7bdde06	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
alc	4ab94b03c3	Switching from a global hash table to per-vm_object radix tries for mapping vm_object page indices to on-disk swap space (r322913) has changed the synchronization requirements for a couple swap pager functions. Whereas before a read lock on the vm object sufficed because of the global mutex on the hash table, a write lock on the vm object may now be required. In particular, calls to vm_pager_page_unswapped() now require a write lock on the vm_object. Consequently, vm_fault()'s fast path cannot call vm_pager_page_unswapped(). The swap space will have to be released at a later point. Reviewed by: kib, markj X-MFC with: r322913 Differential Revision: https://reviews.freebsd.org/D12134	2017-08-28 16:55:43 +00:00
alc	65ce87ce40	Address a compilation warning on some architectures that was introduced by the previous change, r321386. Reported by: ian MFC after: 10 days X-MFC after: r321386	2017-07-23 19:35:14 +00:00
alc	1179a1717e	Utilize pmap_enter(..., psind=1) in vm_fault_soft_fast() on amd64. (The Differential Revision discusses the benefits of this change.) Add a function, vm_reserv_to_superpage(), that returns the superpage containing the specified base page. Reviewed by: kib, markj Tested by: pho MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 16:28:13 +00:00
kib	34de63e92d	Implement address space guards. Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of the allocated address space, but does not allow instantiation of the pages in the range. It is useful for more explicit support for usual two-stage reserve then commit allocators, since it prevents accidental instantiation of the mapping, e.g. by mprotect(2). Use guards to reimplement stack grow code. Explicitely track stack grow area with the guard, including the stack guard page. On stack grow, trivial shift of the guard map entry and stack map entry limits makes the stack expansion. Move the code to detect stack grow and call vm_map_growstack(), from vm_fault() into vm_map_lookup(). As result, it is impossible to get random mapping to occur in the stack grow area, or to overlap the stack guard page. Enable stack guard page by default. Reviewed by: alc, markj Man page update reviewed by: alc, bjk, emaste, markj, pho Tested by: pho, Qualys Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11306 (man pages)	2017-06-24 17:01:11 +00:00
glebius	21ead51d79	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
alc	8bad3c5e87	Two changes to vm_fault_populate(): Simplify the logic for clipping the range returned by the pager to fit within the map entry. Use atop() rather than OFF_TO_IDX() on addresses. Reviewed by: kib MFC after: 1 week	2017-03-19 19:52:47 +00:00
kib	340b707be8	Fix off-by-one in the vm_fault_populate() code. When re-calculating the last inclusive page index after the pager call, -1 was erronously ommitted. If the pager extended the run (unlikely), the result would be insertion of the valid page mapping outside the current map entry range. Found by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-19 14:42:16 +00:00
kib	8cf2af841c	Use atop() instead of OFF_TO_IDX() for convertion of addresses or addresses offsets, as intended. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-14 19:39:17 +00:00
kib	568d99bbad	Properly handle possible underflow in vm_fault_prefault(). In vm_fault_prefault(), if backward count causes underflow in calculation of starta = addra - backward * PAGE_SIZE; then starta must be clipped to entry->start, instead of zero. Clipping to zero allowed mapping outside of the map entries address ranges, in particular, map at zero. Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com> Reviewed by: alc MFC after: 1 week	2017-02-24 08:09:16 +00:00
bz	21d1178863	Use %s __func__ to print the actual function name (been looking at the wrong one for too often lately at first), and also use %#lx to get the 0x prefix for the address. MFC after: 1 week	2017-02-14 01:20:03 +00:00
kib	80cfc0c41c	Fix two similar bugs in the populate vm_fault() code. If pager' populate method succeeded, but other thread raced with us and modified vm_map, we must unbusy all pages busied by the pager, before we retry the whole fault handling. If pager instantiated more pages than fit into the current map entry, we must unbusy the pages which are clipped. Also do some refactoring, clarify comments and use more clear local variable names. Reported and tested by: kargl, subbsd@gmail.com (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-30 18:55:33 +00:00
kib	2819db13ca	Add a new populate() pager method and extend device pager ops vector with cdev_pg_populate() to provide device drivers access to it. It gives drivers fine control of the pages ownership and allows drivers to implement arbitrary prefault policies. The populate method is called on a page fault and is supposed to populate the vm object with the page at the fault location and some amount of pages around it, at pager's discretion. VM provides the pager with the hints about current range of the object mapping, to avoid instantiation of immediately unused pages, if pager decides so. Also, VM passes the fault type and map entry protection to the pager, allowing it to force the optimal required ownership of the mapped pages. Installed pages must contiguously fill the returned region, be fully valid and exclusively busied. Of course, the pages must be compatible with the object' type. After populate() successfully returned, VM fault handler installs as many instantiated pages into the process page tables as it sees reasonable, while still obeying the correct semantic for COW and vm map locking. The method is opt-in, pager sets OBJ_POPULATE flag to indicate that the method can be called. If pager' vm objects can be shadowed, pager must implement the traditional getpages() method in addition to the populate(). Populate() might fall back to the getpages() on per-call basis as well, by returning VM_PAGER_BAD error code. For now for device pagers, the populate() method is only allowed to be used by the managed device pagers, but the limitation is only made because there is no unmanaged fault handlers which could use it right now. KPI designed together with, and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2016-12-08 11:26:11 +00:00
kib	a497cae7b6	Move map_generation snapshot value into struct faultstate. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-08 10:29:41 +00:00
kib	47319ab8b8	Move the fast fault path into the separate function. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-16 16:34:17 +00:00
alc	2fa3607305	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
alc	5fd3767e0e	Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty pages, specificially, dirty pages that have passed once through the inactive queue. A new, dedicated thread is responsible for both deciding when to launder pages and actually laundering them. The new policy uses the relative sizes of the inactive and laundry queues to determine whether to launder pages at a given point in time. In general, this leads to more intelligent swapping behavior, since the laundry thread will avoid pageouts when the marginal benefit of doing so is low. Previously, without a dedicated queue for dirty pages, the page daemon didn't have the information to determine whether pageout provides any benefit to the system. Thus, the previous policy often resulted in small but steadily increasing amounts of swap usage when the system is under memory pressure, even when the inactive queue consisted mostly of clean pages. This change addresses that issue, and also paves the way for some future virtual memory system improvements by removing the last source of object-cached clean pages, i.e., PG_CACHE pages. The new laundry thread sleeps while waiting for a request from the page daemon thread(s). A request is raised by setting the variable vm_laundry_request and waking the laundry thread. We request launderings for two reasons: to try and balance the inactive and laundry queue sizes ("background laundering"), and to quickly make up for a shortage of free pages and clean inactive pages ("shortfall laundering"). When background laundering is requested, the laundry thread computes the number of page daemon wakeups that have taken place since the last laundering. If this number is large enough relative to the ratio of the laundry and (global) inactive queue sizes, we will launder vm_background_launder_target pages at vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back to sleep without doing any work. When scanning the laundry queue during background laundering, reactivated pages are counted towards the laundry thread's target. In contrast, shortfall laundering is requested when an inactive queue scan fails to meet its target. In this case, the laundry thread attempts to launder enough pages to meet v_free_target within 0.5s, which is the inactive queue scan period. A laundry request can be latched while another is currently being serviced. In particular, a shortfall request will immediately preempt a background laundering. This change also redefines the meaning of vm_cnt.v_reactivated and removes the functions vm_page_cache() and vm_page_try_to_cache(). The new meaning of vm_cnt.v_reactivated now better reflects its name. It represents the number of inactive or laundry pages that are returned to the active queue on account of a reference. In collaboration with: markj Reviewed by: kib Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8302	2016-11-09 18:48:37 +00:00
alc	2e4bd96be8	In vm_fault()'s loop over the shadow chain, move a comment describing our invariants to a better place. Also, add two comments concerning the relationship between the map and vnode locks. Reviewed by: kib MFC after: 3 days	2016-11-03 16:44:55 +00:00
alc	00978d6f5e	Move and revise a comment about the relation between the object's paging- in-progress count and the vnode. Prior to r188331, we always acquired the vnode lock before incrementing the object's paging-in-progress count. Now, we increment it before attempting to acquire the vnode lock with LK_NOWAIT, but we never sleep acquiring the vnode lock while we have the count incremented. Reviewed by: kib MFC after: 3 days	2016-11-01 17:11:10 +00:00
kib	3db20a2896	Change remained internal uses of boolean_t to bool in vm/vm_fault.c. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-10-30 20:39:38 +00:00
alc	3f64f11e04	Merge and sort vm_fault_hold()'s "int" variable definitions. Reviewed by: kib MFC after: 7 days	2016-10-30 19:15:59 +00:00
kib	1fed6374b2	Remove vnode_locked label and goto, by collapsing vp calculation into the conditional. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-10-30 18:05:18 +00:00
alc	ad944a452a	The "lookup_is_valid" field is used as a "bool". Make it one. Convert vm_fault_hold()'s Boolean variables that are only used internally to "bool". Add a comment describing why the one remaining "boolean_t" was not converted. Reviewed by: kib MFC after: 8 days	2016-10-29 21:01:49 +00:00
alc	7601e552ca	With one exception, "hardfault" is used like a "bool". Change that exception and make it a "bool". Reviewed by: kib MFC after: 7 days	2016-10-29 19:22:38 +00:00
markj	3b65e99289	Add one more use of unlock_vp(). Discussed with: kib X-MFC With: r308094	2016-10-29 18:47:28 +00:00
kib	8f55a87473	Add unlock_vp() helper. Trim space. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-10-29 18:03:29 +00:00
kib	d6fb5308f6	If vm_fault_hold(9) finds that fs.m is wired, do not free it after a pager error, leave the page to the wire owner. E.g. the page might be a part of the invalidated buffer. Reported and tested by: pho Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8197	2016-10-17 08:17:06 +00:00
markj	47ebd84fca	Plug a potential vnode lock leak in vm_fault_hold(). Reviewed by: alc, kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8242	2016-10-13 20:39:34 +00:00
alc	7265bbc74b	Add a comment describing the 'fast path' that was introduced in r270011. Reviewed by: kib MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-07-20 17:20:22 +00:00
alc	281dfb8433	Break up vm_fault()'s implementation of the read-ahead and delete-behind optimizations into two distinct pieces. The first piece consists of the code that should only be performed once per page fault and requires the map to be locked. The second piece consists of the code that should be performed each time a pager is called on an object in the shadow chain. (This second piece expects the map to be unlocked.) Previously, the entire implementation could be executed multiple times. Moreover, the second and subsequent executions would occur with the map unlocked. Usually, the ensuing unsynchronized accesses to the map were harmless because the map was not changing. Nonetheless, it was possible for a use-after-free error to occur, where vm_fault() wrote to a freed map entry. This change corrects that problem. Reported by: avg Reviewed by: kib MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-07-18 04:20:26 +00:00
alc	694dd903a3	Change the type of the map entry's next_read field from a vm_pindex_t to a vm_offset_t. (This field is used to detect sequential access to the virtual address range represented by the map entry.) There are three reasons to make this change. First, a vm_offset_t is smaller on 32-bit architectures. Consequently, a struct vm_map_entry is now smaller on 32-bit architectures. Second, a vm_offset_t can be written atomically, whereas it may not be possible to write a vm_pindex_t atomically on a 32-bit architecture. Third, using a vm_pindex_t makes the next_read field dependent on which object in the shadow chain is being read from. Replace an "XXX" comment. Reviewed by: kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division	2016-07-07 20:58:16 +00:00
kib	fb6d455926	Change type of the 'dead' variable to boolean. Requested by: alc MFC after: 1 week Approved by: re (gjb)	2016-07-03 00:08:17 +00:00
kib	8ecc9fc9fd	If the vm_fault() handler raced with the vm_object_collapse() sleepable scan, iteration over the shadow chain looking for a page could find an OBJ_DEAD object. Such state of the mapping is only transient, the dead object will be terminated and removed from the chain shortly. We must not return KERN_PROTECTION_FAILURE unless the object type is changed to OBJT_DEAD in the chain, indicating that paging on this address is really impossible. Returning KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGV delivered to processes, or kernel accesses to UVA spuriously failing with EFAULT. If the object with OBJ_DEAD flag is found, only return KERN_PROTECTION_FAILURE when object type is already OBJT_DEAD. Otherwise, sleep a tick and retry the fault handling. Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by waiting until the paging on this object is finished. But to do so, we need to reference the dead object, while vm_object_collapse() insists on owning the final reference on the collapsed object. This could be fixed by e.g. changing the assert to shared reference release between vm_fault() and vm_object_collapse(), but it seems to be too much complications for rare boundary condition. PR: 204426 Tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation X-Differential revision: https://reviews.freebsd.org/D6085 MFC after: 2 weeks Approved by: re (gjb)	2016-06-27 21:54:19 +00:00

1 2 3 4 5 ...

400 Commits