freebsd-skq

Author	SHA1	Message	Date
glebius	a7d547aa97	Fix the KASSERT and improve wording in r282426. Submitted by: alc	2015-05-06 08:07:11 +00:00
glebius	acfa186500	Fix arithmetical bug in vnode_pager_haspage(). The check against object size should be done not with the number of pages in the first block, but with the overall number of pages. While here, add KASSERT that makes sure that BMAP doesn't return completely irrelevant blocks. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-05-04 18:49:25 +00:00
glebius	0a56d25a94	Instead of reading, validating and adjusting value of the vm.swap_async_max in the main swapper work cycle, do it in the sysctl handler. This removes extra mutex acquisition from the main cycle and makes the sysctl knob return error on an invalid value, instead of accepting and fixing it. Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-05-02 20:27:37 +00:00
jhb	9c4c8b62fb	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00
scottl	cac1f63fc8	Improve support for blacklisting bad memory locations. The user can supply a text file with a list of physical memory addresses to exclude, and have it loaded at boot time via the provided example in loader.conf. The tunable 'vm.blacklist' remains, but using an external file means that there's no practical limit to the size of the list. This change also improves the scanning algorithm for processing the list, scanning the list only once instead of scanning it for every page in the system. Both the sysctl and the file can be unsorted and contain duplicates so long as each entry is numeric (decimal or hex) and is separated by a space, comma, or newline character. The sysctl 'vm.page_blacklist' is now provided to report what memory locations were successfully excluded. Reviewed by: imp, emax Obtained from: Netflix, Inc. MFC after: 3 days	2015-04-29 15:57:14 +00:00
trasz	802017a04b	Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation	2015-04-29 10:23:02 +00:00
kib	dfce070d88	Do not sleep waiting for the MAP_ENTRY_IN_TRANSITION state ending with the vnode locked. Review: https://reviews.freebsd.org/D2381 Submitted by: Conrad Meyer, Attilio Rao MFC after: 1 week	2015-04-28 08:20:23 +00:00
scottl	ede23391e8	Revert r281451. It causes a panic/hang early in boot for a number of users, myself included. The original code is likely papering over a larger bug that needs to be explored, but for now get things back to a working state. Obtained from: Netflix, Inc. MFC after: immediately	2015-04-24 17:03:53 +00:00
jhb	e4683250d1	Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week	2015-04-23 14:22:20 +00:00
alc	3b5965fb8f	Eliminate an unused variable. MFC after: 1 week	2015-04-20 16:48:21 +00:00
alc	d6d560db51	Eliminate an unused variable. MFC after: 1 week	2015-04-19 00:29:02 +00:00
kib	2254748ed0	The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:50:13 +00:00
dchagin	02e5fc3da7	Rework r281162. Indeed, the flexible array member is preferable here. Suggested by: Justin T. Gibbs MFC after: 3 days	2015-04-12 06:21:58 +00:00
alc	f116ec9943	Correct an off-by-one error in vm_reserv_reclaim_contig() that results in an infinite loop. Submitted by: Svatopluk Kraus MFC after: 1 week	2015-04-11 22:57:13 +00:00
glebius	662cdec164	UMA zone limit can be lowered, so remove the protection against lowering it from sysctl_handle_uma_zone_max(). Sponsored by: Nginx, Inc.	2015-04-10 06:56:49 +00:00
mav	2e38078077	Remove sleeps from geom_up thread on device destruction. MFC after: 3 days.	2015-04-09 13:09:05 +00:00
jeff	fa9eb8e1ea	- Simplify vm_pageout_scan() by introducing a new vm_pageout_clean() function that does the locking and validation associated with cleaning a page. This moves 150 lines of code into its own function. - Rename vm_pageout_clean() to vm_pageout_cluster() to define what it really does; clustering nearby pages for pageout optimization. Reviewed by: alc, kib, kmacy Tested by: pho (earlier version) Sponsored by: EMC / Isilon	2015-04-07 02:18:52 +00:00
dchagin	fd38dba27d	Properly calculate "UMA Zones" per cpu cache size. Avoid allocating an extra struct uma_cache since the struct uma_zone already has one. PR: 199169 Submitted by: luke.tw gmail com MFC after: 1 week	2015-04-06 18:45:41 +00:00
alc	7028695829	Until the lock assertions in vm_page_advise() are properly reevaluated, vm_fault_dontneed() should acquire a write lock on the first object in the shadow chain. Reported by: gleb, David Wolfskill	2015-04-05 20:07:33 +00:00
dchagin	1e93730b8c	Fix wrong kassert msg in uma. PR: 199172 Submitted by: luke.tw gmail com MFC after: 1 week	2015-04-05 18:25:23 +00:00
alc	e8bbef8d0b	Replace vm_fault()'s heuristic for automatic cache behind with a heuristic that performs the equivalent of an automatic madvise(..., MADV_DONTNEED). The current heuristic, even with the improvements that I made a few years ago, is a good example of making the wrong trade-off, or optimizing for the infrequent case. The infrequent case being reading a single file that is much larger than memory using mmap(2). And, in this case, the page daemon isn't the bottleneck; it's the I/O. In all other cases, the current heuristic has too many false positives, i.e., it caches too many pages that are later reused. To give one example, thousands of pages are cached by the current heuristic during a buildworld and all of them are reactivated before the buildworld completes. In particular, clang reads source files using mmap(2) and there are some relatively large source files in our source tree, e.g., sqlite, that are read multiple times. With the new heuristic, I see fewer false positives and they have a much lower cost. I actually tried something like this more than two years ago and it didn't perform as well as the cache behind heuristic. However, that was before the changes to the page daemon in late summer of 2013 and the existence of pmap_advise(). In particular, with the page daemon doing its work more frequently and in smaller batches, it now completes its work while the application accessing the file is blocked on I/O. Whereas previously, the page daemon appeared to hog the CPU for so long that it caused "hiccups" in the application's execution. Finally, I'll add that the elimination of cache pages is a prerequisite for NUMA support. Reviewed by: jeff, kib Sponsored by: EMC / Isilon Storage Division	2015-04-04 19:10:22 +00:00
rstone	57feb6fb43	Fix integer truncation bug in malloc(9). A couple of internal functions used by malloc(9) and uma truncated a size_t down to an int. This could cause any number of issues (e.g. indefinite sleeps, memory corruption) if any kernel subsystem tried to allocate 2GB or more through malloc. zfs would attempt such an allocation when run on a system with 2TB or more of RAM. Note to self: When this is MFCed, sparc64 needs the same fix. Differential revision: https://reviews.freebsd.org/D2106 Reviewed by: kib Reported by: Michael Fuckner <michael@fuckner.net> Tested by: Michael Fuckner <michael@fuckner.net> MFC after: 2 weeks	2015-04-01 12:42:26 +00:00
glebius	e4390a8823	Catch up on r271387 and remove unused parameter from VOP_GETPAGES_ASYNC().	2015-03-30 22:49:26 +00:00
jeff	2ef1578319	- Eliminate pagequeue locking in the dirty code in vm_pageout_scan(). - Use a more precise series of tests to see if the page changed while we were locking the vnode. Reviewed by: alc Sponsored by: EMC / Isilon	2015-03-28 02:36:49 +00:00
mav	ee2fe1ad5c	Make swapper release orphaned (lost) GEOM provider. Swap device is still reported as enabled, and system still may crash later if some swapped-out kernel pages were lost with the device, but at least GEOM and CAM can now release the lost disk, allowing it to be reconnected. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-03-26 17:21:12 +00:00
rpaulo	4a05532670	Add comments about CTLFLAG_RDTUN vs. TUNABLE_INT_FETCH. Requested by: julian	2015-03-26 05:20:18 +00:00
rpaulo	2ef1cdce8e	Use TUNABLE_INT_FETCH for boot_pages. vm.boot_pages is marked as a CTLFLAG_RDTUN, but it's used by the VM before the sysctl subsystem is initialised. We manually fetch the variable from the environment to work around this problem. Tested by: Keith White kwhite at uottawa.ca MFC after: 1 week	2015-03-24 20:09:55 +00:00
rpaulo	5ab5a7c167	Remove whitespace.	2015-03-24 20:07:27 +00:00
alc	b131a2abc8	Introduce vm_object_color() and use it in mmap(2) to set the color of named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.) Differential Revision: https://reviews.freebsd.org/D2013 Reviewed by: jhb, kib Tested by: Svatopluk Kraus (armv6) MFC after: 6 weeks	2015-03-21 17:56:55 +00:00
alc	2c4b57d486	Fix the root cause of the "vm_reserv_populate: reserv <address> is already promoted" panics. The sequence of events that leads to a panic is rather long and circuitous. First, suppose that process P has a promoted superpage S within vm object O that it can write to. Then, suppose that P forks, which leads to S being write protected. Now, before P's child exits, suppose that P writes to another virtual page within O. Since the pages within O are copy on write, a shadow object for O is created to house the new physical copy of the faulted on virtual page. Then, before P can fault on S, P's child exits. Now, when P faults on S, it will follow the "optimized" path for copy-on-write faults in vm_fault(), wherein the underlying physical page is moved from O to its shadow object rather than allocating a new page and copying the new page's contents from the old page. Moreover, suppose that every 4 KB physical page making up S is moved to the shadow object in this way. However, the optimized path does not move the underlying superpage reservation, which is the root cause of the panics! Ultimately, P performs vm_object_collapse() on O's shadow object, which destroys O and in doing so breaks any reservations still belonging to O. This leaves the reservation underlying S in an inconsistent state: It's simultaneously not in use and promoted. Breaking a reservation does not demote it because I never intended for a promoted reservation to be broken. It makes little sense. Finally, this inconsistency leads to an assertion failure the next time that the reservation is used. The failing assertion does not (currently) exist in FreeBSD 10.x or earlier. There, we will quietly break the promoted reservation. While illogical and unintended, breaking the reservation is essentially harmless. PR: 198163 Reviewed by: kib Tested by: pho X-MFC after: r267213 Sponsored by: EMC / Isilon Storage Division	2015-03-19 01:40:43 +00:00
glebius	398be53682	o Enhance vm_pager_free_nonreq() function: - Allow to call the function with vm object lock held. - Allow to specify reqpage that doesn't match any page in the region, meaning freeing all pages. o Utilize the new function in couple more places in vnode pager. Reviewed by: alc, kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-03-17 19:19:19 +00:00
glebius	df5d850742	Provide a comment explaining r279688. Suggested by: alc	2015-03-16 14:24:47 +00:00
ian	0dd684d23f	Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668	2015-03-14 17:08:28 +00:00
ian	eae109babf	Revert r279932; this is going to be fixed in the sbuf code instead. PR: 195668	2015-03-14 13:00:37 +00:00
ian	037188bda9	Nullterminate strings returned via sysctl. PR: 195668	2015-03-12 18:06:30 +00:00
glebius	6bbfdd570a	Fix function name in comment.	2015-03-10 13:06:54 +00:00
kib	c539cecb43	Fix function name in the panic message. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-08 02:13:46 +00:00
alc	56c9a1b2f6	Correct a typo in vm_object_backing_scan() that originated in r254141. Specifically, change a lock acquire into a lock release. MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2015-03-07 04:18:40 +00:00
glebius	3a52ecfa66	- In vnode_pager_generic_getpages() use different free counters for synchronous and asynchronous requests. The latter can saturate the I/O and we do not want them to affect regular paging. - Allocate the pbuf at the very beginning of the function, so that if we are low on certain kind of pbufs don't even proceed to BMAP, but sleep. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-03-06 14:15:30 +00:00
alc	2ab42594ef	Use RW_NEW rather than calling bzero().	2015-03-01 05:18:02 +00:00
alc	37e48c6e3a	Eliminate a variable that became unused when VFS_LOCK_GIANT() was eliminated. MFC after: 3 days	2015-02-28 19:11:37 +00:00
ngie	d54589fec4	Some minor style(9) fixes (whitespace + comment) MFC after: 3 days	2015-02-17 08:50:26 +00:00
kib	19abfd4698	Update mtime for tmpfs files modified through memory mapping. Similar to UFS, perform updates during syncer scans, which in particular means that tmpfs now performs scan on sync. Also, this means that a mtime update may be delayed up to 30 seconds after the write. The vm_object's OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar to the OBJ_MIGHTBEDIRTY flag for the vnode object: it indicates that object could have been dirtied. Adapt fast page fault handler and vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as OBJT_VNODE. Reported by: Ronald Klop <ronald-lists@klop.ws> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-28 10:37:23 +00:00
will	2e062600fe	Add vm.panic_on_oom sysctl, which enables those who would rather panic than kill a process, when the system runs out of memory. Defaults to off. Usually, this is most useful when the OOM condition is due to mismanagement of memory, on a system where the applications in question don't respond well to being killed. In theory, if the system is properly managed, it shouldn't be possible to hit this condition. If it does, the panic can be more desirable for some users (since it can be a good means of finding the root cause) rather than killing the largest process and continuing on its merry way. As kib@ mentions in the differential, there is also protect(1), which uses procctl(PROC_SPROTECT) to ensure that some processes are immune. However, a panic approach is still useful in some environments. This is primarily intended as a development/debugging tool. Differential Revision: D1627 Reviewed by: kib MFC after: 1 week	2015-01-24 17:32:45 +00:00
rstone	520ad84555	vmspace_release() may sleep if the last reference is being released, so add a WITNESS_WARN() to catch cases where it is called with a non-sleepable lock held. MFC after: 1 month Sponsored by: Sandvine Inc.	2015-01-24 16:59:38 +00:00
kib	7cbc6347a2	Avoid calling vmspace_free() while owning the process lock. Freeing a vm space may require obtaining sleepable locks. Hold the process to keep the pointer valid, and change trylock to lock, since there are no longer two process locks owned simultaneously in vm_pageout_oom(). Note that after the process lock is dropped, process might exec, and no longer qualify as the owner of biggest vm space. In collaboration with: rstone Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-24 15:33:42 +00:00
alc	fb32c103c7	Revamp the default page clustering strategy that is used by the page fault handler. For roughly twenty years, the page fault handler has used the same basic strategy: Fetch a fixed number of non-resident pages both ahead and behind the virtual page that was faulted on. Over the years, alternative strategies have been implemented for optimizing the handling of random and sequential access patterns, but the only change to the default strategy has been to increase the number of pages read ahead to 7 and behind to 8. The problem with the default page clustering strategy becomes apparent when you look at how it behaves on the code section of an executable or shared library. (To simplify the following explanation, I'm going to ignore the read that is performed to obtain the header and assume that no pages are resident at the start of execution.) Suppose that we have a code section consisting of 32 pages. Further, suppose that we access pages 4, 28, and 16 in that order. Under the default page clustering strategy, we page fault three times and perform three I/O operations, because the first and second page faults only read a truncated cluster of 12 pages. In contrast, if we access pages 8, 24, and 16 in that order, we only fault twice and perform two I/O operations, because the first and second page faults read a full cluster of 16 pages. In general, truncated clusters are more common than full clusters. To address this problem, this revision changes the default page clustering strategy to align the start of the cluster to a page offset within the vm object that is a multiple of the cluster size. This results in many fewer truncated clusters. Returning to our example, if we now access pages 4, 28, and 16 in that order, the cluster that is read to satisfy the page fault on page 28 will now include page 16. So, the access to page 16 will no longer page fault and perform an I/O operation. 
Since the revised default page clustering strategy is typically reading more pages at a time, we are likely to read a few more pages that are never accessed. However, for the various programs that we looked at, including clang, emacs, firefox, and openjdk, the reduction in the number of page faults and I/O operations far outweighed the increase in the number of pages that are never accessed. Moreover, the extra resident pages allowed for many more superpage mappings. For example, if we look at the execution of clang during a buildworld, the number of (hard) page faults on the code section drops by 26%, the number of superpage mappings increases by about 29,000, but the number of never accessed pages only increases from 30.38% to 33.66%. Finally, this leads to a small but measurable reduction in execution time. In collaboration with: Emily Pettigrew <ejp1@rice.edu> Differential Revision: https://reviews.freebsd.org/D1500 Reviewed by: jhb, kib MFC after: 6 weeks	2015-01-16 18:17:09 +00:00
kib	79db3369f9	Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 08:58:07 +00:00
alc	b48d1f4410	Eliminate a stale debug message. The per-CPU cache locks were replaced by critical sections in r145686. PR: 193254 Submitted by: luke.tw@gmail.com MFC after: 3 days	2014-12-31 17:44:57 +00:00
alc	369e66acd7	The physical memory allocator supports the use of distinct free lists for managing pages from different address ranges. Generally speaking, this feature is used to increase the likelihood that physical pages are available that can meet special DMA requirements or can be accessed through a limited-coverage direct mapping (e.g., MIPS). However, prior to this change, the configuration of the free lists was static, i.e., it was determined at compile time. Consequently, free lists could be created for address ranges that held no actual pages, for example, on 32-bit MIPS-based systems with 512 MB or less of physical memory. This change makes the creation of the free lists dynamic, i.e., it is based on the available physical memory at boot time. On 64-bit x86-based systems with 64 GB or more of physical memory, create free lists for managing pages with physical addresses below 4 GB. This change is to address reported problems with initializing devices that require the allocation of physical pages below 4 GB on some systems with 128 GB or more of physical memory. PR: 185727 Differential Revision: https://reviews.freebsd.org/D1274 Reviewed by: jhb, kib MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-12-31 00:54:38 +00:00
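
The worked example in the fb32c103c7 entry (a 32-page code section, accesses to pages 4, 28, 16 versus 8, 24, 16) can be replayed with a small model. This is only a sketch under simplifying assumptions, not the kernel's code: the function names `old_cluster`, `aligned_cluster`, and `count_faults` are hypothetical, every fault is assumed to read its whole cluster in one I/O, and the header read and any already-resident pages are ignored, as in the commit message.

```python
CLUSTER = 16  # pages per full cluster: 8 behind + the faulting page + 7 ahead
BEHIND = 8    # pages read behind the faulting page (old default strategy)
AHEAD = 7     # pages read ahead of the faulting page (old default strategy)


def old_cluster(page, npages):
    """Old strategy: a window centred on the faulting page, truncated
    at the boundaries of the object."""
    first = max(page - BEHIND, 0)
    last = min(page + AHEAD, npages - 1)
    return range(first, last + 1)


def aligned_cluster(page, npages):
    """Revised strategy (simplified): the cluster starts at a page offset
    that is a multiple of the cluster size."""
    first = page - page % CLUSTER
    last = min(first + CLUSTER - 1, npages - 1)
    return range(first, last + 1)


def count_faults(accesses, npages, strategy):
    """Count hard faults, assuming each fault makes its cluster resident."""
    resident = set()
    faults = 0
    for page in accesses:
        if page not in resident:
            faults += 1
            resident.update(strategy(page, npages))
    return faults


if __name__ == "__main__":
    for accesses in ([4, 28, 16], [8, 24, 16]):
        print(accesses,
              "old:", count_faults(accesses, 32, old_cluster),
              "aligned:", count_faults(accesses, 32, aligned_cluster))
```

In this model the old strategy reads truncated 12-page clusters for pages 4 and 28 and so faults a third time on page 16, while the aligned strategy covers page 16 with the cluster read for page 28.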


3550 Commits