freebsd-nq

Author	SHA1	Message	Date
John Baldwin	5fd3f8b3b6	Add stricter checking of some mmap() arguments: - Fail with EINVAL if an invalid protection mask is passed to mmap(). - Fail with EINVAL if an unknown flag is passed to mmap(). - Fail with EINVAL if both MAP_PRIVATE and MAP_SHARED are passed to mmap(). - Require one of either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings. Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D698	2014-09-15 17:20:13 +00:00
Alan Cox	a7fecb4d3a	Three improvements to vnode_pager_generic_getpages(): Eliminate an exclusive object lock acquisition and release on the expected execution path. Do page zeroing before the object lock is acquired rather than during the time that the object lock is held. Use vm_pager_free_nonreq() to eliminate duplicated code. Reviewed by: kib MFC after: 6 weeks Sponsored by: EMC / Isilon Storage Division	2014-09-15 17:14:09 +00:00
Gleb Smirnoff	be58a555d2	Remove redundant declaration. vnode.h should be included before vnode_pager.h.	2014-09-15 15:49:29 +00:00
Konstantin Belousov	d15b55c554	Provide the unique implementation for the VOP_GETPAGES() method used by ffs and ext2fs. Remove duplicated call to vm_page_zero_invalid(), done by VOP and by vm_pager_getpages(). Use vm_pager_free_nonreq(). Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 6 weeks (after r271596)	2014-09-15 12:28:29 +00:00
Alan Cox	396b3e34b4	Avoid an exclusive acquisition of the object lock on the expected execution path through the NFS clients' getpages functions. Introduce vm_pager_free_nonreq(). This function can be used to eliminate code that is duplicated in many getpages functions. Also, in contrast to the code that currently appears in those getpages functions, vm_pager_free_nonreq() avoids acquiring an exclusive object lock in one case. Reviewed by: kib MFC after: 6 weeks Sponsored by: EMC / Isilon Storage Division	2014-09-14 18:07:55 +00:00
Konstantin Belousov	33cad9e936	Fix mis-spelling of bits and types names in the vnode_pager_putpages(). The changes should not modify the generated code. The pager->pgo_putpages() method takes int flags as its fourth argument, while vnode_pager_putpages() used boolean_t (which is typedef'ed to int). The flags are from VM_PAGER_* namespace, while vnode_pager_putpages() passed TRUE and OBJPC_SYNC to VOP_PUTPAGES(), which both are numerically equal to VM_PAGER_PUT_SYNC. Noted and reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-14 10:27:36 +00:00
Alan Cox	81a065058c	Update a stale comment.	2014-09-11 03:16:57 +00:00
Gleb Smirnoff	27ad26d8c7	Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().	2014-09-10 12:36:41 +00:00
Alan Cox	64f096eeb2	Fix a boundary case error in vm_reserv_alloc_contig(): If a reservation isn't being allocated for the last of the requested pages, because a reservation won't fit in the gap between allocated pages, then the reservation structure shouldn't be initialized. While I'm here, improve the nearby comments. Reported by: jeff, pho MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2014-09-10 05:52:30 +00:00
Alan Cox	0afcd3af8b	Oops. vm_map_simplify_entry() is used by mac_proc_vm_revoke_recurse(), so it can't be static.	2014-09-08 02:25:01 +00:00
Alan Cox	077ec27cd6	Make two functions static and eliminate an unused #define.	2014-09-08 00:19:03 +00:00
John Baldwin	1a83a822d2	Fix a typo.	2014-08-29 21:20:36 +00:00
Steven Hartland	4d19f4ad1f	Refactor ZFS ARC reclaim logic to be more VM cooperative Prior to this change we triggered ARC reclaim when kmem usage passed 3/4 of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC). This could lead large amounts of unused RAM e.g. on a 192GB machine with ARC the only major RAM consumer, 40GB of RAM would remain unused. The old method has also been seen to result in extreme RAM usage under certain loads, causing poor performance and stalls. We now trigger ARC reclaim when the number of free pages drops below the value defined by the new sysctl vfs.zfs.arc_free_target, which defaults to the value of vm.v_free_target. Credit to Karl Denninger for the original patch on which this update was based. PR: 191510 and 187594 Tested by: dteske MFC after: 1 week Relnotes: yes Sponsored by: Multiplay	2014-08-28 19:50:08 +00:00
Alan Cox	9452b5eda9	Back in the days when the kernel was single threaded, testing "vm_paging_target() > 0" was a reasonable way of determining if the inactive queue scan met its target. However, now that other threads can be allocating pages while the inactive queue scan is running, it's an unreliable method. The effect of it being unreliable is that we can start swapping out processes when we didn't intend to. This issue has existed since the kernel was multithreaded, but the changes to the inactive queue target in 10.0-RELEASE have made its effects visible. This change introduces a more direct method for determining if the inactive queue scan met its target that is not affected by the actions of other threads. Reported by: Steve Polyack Tested by: pho, Steve Polyack (an earlier version) MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2014-08-26 16:40:20 +00:00
Alan Cox	b9ce8cc2d7	Relax one of the conditions for mapping a page on the fast path. Reviewed by: kib X-MFC with: r270011 Sponsored by: EMC / Isilon Storage Division	2014-08-23 05:24:31 +00:00
Konstantin Belousov	afe55ca373	Implement 'fast path' for the vm page fault handler. Or, it could be called a scalable path. When several preconditions hold, the vm object lock for the object containing the faulted page is taken in read mode, instead of write, which allows parallel faults processing in the region. Namely, the fast path is taken when the faulted page already exists and does not need copy on write, is already fully valid, and not busy. For technical reasons, fast path is avoided when the fault is the first write on the vnode object, or when the fault is for wiring or debugger read or write. On the fast path, pmap_enter(9) is passed the PMAP_ENTER_NOSLEEP flag, since object lock is kept. Pmap might fail to create the entry, in which case the fallback to slow path is performed. Reviewed by: alc Tested by: pho (previous version) Hardware provided and hosted by: The FreeBSD Foundation and Sentex Data Communications Sponsored by: The FreeBSD Foundation MFC after: 2 week	2014-08-15 07:30:14 +00:00
Alan Cox	9f746b66df	Avoid pointless (but harmless) actions on unmanaged pages. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-08-14 15:46:15 +00:00
Konstantin Belousov	70978c93b8	If vm_page_grab() allocates a new page, the page is not inserted into page queue even when the allocation is not wired. It is responsibility of the vm_page_grab() caller to ensure that the page does not end on the vm_object queue but not on the pagedaemon queue, which would effectively create unpageable unwired page. In exec_map_first_page() and vm_imgact_hold_page(), activate the page immediately after unbusying it, to avoid leak. In the uiomove_object_page(), deactivate page before the object is unlocked. There is no leak, since the page is deactivated after uiomove_fromphys() finished. But allowing non-queued non-wired page in the unlocked object queue makes it impossible to assert that leak does not happen in other places. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-13 05:44:08 +00:00
Konstantin Belousov	afb69e6b3e	Adapt vm_page_aflag_set(PGA_WRITEABLE) to the locking of pmap_enter(PMAP_ENTER_NOSLEEP). The PGA_WRITEABLE flag can be set when either the page is busied, or the owner object is locked. Update comments, move all assertions about page state when PGA_WRITEABLE flag is set, into new helper vm_page_assert_pga_writeable(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-08-09 05:00:34 +00:00
Konstantin Belousov	39ffa8c138	Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused). The flags includes the fault access bits, wired flag as PMAP_ENTER_WIRED, and a new flag PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep. For powerpc aim both 32 and 64 bit, fix implementation to ensure that the requested mapping is created when PMAP_ENTER_NOSLEEP is not specified, in particular, wait for the available memory required to proceed. In collaboration with: alc Tested by: nwhitehorn (ppc aim32 and booke) Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division MFC after: 2 weeks	2014-08-08 17:12:03 +00:00
Konstantin Belousov	385b4265fc	The vm_pager_page_unswapped() pager op is only implemented for the swap pager. Swap pager uses a private mutex to protect swap metadata, and does not rely on the vm object lock to ensure integrity of it. Weaken the requirement for the vm object lock by only asserting locked object in vm_pager_page_unswapped(), instead of locked exclusively. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-06 19:34:03 +00:00
Konstantin Belousov	faaf544760	Add wrappers to assert that vm object is unlocked and for try upgrade. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-06 19:30:35 +00:00
Roger Pau Monné	5ebe728d53	vm_phys: improve robustness of fictitious ranges With the current implementation of managed fictitious ranges when also using VM_PHYSSEG_DENSE, a user could try to register a fictitious range that starts inside of vm_page_array, but then overrruns it (because the end of the fictitious range is greater than vm_page_array_size + first_page). This would result in PHYS_TO_VM_PAGE returning unallocated pages from past the end of vm_page_array. The same could happen if a user tried to register a segment that starts outside of vm_page_array but ends inside of it. In order to fix this, allow vm_phys_fictitious_{reg/unreg}_range to use a set of pages from vm_page_array, and allocate the rest. Sponsored by: Citrix Systems R&D Reviewed by: kib, alc vm/vm_phys.c: - Allow registering/unregistering fictitious ranges that overrun vm_page_array.	2014-08-05 10:29:01 +00:00
Alan Cox	a695d9b25b	Retire pmap_change_wiring(). We have never used it to wire virtual pages. We continue to use pmap_enter() for that. For unwiring virtual pages, we now use pmap_unwire(), which unwires a range of virtual addresses instead of a single virtual page. Sponsored by: EMC / Isilon Storage Division	2014-08-03 20:40:51 +00:00
Alan Cox	0b69568411	Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable "rv" is uninitialized. Reported by: bz	2014-08-02 17:58:20 +00:00
Alan Cox	66cd575b28	Handle wiring failures in vm_map_wire() with the new functions pmap_unwire() and vm_object_unwire(). Retire vm_fault_{un,}wire(), since they are no longer used. (See r268327 and r269134 for the motivation behind this change.) Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-08-02 16:10:24 +00:00
Alan Cox	0346250941	When unwiring a region of an address space, do not assume that the underlying physical pages are mapped by the pmap. If, for example, the application has performed an mprotect(..., PROT_NONE) on any part of the wired region, then those pages will no longer be mapped by the pmap. So, using the pmap to lookup the wired pages in order to unwire them doesn't always work, and when it doesn't work wired pages are leaked. To avoid the leak, introduce and use a new function vm_object_unwire() that locates the wired pages by traversing the object and its backing objects. At the same time, switch from using pmap_change_wiring() to the recently introduced function pmap_unwire() for unwiring the region's mappings. pmap_unwire() is faster, because it operates a range of virtual addresses rather than a single virtual page at a time. Moreover, by operating on a range, it is superpage friendly. It doesn't waste time performing unnecessary demotions. Reported by: markj Reviewed by: kib Tested by: pho, jmg (arm) Sponsored by: EMC / Isilon Storage Division	2014-07-26 18:10:18 +00:00
Konstantin Belousov	4bace8e721	Correct assertion. The shadowing object cannot be tmpfs vm object, and tmpfs object cannot shadow. In other words, tmpfs vm object is always at the bottom of the shadow chain. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-24 10:25:42 +00:00
Konstantin Belousov	f08f7dca40	The OBJ_TMPFS flag of vm_object means that there is unreclaimed tmpfs vnode for the tmpfs node owning this object. The flag is currently used for two purposes. First, it allows to correctly handle VV_TEXT for tmpfs vnode when the ref count on the object is decremented to 1, similar to vnode_pager_dealloc() for regular filesystems. Second, it prevents some operations, which are done on OBJT_SWAP vm objects backing user anonymous memory, but are incorrect for the object owned by tmpfs node. The second kind of use of the OBJ_TMPFS flag is incorrect, since the vnode might be reclaimed, which clears the flag, but vm object operations must still be disallowed. Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on the object for VREG tmpfs node, and used instead of OBJ_TMPFS to test whether vm object collapse and similar actions should be disabled. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:30:37 +00:00
Roger Pau Monné	38d6b2dcb2	vm_phys: remove limitation on number of fictitious regions The number of vm fictitious regions was limited to 8 by default, but Xen will make heavy usage of those kind of regions in order to map memory from foreign domains, so instead of increasing the default number, change the implementation to use a red-black tree to track vm fictitious ranges. The public interface remains the same. Sponsored by: Citrix Systems R&D Reviewed by: kib, alc Approved by: gibbs vm/vm_phys.c: - Replace the vm fictitious static array with a red-black tree. - Use a rwlock instead of a mutex, since now we also need to take the lock in vm_phys_fictitious_to_vm_page, and it can be shared.	2014-07-09 08:12:58 +00:00
Marcel Moolenaar	e7d939bda2	Remove ia64. This includes: o All directories named ia64 o All files named ia64 o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan	2014-07-07 00:27:09 +00:00
Alan Cox	09132ba6ac	Introduce pmap_unwire(). It will replace pmap_change_wiring(). There are several reasons for this change: pmap_change_wiring() has never (in my memory) been used to set the wired attribute on a virtual page. We have always used pmap_enter() to do that. Moreover, it is not really safe to use pmap_change_wiring() to set the wired attribute on a virtual page. The description of pmap_change_wiring() says that it assumes the existence of a mapping in the pmap. However, non-wired mappings may be reclaimed by the pmap at any time. (See pmap_collect().) Many implementations of pmap_change_wiring() will crash if the mapping does not exist. pmap_unwire() accepts a range of virtual addresses, whereas pmap_change_wiring() acts upon a single virtual page. Since we are typically unwiring a range of virtual addresses, pmap_unwire() will be more efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings. Previously, we were forced to demote the superpage mapping, because pmap_change_wiring() only allowed us to express the unwiring of a single base page mapping at a time. This added to the overhead of unwiring for large ranges of addresses, including the implicit unwiring that occurs at process termination. Implementations for arm and powerpc will follow. Discussed with: jeff, marcel Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-07-06 17:42:38 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Alan Cox	60169c88d9	Delay the call to crhold() in vm_map_insert() until we know that we won't have to undo it by calling crfree(). This reduces the total number of calls by vm_map_insert() to crhold() and crfree() by 45% in my tests. Eliminate an unnecessary variable from vm_map_insert(). Reviewed by: kib Tested by: pho	2014-06-26 16:04:03 +00:00
Alan Cox	eaaf9f7fce	Now that vm_map_insert() sets MAP_ENTRY_GROWS_{DOWN,UP} on the stack entries that it creates (r267645), we can place the check that blocks map entry coalescing on stack entries in vm_map_simplify_entry() where it properly belongs. Reviewed by: kib	2014-06-25 03:30:03 +00:00
Konstantin Belousov	b5f8c226ab	Use correct names for the flags. MAP_ENTRY_GROWS_* have the same numerical values as MAP_STACK_GROWS_*, but the former is for entries' eflags, while the later for the cow argument of vm_map_insert(). Submitted by: alc	2014-06-23 07:03:47 +00:00
Konstantin Belousov	5831f5fc52	Assert that the new entry is inserted into the right location in the map entries list, and that it does not overlap with the previous and next entries. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-20 07:01:53 +00:00
Alan Cox	39c18ce157	Eliminate a pointless call to vm_map_clip_start() from vm_map_growstack(). For this call to do anything at all we would have to have two overlapping map entries. Submitted by: kib	2014-06-19 21:05:07 +00:00
Alan Cox	712efe66e2	When MAP_STACK_GROWS_{DOWN,UP} are passed to vm_map_insert() set the corresponding flag(s) in the new map entry. Previously, the caller was responsible for setting them after vm_map_insert() returned. Pass MAP_STACK_GROWS_DOWN to vm_map_insert() from vm_map_growstack() when extending the stack in the downward direction. Together these changes slightly simplify the caller's task when creating a downward growing stack. In particular, the caller no longer needs to clip the previous entry, because the new stack entry can't possibly coalesce with the previous entry. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-06-19 16:26:16 +00:00
Konstantin Belousov	11c42bcc54	Add MAP_EXCL flag for mmap(2). It should be combined with MAP_FIXED, and prevents the request from deleting existing mappings in the region, failing instead. Reviewed by: alc Discussed with: jhb Tested by: markj, pho (previous version, as part of the bigger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-19 05:00:39 +00:00
Attilio Rao	3ae10f7477	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
Alan Cox	33314db034	Tidy up the early parts of vm_map_insert(), in particular, simplify one of the assertions and eliminate a comment that has grown stale. Reviewed by: kib MFC after: 1 week	2014-06-16 16:37:41 +00:00
Alan Cox	e1f92ccc73	One of the intentions behind r267254 was that the global variable "sgrowsiz" would be read once and cached in a local variable so that the resource limit check and map entry insertion would be guaranteed to use the same value. However, the value being passed to vm_map_insert() is still from "sgrowsiz" and not the local variable. Correct this oversight. Reviewed by: kib	2014-06-15 07:52:59 +00:00
Alexander Motin	1aa6c75827	Introduce new "256 Bucket" zone to split requests and reduce congestion on "128 Bucket" zone lock. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-06-12 11:57:07 +00:00
Alexander Motin	20d3ab87cd	Allocating new bucket for bucket zone, never take it from the zone itself, since it will almost certanly fail. Take next bigger zone instead. This situation should not happen with original bucket zones configuration: "32 Bucket" zone uses "64 Bucket" and vice versa. But if "64 Bucket" zone lock is congested, zone may grow its bucket size and start biting itself. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-06-12 11:36:22 +00:00
Alan Cox	3180f7573a	Correct a bug in the management of the population map on big-endian machines. Specifically, there was a mismatch between how the routine allocation and deallocation operations accessed the population map and how the aggressively optimized reservation-breaking operation accessed it. So, problems only occurred when reservations were broken. This change makes the routine operations access the population map in the same way as the reservation breaking operation. This bug was introduced in r259999. PR: 187080 Tested by: jmg (on an "armeb" machine) Sponsored by: EMC / Isilon Storage Division	2014-06-11 16:11:12 +00:00
Konstantin Belousov	4648ba0a0f	Make mmap(MAP_STACK) search for the available address space, similar to !MAP_STACK mapping requests. For MAP_STACK \| MAP_FIXED, clear any mappings which could previously exist in the used range. For this, teach vm_map_find() and vm_map_fixed() to handle MAP_STACK_GROWS_DOWN or _UP cow flags, by calling a new vm_map_stack_locked() helper, which is factored out from vm_map_stack(). The side effect of the change is that MAP_STACK started obeying MAP_ALIGNMENT and MAP_32BIT flags. Reported by: rwatson Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-06-09 03:37:41 +00:00
Alan Cox	dd05fa1945	Add a page size field to struct vm_page. Increase the page size field when a partially populated reservation becomes fully populated, and decrease this field when a fully populated reservation becomes partially populated. Use this field to simplify the implementation of pmap_enter_object() on amd64, arm, and i386. On all architectures where we support superpages, the cost of creating a superpage mapping is roughly the same as creating a base page mapping. For example, both kinds of mappings entail the creation of a single PTE and PV entry. With this in mind, use the page size field to make the implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to vm_map_pmap_enter(), that function would only map base pages. Now, it will create up to 96 base page or superpage mappings. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-06-07 17:12:26 +00:00

1 2 3 4 5 ...

3261 Commits