freebsd-dev

Author	SHA1	Message	Date
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Attilio Rao	3ae10f7477	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
Alan Cox	dd05fa1945	Add a page size field to struct vm_page. Increase the page size field when a partially populated reservation becomes fully populated, and decrease this field when a fully populated reservation becomes partially populated. Use this field to simplify the implementation of pmap_enter_object() on amd64, arm, and i386. On all architectures where we support superpages, the cost of creating a superpage mapping is roughly the same as creating a base page mapping. For example, both kinds of mappings entail the creation of a single PTE and PV entry. With this in mind, use the page size field to make the implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to vm_map_pmap_enter(), that function would only map base pages. Now, it will create up to 96 base page or superpage mappings. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2014-06-07 17:12:26 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Alan Cox	793d14076a	In an effort to diagnose possible corruption of struct vm_page on some sparc64 machines make the page queue assert in vm_page_dequeue() more precise. While I'm here switch the page lock assert to the newer style.	2014-01-24 19:08:42 +00:00
Alan Cox	000fb817d8	Since the introduction of the popmap to reservations in r259999, there is no longer any need for the page's PG_CACHED and PG_FREE flags to be set and cleared while the free page queues lock is held. Thus, vm_page_alloc(), vm_page_alloc_contig(), and vm_page_alloc_freelist() can wait until after the free page queues lock is released to clear the page's flags. Moreover, the PG_FREE flag can be retired. Now that the reservation system no longer uses it, its only uses are in a few assertions. Eliminating these assertions is no real loss. Other assertions catch the same types of misbehavior, like doubly freeing a page (see r260032) or dirtying a free page (free pages are invalid and only valid pages can be dirtied). Eliminate an unneeded variable from vm_page_alloc_contig(). Sponsored by: EMC / Isilon Storage Division	2013-12-31 18:25:15 +00:00
Alan Cox	703b304f33	Eliminate a redundant parameter to vm_radix_replace(). Improve the wording of the comment describing vm_radix_replace(). Reviewed by: attilio MFC after: 6 weeks Sponsored by: EMC / Isilon Storage Division	2013-12-08 20:07:02 +00:00
Konstantin Belousov	9eab548476	PG_SLAB no longer serves a useful purpose, since m->object is no longer abused to store pointer to slab. Remove it. Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (hrs)	2013-09-17 07:35:26 +00:00
Konstantin Belousov	3846a82284	Remove zero-copy sockets code. It only worked for anonymous memory, and the equivalent functionality is now provided by sendfile(2) over posix shared memory filedescriptor. Remove the cow member of struct vm_page, and rearrange the remaining members. While there, make hold_count unsigned. Requested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (delphij)	2013-09-16 06:25:54 +00:00
Konstantin Belousov	196beb5359	If the last page of the file is partially full and whole valid portion is invalidated, invalidate the whole page. Otherwise, partially valid page appears on a page queue, which is wrong. This could only happen for the last page, because only then buffer which triggered invalidation could not cover the whole page. Reported and tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij) MFC after: 2 weeks	2013-09-14 10:11:38 +00:00
Konstantin Belousov	7a4b2bc56c	The vm_page_trysbusy() should not fail when shared busy counter or VPB_BIT_WAITERS flag were changed between reading of busy_lock and the cas. The vm_page_sbusy(), which is the only user of vm_page_trysbusy() in the tree, panics on the failure, which in these cases is transient and do not mean that the current page state prevents sbusying. Retry the operation inside vm_page_trysbusy() if cas failed, only return a failure when VPB_BIT_SHARED is cleared. Reported and tested by: pho Reviewed by: attilio Sponsored by: The FreeBSD Foundation	2013-09-05 12:54:40 +00:00
Alan Cox	51321f7c31	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division	2013-08-29 15:49:05 +00:00
Gleb Smirnoff	133dae887b	Remove comment that is no longer relevant since r254182.	2013-08-26 14:14:25 +00:00
Alan Cox	776cad90ff	Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() can reclaim the last preexisting cached page in the object, resulting in a call to vdrop(). Detect this scenario so that the vnode's hold count is correctly maintained. Otherwise, we panic. Reported by: scottl Tested by: pho Discussed with: attilio, jeff, kib	2013-08-23 17:27:12 +00:00
Konstantin Belousov	5944de8ecd	Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-22 07:39:53 +00:00
Alan Cox	28a288cbaa	Addendum to r254141: Allow recursion on the free pages queues lock in vm_page_alloc_freelist(). Reported and tested by: sbruno Sponsored by: EMC / Isilon Storage Division	2013-08-21 15:31:43 +00:00
Attilio Rao	a834cbaec8	On the recovery path for vm_page_alloc(), if a page had been requested wired, unwind back the wiring bits otherwise we can end up freeing a page that is considered wired. Sponsored by: EMC / Isilon storage division Reported by: alc	2013-08-15 11:01:25 +00:00
Jeff Roberson	d9e232109f	Improve pageout flow control to wakeup more frequently and do less work while maintaining better LRU of active pages. - Change v_free_target to include the quantity previously represented by v_cache_min so we don't need to add them together everywhere we use them. - Add a pageout_wakeup_thresh that sets the free page count trigger for waking the page daemon. Set this 10% above v_free_min so we wakeup before any phase transitions in vm users. - Adjust down v_free_target now that we're willing to accept more pagedaemon wakeups. This means we process fewer pages in one iteration as well, leading to shorter lock hold times and less overall disruption. - Eliminate vm_pageout_page_stats(). This was a minor variation on the PQ_ACTIVE segment of the normal pageout daemon. Instead we now process 1 / vm_pageout_update_period pages every second. This causes us to visit the whole active list every 60 seconds. Previously we would only maintain the active LRU when we were short on pages which would mean it could be woefully out of date. Reviewed by: alc (slight variant of this) Discussed with: alc, kib, jhb Sponsored by: EMC / Isilon Storage Division	2013-08-13 21:56:16 +00:00
Attilio Rao	6006884122	Correct the recovery logic in vm_page_alloc_contig: what is really needed on this code snipped is that all the pages that are already fully inserted gets fully freed, while for the others the object removal itself might be skipped, hence the object might be set to NULL. Sponsored by: EMC / Isilon storage division Reported by: alc, kib Reviewed by: alc	2013-08-11 21:15:04 +00:00
Konstantin Belousov	c325e866f4	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
John Baldwin	cdc00bf7d2	Revert the addition of VPO_BUSY and instead update vm_page_replace() to properly unbusy the page. Submitted by: alc	2013-08-09 21:14:55 +00:00
Attilio Rao	e946b94934	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Konstantin Belousov	449c2e92c9	Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
Konstantin Belousov	3abeb8113d	In the vm_page_set_invalid() function, do not assert that the page is not busy, since its only caller brelse() can legitimately call it on busy page. This happens for VOP_PUTPAGES() on filesystems that use buffers and which VOP_WRITE() method marked the buffer containing page as non-cacheable. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-07-11 05:38:39 +00:00
Neel Natu	6b5fbc1225	vm_phys_fictitious_reg_range() was losing the 'memattr' because it would be reset by pmap_page_init() right after being initialized in vm_page_initfake(). The statement above is with reference to the amd64 implementation of pmap_page_init(). Fix this by calling 'pmap_page_init()' in 'vm_page_initfake()' before changing the 'memattr'. Reviewed by: kib MFC after: 2 weeks	2013-07-03 23:38:37 +00:00
Gleb Smirnoff	4aa4cd8e92	Typo in comment.	2013-06-24 13:36:16 +00:00
Alan Cox	2051980f97	Revise the interface between vm_object_madvise() and vm_page_dontneed() so that pointless calls to pmap_is_modified() can be easily avoided when performing madvise(..., MADV_FREE). Sponsored by: EMC / Isilon Storage Division	2013-06-10 01:48:21 +00:00
Konstantin Belousov	be6ec55376	Remove irrelevant comments. Discussed with: alc MFC after: 3 days	2013-06-03 17:30:40 +00:00
Alan Cox	b417181250	Require that the page lock is held, instead of the object lock, when clearing the page's PGA_REFERENCED flag. Since we are typically manipulating the page's act_count field when we are clearing its PGA_REFERENCED flag, the page lock is already held everywhere that we clear the PGA_REFERENCED flag. So, in fact, this revision only changes some comments and an assertion. Nonetheless, it will enable later changes to object locking in the pageout code. Introduce vm_page_assert_locked(), which completely hides the implementation details of the page lock from the caller, and use it in vm_page_aflag_clear(). (The existing vm_page_lock_assert() could not be used in vm_page_aflag_clear().) Over the coming weeks, I expect that we'll either eliminate or replace the various uses of vm_page_lock_assert() with vm_page_assert_locked(). Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-03 01:22:54 +00:00
Alan Cox	b4e498071d	Now that access to the page's "act_count" field is synchronized by the page lock instead of the object lock, there is no reason for vm_page_activate() to assert that the object is locked for either read or write access. (The "VPO_UNMANAGED" flag never changes after page allocation.) Sponsored by: EMC / Isilon Storage Division	2013-06-01 20:32:34 +00:00
Attilio Rao	9af6d512f5	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc	2013-05-21 20:38:19 +00:00
Konstantin Belousov	4fab678be2	Add ddb command 'show pginfo' which provides useful information about a vm page, denoted either by an address of the struct vm_page, or, if the '/p' modifier is specified, by a physical address of the corresponding frame. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-21 11:04:00 +00:00
Alan Cox	767a6420bc	Relax the object locking assertion in vm_page_lookup(). Now that a radix tree is used to maintain the object's collection of resident pages, vm_page_lookup() no longer needs an exclusive lock. Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division	2013-05-17 18:49:43 +00:00
Peter Wemm	df839389c5	Bandaid for compiling with gcc, which happens to be the default compiler for a number of platforms still.	2013-05-13 07:09:31 +00:00
Alan Cox	404eb1b3fd	Refactor vm_page_alloc()'s interactions with vm_reserv_alloc_page() and vm_page_insert() so that (1) vm_radix_lookup_le() is never called while the free page queues lock is held and (2) vm_radix_lookup_le() is called at most once. This change reduces the average time that the free page queues lock is held by vm_page_alloc() as well as vm_page_alloc()'s average overall running time. Sponsored by: EMC / Isilon Storage Division	2013-05-12 16:50:18 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Attilio Rao	4bc80a3402	Simplify vm_page_is_valid(). Sponsored by: EMC / Isilon storage division Reviewed by: alc	2013-03-12 12:20:49 +00:00
Alan Cox	34496b53ee	Update a comment: The object lock is no longer a mutex.	2013-03-09 21:32:24 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Attilio Rao	c934116100	Merge from vmc-playground: Introduce a new KPI that verifies if the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking in order to be used also for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache. Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: alc (vm_radix based version) Tested by: flo, pho, jhb, davide	2013-03-09 02:05:29 +00:00
Attilio Rao	0dde287b20	Wrap the sleeps synchronized by the vm_object lock into the specific macro VM_OBJECT_SLEEP(). This hides some implementation details like the usage of the msleep() primitive and the necessity to access to the lock address directly. For this reason VM_OBJECT_MTX() macro is now retired. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-26 17:22:08 +00:00
Alan Cox	2863482058	In the past four years, we've added two new vm object types. Each time, similar changes had to be made in various places throughout the machine- independent virtual memory layer to support the new vm object type. However, in most of these places, it's actually not the type of the vm object that matters to us but instead certain attributes of its pages. For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain fictitious pages. In other words, in most of these places, we were testing the vm object's type to determine if it contained fictitious (or unmanaged) pages. To both simplify the code in these places and make the addition of future vm object types easier, this change introduces two new vm object flags that describe attributes of the vm object's pages, specifically, whether they are fictitious or unmanaged. Reviewed and tested by: kib	2012-12-09 00:32:38 +00:00
Alan Cox	0d69690e8f	Correct an error in r230623. When both VM_ALLOC_NODUMP and VM_ALLOC_ZERO were specified to vm_page_alloc(), PG_NODUMP wasn't being set on the allocated page when it happened to be pre-zeroed. MFC after: 5 days	2012-11-21 06:26:18 +00:00
Alan Cox	8d22020384	Replace the single, global page queues lock with per-queue locks on the active and inactive paging queues. Reviewed by: kib	2012-11-13 02:50:39 +00:00
Alan Cox	9fc4739d2a	In general, we call pmap_remove_all() before calling vm_page_cache(). So, the call to pmap_remove_all() within vm_page_cache() is usually redundant. This change eliminates that call to pmap_remove_all() and introduces a call to pmap_remove_all() before vm_page_cache() in the one place where it didn't already exist. When iterating over a paging queue, if the object containing the current page has a zero reference count, then the page can't have any managed mappings. So, a call to pmap_remove_all() is pointless. Change a panic() call in vm_page_cache() to a KASSERT(). MFC after: 6 weeks	2012-11-01 16:20:02 +00:00
Attilio Rao	4ceaf45de5	Rework the known mutexes to benefit about staying on their own cache line in order to avoid manual frobbing but using struct mtx_padalign. The sole exception being nvme and sxfge drivers, where the author redefined CACHE_LINE_SIZE manually, so they need to be analyzed and dealt with separately. Reviwed by: jimharris, alc	2012-10-31 18:07:18 +00:00
Alan Cox	081a488159	Replace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE, because the queue itself serves no purpose. When a held page is freed, inserting the page into the hold queue has the side effect of setting the page's "queue" field to PQ_HOLD. Later, when the page is unheld, it will be freed because the "queue" field is PQ_HOLD. In other words, PQ_HOLD is used as a flag, not a queue. So, this change replaces it with a flag. To accomodate the new page flag, make the page's "flags" field wider and "oflags" field narrower. Reviewed by: kib	2012-10-29 06:15:04 +00:00
Alan Cox	7ecfabc7bb	Move vm_page_requeue() to the only file that uses it. MFC after: 3 weeks	2012-10-13 20:19:43 +00:00
Alan Cox	9af47af64a	Eliminate the conditional for releasing the page queues lock in vm_page_sleep(). vm_page_sleep() is no longer called with this lock held. Eliminate assertions that the page queues lock is NOT held. These assertions won't translate well to having distinct locks on the active and inactive page queues, and they really aren't that useful. MFC after: 3 weeks	2012-10-13 18:46:46 +00:00
Alan Cox	4db2c4b8c7	Tidy up a bit: Update some of the comments. In particular, use "sleep" in preference to "block" where appropriate. Eliminate some unnecessary casts. Make a few whitespace changes for consistency. Reviewed by: kib MFC after: 3 days	2012-10-03 05:06:45 +00:00
Konstantin Belousov	b6c00483e9	Do not leave invalid pages in the object after the short read for a network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week	2012-08-14 11:45:47 +00:00
Konstantin Belousov	0055cbd3c5	Reduce code duplication and exposure of direct access to struct vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week	2012-08-04 18:16:43 +00:00
Alan Cox	369763e31a	Inline vm_page_aflags_clear() and vm_page_aflags_set(). Add comments stating that neither these functions nor the flags that they are used to manipulate are part of the KBI.	2012-08-03 01:48:15 +00:00
Alan Cox	da1ab8a4a0	Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP.	2012-07-17 02:36:59 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Alan Cox	eddc92918e	Selectively inline vm_page_dirty().	2012-06-20 23:25:47 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Andrew Turner	c415e17250	Fix booting on ARM. In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past the end of vm_page_array was incorrect causing it to return NULL. This value is then used in vm_phys_add_page causing a data abort. Reviewed by: alc, kib, imp Tested by: stas	2012-05-22 07:04:23 +00:00
Nathan Whitehorn	ccc4a5c761	Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.	2012-05-20 14:33:28 +00:00
Konstantin Belousov	b6de32bd9b	Add a facility to register a range of physical addresses to be used for allocation of fictitious pages, for which PHYS_TO_VM_PAGE() returns proper fictitious vm_page_t. The range should be de-registered after consumer stopped using it. De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate over registered ranges. A hash container might be developed instead of range registration interface, and fake pages could be put automatically into the hash, were PHYS_TO_VM_PAGE() could look them up later. This should be considered before the MFC of the commit is done. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:42:56 +00:00
Konstantin Belousov	e461aae747	Split the code from vm_page_getfake() to initialize the fake page struct vm_page into new interface vm_page_initfake(). Handle the case of fake page re-initialization with changed memattr. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:34:22 +00:00
Konstantin Belousov	116c213502	Assert that the page passed to vm_page_putfake() is unmanaged. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:27:51 +00:00
Konstantin Belousov	0c26bb71f6	Make the vm_page_array_size long. Remove redundand zero initialization for vm_page_array_size and nearby variablees. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:03:06 +00:00
Nathan Whitehorn	0b852c03eb	Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release. Suggested by: kib MFC after: 4 days	2012-04-22 17:58:30 +00:00
Alan Cox	2aa163dc57	As documented in vm_page.h, updates to the vm_page's flags no longer require the page queues lock. MFC after: 1 week	2012-04-21 18:26:16 +00:00
Attilio Rao	a0f2c37b6f	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
Alan Cox	1c8279e4e7	Fix mincore(2) so that it reports PG_CACHED pages as resident. MFC after: 2 weeks	2012-04-08 18:25:12 +00:00
Attilio Rao	d1aa86e151	Staticize vm_page_cache_remove(). Reviewed by: alc	2012-04-06 20:34:00 +00:00
Kip Macy	263811f724	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
Alan Cox	c68c35372e	Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage. Eliminate some duplication of code in vm_reserv_alloc_page(). Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions. Tested by: marius (sparc64)	2011-12-05 18:29:25 +00:00
Konstantin Belousov	dc874f9881	Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc	2011-11-30 17:39:00 +00:00
Konstantin Belousov	cf1911a9ad	Hide the internals of vm_page_lock(9) from the loadable modules. Since the address of vm_page lock mutex depends on the kernel options, it is easy for module to get out of sync with the kernel. No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless proper KPI is developed to serve the needs. Reviewed by: attilio, alc MFC after: 3 weeks	2011-11-29 13:07:32 +00:00
Alan Cox	5ff276b7f4	Eliminate end-of-line white space.	2011-11-17 06:54:49 +00:00
Alan Cox	fbd80bd047	Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb	2011-11-16 16:46:09 +00:00
Alan Cox	c835bd16a8	Wake up the page daemon in vm_page_alloc_freelist() if it couldn't allocate the requested page because too few pages are cached or free. Document the VM_ALLOC_COUNT() option to vm_page_alloc() and vm_page_alloc_freelist(). Make style changes to vm_page_alloc() and vm_page_alloc_freelist(), such as using a variable name that more closely corresponds to the comments.	2011-11-06 02:03:27 +00:00
Konstantin Belousov	561cc9fcb5	Provide typedefs for the type of bit mask for the page bits. Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independend VM layer. Reviewed by: alc MFC after: 2 weeks	2011-11-05 08:20:32 +00:00
Alan Cox	8393768074	Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist() and use these new options in the mips pmap. Wake up the page daemon in vm_page_alloc_freelist() if the number of free and cached pages becomes too low. Tidy up vm_page_alloc_init(). In particular, add a comment about an important restriction on its use. Tested by: jchandra@	2011-11-02 05:42:51 +00:00
Alan Cox	125b695b6e	Tidy up the comment at the head of vm_page_alloc, and mention that the returned page has the flag VPO_BUSY set.	2011-10-27 17:29:19 +00:00
Alan Cox	9c60ca3238	Speed up vm_page_cache() and vm_page_remove() by checking for a few common cases that can be handled in constant time. The insight being that a page's parent in the vm object's tree is very often its predecessor or successor in the vm object's ordered memq. Tested by: jhb MFC after: 10 days	2011-10-25 16:35:08 +00:00
Konstantin Belousov	17514c1bd9	Style nit. Submitted by: jhb MFC after: 2 weeks	2011-09-29 00:44:34 +00:00
Konstantin Belousov	2042bb377a	Fix grammar. Submitted by: bf MFC after: 2 weeks	2011-09-28 16:12:15 +00:00
Konstantin Belousov	abb9b935ca	Use the trick of performing the atomic operation on the contained aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold() the sole purpose of which was to protect dirty on architectures which does not provide short or byte-wide atomics. Reviewed by: alc, attilio Tested by: flo (sparc64) MFC after: 2 weeks	2011-09-28 14:57:50 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Alan Cox	1bfec3dfb6	Revert to using the page queues lock in vm_page_clear_dirty_mask() on MIPS. (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)	2011-06-23 05:23:59 +00:00
Alan Cox	3c76db4c64	Precisely document the synchronization rules for the page's dirty field. (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.) Eliminate the use of the page queues lock for atomically performing read- modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types. Document the fact that 32KB pages aren't really supported. Reviewed by: attilio, kib	2011-06-19 19:13:24 +00:00
Konstantin Belousov	3b1025d200	Assert that page is VPO_BUSY or page owner object is locked in vm_page_undirty(). The assert is not precise due to VPO_BUSY owner to tracked, so assertion does not catch the case when VPO_BUSY is owned by other thread. Reviewed by: alc	2011-06-11 20:15:19 +00:00
Alan Cox	10cf256074	Eliminate duplication of the fake page code and zone by the device and sg pagers. Reviewed by: jhb	2011-03-11 07:07:48 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Alan Cox	d7b20e4b45	Retire VFS_BIO_DEBUG. Convert those checks that were still valid into KASSERT()s and eliminate the rest. Replace excessive printf()s and a panic() in bufdone_finish() with a KASSERT() in vm_page_io_finish(). Reviewed by: kib	2011-02-12 01:00:00 +00:00
Alan Cox	3d05198e23	Release the free page queues lock earlier in vm_page_alloc(). Discussed with: kib@	2011-01-30 23:55:48 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Alan Cox	4c6a2e7a1f	Shift responsibility for synchronizing access to the page's act_count field to the object's lock. Reviewed by: kib@	2011-01-16 18:01:39 +00:00
Alan Cox	9648f3447d	Clean up the start of vm_page_alloc(). In particular, eliminate an assertion that is no longer required. Long ago, calls to vm_page_alloc() from an interrupt handler had to specify VM_ALLOC_INTERRUPT so that vm_page_alloc() would not attempt to reclaim a PQ_CACHE page from another vm object. Today, with the synchronization on a vm object's collection of PQ_CACHE pages, this is no longer an issue. In fact, VM_ALLOC_INTERRUPT now reclaims PQ_CACHE pages just like VM_ALLOC_{NORMAL,SYSTEM}. MFC after: 3 weeks	2011-01-16 17:33:34 +00:00
Alan Cox	27772ddf45	Eliminate a redundant alignment directive on the page locks array.	2011-01-09 04:34:02 +00:00
Alan Cox	ce8a13bdb9	Eliminate the counting of vm_page_pa_tryrelock calls. We really don't need it anymore. Moreover, its implementation had a type mismatch, a long is not necessarily an uint64_t. (This mismatch was hidden by casting.) Move the remaining two counters up a level in the sysctl hierarchy. There is no reason for them to be under the vm.pmap node. Reviewed by: kib	2011-01-08 22:45:22 +00:00
Alan Cox	17f6a17bf7	Release the page lock early in vm_pageout_clean(). There is no reason to hold this lock until the end of the function. With the aforementioned change to vm_pageout_clean(), page locks don't need to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions. Reviewed by: kib	2011-01-03 00:41:56 +00:00
Konstantin Belousov	3280870dca	Move the increment of vm object generation count into vm_object_set_writeable_dirty(). Fix an issue where restart of the scan in vm_object_page_clean() did not removed write permissions for newly added pages or, if the mapping for some already scanned page changed to writeable due to fault. Merge the two loops in vm_object_page_clean(), doing the remove of write permission and cleaning in the same loop. The restart of the loop then correctly downgrade writeable mappings. Fix an issue where a second caller to msync() might actually return before the first caller had actually completed flushing the pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not before. Calls to pmap_is_modified() are not needed after pmap_remove_write() there. Proposed, reviewed and tested by: alc MFC after: 1 week	2010-12-29 12:53:53 +00:00

1 2 3 4 5 ...

578 Commits