sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.
Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
cache line in order to avoid manual frobbing but using
struct mtx_padalign.
The sole exception being nvme and sxfge drivers, where the author
redefined CACHE_LINE_SIZE manually, so they need to be analyzed and
dealt with separately.
Reviwed by: jimharris, alc
because the queue itself serves no purpose. When a held page is freed,
inserting the page into the hold queue has the side effect of setting the
page's "queue" field to PQ_HOLD. Later, when the page is unheld, it will
be freed because the "queue" field is PQ_HOLD. In other words, PQ_HOLD is
used as a flag, not a queue. So, this change replaces it with a flag.
To accomodate the new page flag, make the page's "flags" field wider and
"oflags" field narrower.
Reviewed by: kib
network file systems (not only NFS proper). Short reads cause pages
other then the requested one, which were not filled by read response,
to stay invalid.
Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As result, not requested invalid pages are
freed even if the read RPC indicated success.
Noted and reviewed by: alc
MFC after: 1 week
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.
Suggested and reviewed by: alc
MFC after: 2 weeks
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other then the requested one, for VOP_GETPAGES().
Reviewed by: alc
MFC after: 1 week
layer, but it is read directly by the MI VM layer. This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.
Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.
As an added bonus, tidy up some nearby comments concerning page flags.
Reviewed by: kib
MFC after: 6 weeks
for allocation of fictitious pages, for which PHYS_TO_VM_PAGE()
returns proper fictitious vm_page_t. The range should be de-registered
after consumer stopped using it.
De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate
over registered ranges.
A hash container might be developed instead of range registration
interface, and fake pages could be put automatically into the hash,
were PHYS_TO_VM_PAGE() could look them up later. This should be
considered before the MFC of the commit is done.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
vm_page into new interface vm_page_initfake(). Handle the case of fake
page re-initialization with changed memattr.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
excluding other allocations including UMA now entails the addition of
a single flag to kmem_alloc or uma zone create
Reviewed by: alc, avg
MFC after: 2 weeks
Since the address of vm_page lock mutex depends on the kernel options,
it is easy for module to get out of sync with the kernel.
No vm_page_lockptr() accessor is provided for modules. It can be added
later if needed, unless proper KPI is developed to serve the needs.
Reviewed by: attilio, alc
MFC after: 3 weeks
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
Use the defined types instead of int when manipulating masks.
Supposedly, it could fix support for 32KB page size in the
machine-independend VM layer.
Reviewed by: alc
MFC after: 2 weeks
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold()
the sole purpose of which was to protect dirty on architectures which
does not provide short or byte-wide atomics.
Reviewed by: alc, attilio
Tested by: flo (sparc64)
MFC after: 2 weeks
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)
Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.
Document the fact that 32KB pages aren't really supported.
Reviewed by: attilio, kib
vm_page_undirty(). The assert is not precise due to VPO_BUSY owner
to tracked, so assertion does not catch the case when VPO_BUSY is
owned by other thread.
Reviewed by: alc
The current implementation of vm_page_alloc_freelist() does not handle
order > 0 correctly. Remove order parameter to the function and use it
only for order 0 pages.
Submitted by: alc
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards will introduce
a race condtion(noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
the maintenance of vm_pageout_deficit can be localized to just two places:
vm_page_alloc() and vm_pageout_scan().
This change also corrects an off-by-one error in the maintenance of
vm_pageout_deficit. Historically, the buffer cache functions, allocbuf()
and vm_hold_load_pages(), have not taken into account that vm_page_alloc()
already increments vm_pageout_deficit by one.
Reviewed by: kib
flag is always provided, and unconditionally retry after sleep for the
busy page or failed allocation.
The intent is to remove VM_ALLOC_RETRY eventually.
Proposed and reviewed by: alc
specify the increment of vm_pageout_deficit when sleeping due to page
shortage. Then, in allocbuf(), the code to allocate pages when extending
vmio buffer can be replaced by a call to vm_page_grab().
Suggested and reviewed by: alc
MFC after: 2 weeks
is ordered by page index. This greatly simplifies the implementation,
since we no longer need to mark the pages with VPO_CLEANCHK to denote
the progress. It is enough to remember the current position by index
before dropping the object lock.
Remove VPO_CLEANCHK and VM_PAGER_IGNORE_CLEANCHK as unused.
Garbage-collect vm.msync_flush_flags sysctl.
Suggested and reviewed by: alc
Tested by: pho
vm_pageout_clean(). When iterating over a range of pages, these functions
can be cheaper than vm_page_lookup() because their implementation takes
advantage of the vm_object's memq being ordered.
Reviewed by: kib@
MFC after: 3 weeks
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.
Tidy up the variable declarations at the start of pmap_enter().
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
here, make the style of assertion used by pmap_enter() consistent
across all architectures.
On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.