Commit Graph

445 Commits

Author SHA1 Message Date
Alan Cox
6bbee8e28a Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
Konstantin Belousov
031ec8c10a In the VOP_PUTPAGES() implementations, change the default error from
VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().

VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.

The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.

Reported and tested by:	gavin
Reviewed by:	alc, rmacklem
MFC after:	1 week
2011-06-01 21:00:28 +00:00
Max Laier
e18cc7bf3e Another long standing vm bug found at Isilon:
Fix a race between vm_object_collapse and vm_fault.

Reviewed by:	alc@
MFC after:	3 days
2011-05-09 20:27:49 +00:00
Konstantin Belousov
86769ac0a4 Fix two bugs in r218670.
Hold the vnode around the region where object lock is dropped, until
vnode lock is acquired.

Do not drop the vnode reference for a case when the object was
deallocated during unlock. Note that in this case, VV_TEXT is cleared
by vnode_pager_dealloc().

Reported and tested by:	pho
Reviewed by:	alc
MFC after:	3 days
2011-04-23 21:38:21 +00:00
Konstantin Belousov
03fa5b34a0 Lock the vnode around clearing of VV_TEXT flag. Remove mp_fixme() note
mentioning that vnode lock is needed.

Reviewed by:	alc
Tested by:	pho
MFC after:	1 week
2011-02-13 21:52:26 +00:00
Alan Cox
17f3095d1a Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are
incorrectly calling vm_object_page_clean().  They are passing the length of
the range rather than the ending offset of the range.

Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.

Reviewed by:	kib
MFC after:	3 weeks
2011-02-05 21:21:27 +00:00
Alan Cox
0cc74f144e Since the last parameter to vm_object_shadow() is a vm_size_t and not a
vm_pindex_t, it makes no sense for its callers to perform atop().  Let
vm_object_shadow() do that instead.
2011-02-04 21:49:24 +00:00
Konstantin Belousov
c6c9025b65 For consistency, use kernel_object instead of &kernel_object_store
when initializing the object mutex. Do the same for kmem_object.

Discussed with:	alc
MFC after:	1 week
2011-01-15 21:56:38 +00:00
Alan Cox
edf93b25d3 Make a couple refinements to r216799 and r216810. In particular, revise
a comment and move it to its proper place.

Reviewed by:	kib
2011-01-01 17:39:38 +00:00
Konstantin Belousov
50cfe7fa50 Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.

 Moreover, since OBJ_CLEANING is a flag, and not the counter, it could
be reset too prematurely when parallel vm_object_page_clean() are
performed.

Reviewed by:	alc (as a part of the bigger patch)
MFC after:	1 month (after r216799 is merged)
2010-12-29 22:26:49 +00:00
Konstantin Belousov
3280870dca Move the increment of vm object generation count into
vm_object_set_writeable_dirty().

Fix an issue where restart of the scan in vm_object_page_clean() did
not removed write permissions for newly added pages or, if the mapping
for some already scanned page changed to writeable due to fault.
Merge the two loops in vm_object_page_clean(), doing the remove of
write permission and cleaning in the same loop. The restart of the
loop then correctly downgrade writeable mappings.

Fix an issue where a second caller to msync() might actually return
before the first caller had actually completed flushing the
pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not
before.

Calls to pmap_is_modified() are not needed after pmap_remove_write()
there.

Proposed, reviewed and tested by:	alc
MFC after:	1 week
2010-12-29 12:53:53 +00:00
Edward Tomasz Napierala
ef694c1ac4 Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object".  This is required to make it possible to account
for per-jail swap usage.

Reviewed by:	kib@
Tested by:	pho@
Sponsored by:	FreeBSD Foundation
2010-12-02 17:37:16 +00:00
Konstantin Belousov
780636b72a After the sleep caused by encountering a busy page, relookup the page.
Submitted and reviewed by:	alc
Reprted and tested by:	pho
MFC after:	5 days
2010-11-24 12:25:17 +00:00
Konstantin Belousov
3157c50313 Eliminate the mab, maf arrays and related variables.
The change also fixes off-by-one error in the calculation of mreq.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	5 days
2010-11-21 10:18:28 +00:00
Alan Cox
17ea6f00d5 Optimize vm_object_terminate().
Reviewed by:	kib
MFC after:	1 week
2010-11-20 22:30:09 +00:00
Konstantin Belousov
4c7b9a2063 The runlen returned from vm_pageout_flush() might be zero legitimately,
when mreq page has status VM_PAGER_AGAIN.

MFC after:	5 days
2010-11-20 17:27:38 +00:00
Konstantin Belousov
1e8a675c73 vm_pageout_flush() might cache the pages that finished write to the
backing storage. Such pages might be then reused, racing with the
assert in vm_object_page_collect_flush() that verified that dirty
pages from the run (most likely, pages with VM_PAGER_AGAIN status) are
write-protected still. In fact, the page indexes for the pages that
were removed from the object page list should be ignored by
vm_object_page_clean().

Return the length of successfully written run from vm_pageout_flush(),
that is, the count of pages between requested page and first page
after requested with status VM_PAGER_AGAIN. Supply the requested page
index in the array to vm_pageout_flush(). Use the returned run length
to forward the index of next page to clean in vm_object_page_clean().

Reported by:	avg
Reviewed by:	alc
MFC after:	1 week
2010-11-18 21:09:02 +00:00
Konstantin Belousov
4166faaee0 Only increment object generation count when inserting the page into
object page list.  The only use of object generation count now is a
restart of the scan in vm_object_page_clean(), which makes sense to do
on the page addition. Page removals do not affect the dirtiness of the
object, as well as manipulations with the shadow chain.

Suggested and reviewed by:	alc
MFC after:    1 week
2010-11-18 20:46:28 +00:00
Konstantin Belousov
757216f3a5 Several cleanups for the r209686:
- remove unused defines;
- remove unused curgeneration argument for vm_object_page_collect_flush();
- always assert that vm_object_page_clean() is called for OBJT_VNODE;
- move vm_page_find_least() into for() statement initial clause.

Submitted by:	alc
2010-07-04 19:02:32 +00:00
Konstantin Belousov
e239bb9730 Reimplement vm_object_page_clean(), using the fact that vm object memq
is ordered by page index. This greatly simplifies the implementation,
since we no longer need to mark the pages with VPO_CLEANCHK to denote
the progress. It is enough to remember the current position by index
before dropping the object lock.

Remove VPO_CLEANCHK and VM_PAGER_IGNORE_CLEANCHK as unused.
Garbage-collect vm.msync_flush_flags sysctl.

Suggested and reviewed by:	alc
Tested by:	pho
2010-07-04 11:26:56 +00:00
Konstantin Belousov
b382c10a57 Introduce a helper function vm_page_find_least(). Use it in several places,
which inline the function.

Reviewed by:	alc
Tested by:	pho
MFC after:	1 week
2010-07-04 11:13:33 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Alan Cox
a1a95cd608 Add a comment about the proper use of vm_object_page_remove().
MFC after:	1 week
2010-05-16 16:54:05 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7072188017 Eliminate acquisitions of the page queues lock that are no longer needed.
Switch to a per-processor counter for the number of pages freed during
process termination.
2010-05-07 05:23:15 +00:00
Alan Cox
eb00b276ab Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
Alan Cox
5ac59343be Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
Alan Cox
ac800a8490 Correct an error in r207410: Remove an unlock of a lock that is no longer
held.
2010-05-02 18:09:33 +00:00
Kip Macy
7bec141b12 push up dropping of the page queue lock to avoid holding it in vm_pageout_flush 2010-04-30 22:31:37 +00:00
Kip Macy
ad0c05daf9 don't call vm_pageout_flush with the page queue mutex held
Reported by: Michael Butler
2010-04-30 21:21:21 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
6a2a3d7338 Change vm_object_madvise() so that it checks whether the page is invalid
or unmanaged before acquiring the page queues lock.  Neither of these
tests require that lock.  Moreover, a better way of testing if the page
is unmanaged is to test the type of vm object.  This avoids a pointless
vm_page_lookup().

MFC after:	3 weeks
2010-04-28 04:57:32 +00:00
Alan Cox
4b9dd5d537 There is no justification for vm_object_split() setting PG_REFERENCED on a
page that it is going to sleep on.  Eliminate it.

MFC after:	3 weeks
2010-04-18 17:50:09 +00:00
Alan Cox
b11b56b55b In vm_object_madvise() setting PG_REFERENCED on a page before sleeping on
that page only makes sense if the advice is MADV_WILLNEED.  In that case,
the intention is to activate the page, so discouraging the page daemon
from reclaiming the page makes sense.  In contrast, in the other cases,
MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage
the page daemon from reclaiming the page by setting PG_REFERENCED.

Wrap a nearby line.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 21:14:37 +00:00
Alan Cox
aefea7f519 In vm_object_backing_scan(), setting PG_REFERENCED on a page before
sleeping on that page is nonsensical.  Doing so reduces the likelihood
that the page daemon will reclaim the page before the thread waiting in
vm_object_backing_scan() is reawakened.  However, it does not guarantee
that the page is not reclaimed, so vm_object_backing_scan() restarts
after reawakening.  More importantly, this muddles the meaning of
PG_REFERENCED.  There is no reason to believe that the caller of
vm_object_backing_scan() is going to use (i.e., access) the contents of
the page.  There is especially no reason to believe that an access is
more likely because vm_object_backing_scan() had to sleep on the page.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 18:35:07 +00:00
Konstantin Belousov
49e3050e6c VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object
flag. Besides providing the redundand information, need to update both
vnode and object flags causes more acquisition of vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Konstantin Belousov
9b4d473a6e Eliminiate code duplication by calling vm_object_destroy()
from vm_object_collapse().

Requested and reviewed by:	alc
Approved by:	re (kensmith)
2009-06-28 08:42:17 +00:00
Alan Cox
26f4eea53f The bits set in a page's dirty mask are a subset of the bits set in its
valid mask.  Consequently, there is no need to perform a bit-wise and of
the page's dirty and valid masks in order to determine which parts of a
page are dirty and valid.

Eliminate an unnecessary #include.
2009-06-24 04:45:03 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Alan Cox
387aabc513 Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt().  This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous.  Unfortunately, this is not always the case.  For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption.  Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous.  This
revision also changes some inconsistent if not buggy behavior.  For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid.  However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page.  The amd64 version has a bug of
my own creation.  It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance.  I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms.  However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity).  These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with:	jhb@
2009-06-14 19:51:43 +00:00
Alan Cox
a28042d1e3 Change vm_object_page_remove() such that it clears the page's dirty bits
when it invalidates the page.

Suggested by:	tegge
2009-05-28 07:26:36 +00:00
Konstantin Belousov
bb2ac86f7d Do not call vm_page_lookup() from the ddb routine, namely from "show
vmopag" implementation. The vm_page_lookup() code modifies splay tree
of the object pages, and asserts that object lock is taken. First issue
could cause kernel data corruption, and second one instantly panics the
INVARIANTS-enabled kernel.

Take the advantage of the fact that object->memq is ordered by page index,
and iterate over memq to calculate the runs.

While there, make the code slightly more style-compliant by moving
variables declarations to the right place.

Discussed with:	jhb, alc
Reviewed by:	alc
MFC after:	2 weeks
2009-04-23 21:09:47 +00:00
Alan Cox
bfd9b137a0 Reduce the scope of the page queues lock in vm_object_page_remove().
MFC after:	1 week
2009-02-21 20:57:25 +00:00
Alan Cox
7b54b1a9f5 Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a
redundant assertion in vm_fault().

Reviewed by:	kib
2009-02-08 22:17:24 +00:00
Robert Noland
e9f541267d Fix printing of KASSERT message missed in r163604.
Approved by:	kib
2008-12-21 16:56:13 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Tom Rhodes
6bd9cb1c81 Fill in a few sysctl descriptions.
Reviewed by:	alc, Matt Dillon <dillon@apollo.backplane.com>
Approved by:	alc
2008-08-03 14:26:15 +00:00
John Baldwin
2c3b410b3a One more whitespace nit. 2008-07-30 21:23:32 +00:00