Commit Graph

350 Commits

Author SHA1 Message Date
tegge
a73af1b81d Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.

When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine.  If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.

Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation.  Detect and handle relevant page queue changes.

This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.

Reviewed by:	alc
2005-08-10 00:17:36 +00:00
jeff
95489bf6b4 - We need to inhert the OBJ_NEEDGIANT flag from the original object in
vm_object_split().

Spotted by:	alc
2005-05-04 20:54:16 +00:00
jeff
d62d255d2e - Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the
underlying vnode requires Giant.
 - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT
   set.
 - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
2005-05-03 11:11:26 +00:00
alc
75d6caaaf0 Eliminate (now) unnecessary acquisition and release of the global page
queues lock in vm_object_backing_scan().  Updates to the page's PG_BUSY
flag and busy field are synchronized by the containing object's lock.

Testing the page's hold_count and wire_count in vm_object_backing_scan()'s
OBSC_COLLAPSE_NOWAIT case is unnecessary.  There is no reason why the held
or wired pages cannot be migrated to the shadow object.

Reviewed by: tegge
2005-03-30 05:40:02 +00:00
jeff
92d24b8044 - Don't lock the vnode interlock in vm_object_set_writeable_dirty() if
we've already set the object flags.

Reviewed by:	alc
2005-03-17 12:03:42 +00:00
alc
19b479b41f Update the text of an assertion to reflect changes made in revision 1.148.
Submitted by: tegge

Eliminate an unnecessary, temporary increment of the backing object's
reference count in vm_object_qcollapse().  Reviewed by: tegge
2005-01-30 21:29:47 +00:00
jeff
1dd5432139 - Remove GIANT_REQUIRED where giant is no longer required.
- Use VFS_LOCK_GIANT() rather than directly acquiring giant in places
   where giant is only held because vfs requires it.

Sponsored By:   Isilon Systems, Inc.
2005-01-24 10:48:29 +00:00
alc
3ffc6c3bf0 Consider three objects, O, BO, and BBO, where BO is O's backing object
and BBO is BO's backing object.  Now, suppose that O and BO are being
collapsed.  Furthermore, suppose that BO has been marked dead
(OBJ_DEAD) by vm_object_backing_scan() and that either
vm_object_backing_scan() has been forced to sleep due to encountering
a busy page or vm_object_collapse() has been forced to sleep due to
memory allocation in the swap pager.  If vm_object_deallocate() is
then called on BBO and BO is BBO's only shadow object,
vm_object_deallocate() will collapse BO and BBO.  In doing so, it adds
a necessary temporary reference to BO.  If this collapse also sleeps
and the prior collapse resumes first, the temporary reference will
cause vm_object_collapse to panic with the message "backing_object %p
was somehow re-referenced during collapse!"

Resolve this race by changing vm_object_deallocate() such that it
doesn't collapse BO and BBO if BO is marked dead.  Once O and BO are
collapsed, vm_object_collapse() will attempt to collapse O and BBO.
So, vm_object_deallocate() on BBO need do nothing.

Reported by: Peter Holm on 20050107
URL: http://www.holm.cc/stress/log/cons102.html

In collaboration with: tegge@
Candidate for RELENG_4 and RELENG_5
MFC after: 2 weeks
2005-01-15 21:12:47 +00:00
phk
cc0cbc6b34 Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
phk
da2718f1af Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
alc
8c07cfa5a0 Move the acquisition and release of the page queues lock outside of a loop
in vm_object_split() to avoid repeated acquisition and release.
2005-01-08 23:41:11 +00:00
imp
f0bf889d0d /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
alc
c4f7988cf9 Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333
for a detailed explanation.)
2004-12-17 18:54:51 +00:00
alc
40ad9ef99b With the removal of kern/uipc_jumbo.c and sys/jumbo.h,
vm_object_allocate_wait() is not used.  Remove it.
2004-12-08 05:01:47 +00:00
alc
6314cca720 Eliminate an unnecessary atomic operation. Articulate the rationale in
a comment.
2004-11-06 21:48:45 +00:00
alc
4030274372 Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc()
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode.  To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.

Reviewed by: tegge@
2004-11-06 05:33:02 +00:00
alc
96eb8f832a Eliminate another unnecessary call to vm_page_busy() that immediately
precedes a call to vm_page_rename().  (See the previous revision for a
detailed explanation.)
2004-11-05 05:40:45 +00:00
alc
25b80a64b9 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
alc
0dc141e15d Move the acquisition and release of the lock on the object at the head of
the shadow chain outside of the loop in vm_object_madvise(), reducing the
number of times that this lock is acquired and released.
2004-08-29 20:14:10 +00:00
green
9532ab7116 * Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialation functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
dfr
f4b42f3d57 Fix handling of msync(2) for character special files.
Submitted by: nvidia
2004-07-30 11:08:02 +00:00
alc
9507df06e3 Correct a very old error in both vm_object_madvise() (originating in
vm/vm_object.c revision 1.88) and vm_object_sync() (originating in
vm/vm_map.c revision 1.36): When descending a chain of backing objects,
both use the wrong object's backing offset.  Consequently, both may
operate on the wrong pages.

Quoting Matt, "This could be responsible for all of the sporatic madvise
oddness that has been reported over the years."

Reviewed by:	Matt Dillon
2004-07-28 18:23:08 +00:00
alc
6bb1a6edcd Remove spl calls. 2004-07-25 19:28:10 +00:00
alc
035cd2c09d Make the code and comments for vm_object_coalesce() consistent. 2004-07-25 07:48:47 +00:00
alc
c7df7afd46 - Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of
kmem_alloc_pageable().  The difference between these is that an errant
   memory access to the zone will be detected sooner with
   kmem_alloc_nofault().

The following changes serve to eliminate the following lock-order
reversal reported by witness:

 1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931

There is no potential deadlock in this case.  However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects.  To remedy this, we make the following changes:

 - Add a mutex type argument to VM_OBJECT_LOCK_INIT().
 - Use the mutex type argument to assign distinct types to special
   vm objects such as the kernel object, kmem object, and UMA objects.
 - Define a static swap zone object for use by UMA.  (Only static
   objects are assigned a special mutex type.)
2004-07-22 19:44:49 +00:00
tegge
16dc5859f2 Initialize result->backing_object_offset before linking result onto the list of
vm objects shadowing source in vm_object_shadow().  This closes a race where
vm_object_collapse() could be called with a partially uninitialized object
argument causing symptoms that looked like hardware problems, e.g.  signal 6,
10, 11 or a /bin/sh busy-waiting for a nonexistant child process.
2004-06-28 20:26:35 +00:00
des
c1e01a1433 MFS: vm_map.c rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE
semantics but provide a sysctl knob for reverting to old ones.
2004-05-25 18:40:53 +00:00
imp
04c9462c02 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
alc
81dacfa43e Implement a work around for the deadlock avoidance case in
vm_object_deallocate() so that it doesn't spin forever either.

Submitted by:	bde
2004-03-08 03:54:36 +00:00
alc
28e2e6f950 Correct a long-standing race condition in vm_object_page_remove() that
could result in a dirty page being unintentionally freed.

Reviewed by:	tegge
MFC after:	7 days
2004-02-22 03:36:51 +00:00
alc
12eb36a03f Don't acquire Giant in vm_object_deallocate() unless the object is vnode-
backed.
2004-01-18 03:44:14 +00:00
alc
343d2552f5 Revision 1.74 of vm_meter.c ("Avoid lock-order reversal") makes the release
and subsequent reacquisition of the same vm object lock in
vm_object_collapse() unnecessary.
2004-01-02 19:57:45 +00:00
alc
8218d18537 - Modify vm_object_split() to expect a locked vm object on entry and
return on a locked vm object on exit.  Remove GIANT_REQUIRED.
 - Eliminate some unnecessary local variables from vm_object_split().
2003-12-30 22:28:36 +00:00
alc
269cf5aa09 - Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
 - Migrate the parts of the old vm_map_clean() that examined the internals
   of a vm object to a new function vm_object_sync() that is implemented in
   vm_object.c.  At the same, introduce the necessary vm object locking so
   that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by:	tegge
2003-11-09 05:25:35 +00:00
alc
f2e8aed3e6 - Increase the scope of two vm object locks in vm_object_split(). 2003-11-02 22:52:42 +00:00
alc
28e8cd183d - Introduce and use vm_object_reference_locked(). Unlike
vm_object_reference(), this function must not be used to reanimate dead
   vm objects.  This restriction simplifies locking.

Reviewed by:	tegge
2003-11-02 21:30:10 +00:00
alc
75da97558f - Increase the scope of two vm object locks in vm_object_collapse().
- Remove the acquisition and release of Giant from vm_object_coalesce().
2003-11-01 23:06:41 +00:00
alc
716130a6f9 - Modify swap_pager_copy() and its callers such that the source and
destination objects are locked on entry and exit.  Add comments to
   the callers noting that the locks can be released by swap_pager_copy().
 - Remove several instances of GIANT_REQUIRED.
2003-11-01 08:57:26 +00:00
alc
8ded4dfb69 - Additional vm object locking in vm_object_split()
- New vm object locking assertions in vm_page_insert() and
   vm_object_set_writeable_dirty()
2003-11-01 04:54:23 +00:00
alc
ab63139b09 - Revert a part of revision 1.73: Make vm_object_set_flag() an inline
function.  This function is so trivial that inlining reduces the size
   of the kernel.
2003-10-31 20:17:00 +00:00
alc
be546fdee4 - Take advantage of the swap pager locking: Eliminate the use of Giant
from vm_object_madvise().
 - Remove excessive blank lines from vm_object_madvise().
2003-10-31 18:32:03 +00:00
alc
dd1d2bd790 - Simplify vm_object_collapse()'s collapse case, reducing the number
of lock acquires and releases performed.
 - Move an assertion from vm_object_collapse() to vm_object_zdtor()
   because it applies to all cases of object destruction.
2003-10-26 06:29:26 +00:00
alc
bccf1d15ab - Increase the object lock's scope in vm_contig_launder() so that access
to the object's type field and the call to vm_pageout_flush() are
   synchronized.
 - The above change allows for the eliminaton of the last parameter
   to vm_pageout_flush().
 - Synchronize access to the page's valid field in vm_pageout_flush()
   using the containing object's lock.
2003-10-18 21:09:21 +00:00
jeff
25821dd99f - Use the UMA_ZONE_VM flag on the fakepg and object zones to prevent
vm recursion and LORs.  This may be necessary for other zones created in
   the vm but this needs to be verified.
2003-10-04 14:21:53 +00:00
alc
1283b3d480 Remove GIANT_REQUIRED from vm_object_shadow(). 2003-09-17 07:00:14 +00:00
alc
38e47abc4a Eliminate the use of Giant from vm_object_reference(). 2003-09-15 05:58:27 +00:00
alc
b182e8a6eb There is no need for an atomic increment on the vm object's generation
count in _vm_object_allocate().  (Access to the generation count is
governed by the vm object's lock.)  Note: the introduction of the
atomic increment in revision 1.238 appears to be an accident.  The
purpose of that commit was to fix an Alpha-specific bug in UMA's
debugging code.
2003-09-13 20:07:26 +00:00
phk
df05426cf5 Remove an unused variable. 2003-08-06 12:09:34 +00:00
alc
56615188dc Allow vm_object_reference() on kernel_object without Giant. 2003-07-27 05:43:58 +00:00
phk
df7d325032 Don't inline very large functions.
Gcc has silently not been doing this for a long time.
2003-07-22 09:27:58 +00:00