Commit Graph

2102 Commits

Author SHA1 Message Date
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Bosko Milekic
8076cb5289 Well, it seems that I pre-maturely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
Bosko Milekic
500f29d06e Make UMA set the overloaded page->object back to kmem_object for
UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly
came from kmem_map for those two.  Previously it would set it back
to NULL for UMA_ZONE_REFCNT zones and although this was probably not
fatal, it added MORE code for no reason.
2005-02-16 20:06:11 +00:00
Bosko Milekic
7fae6a1116 Rather than overloading the page->object field like UMA does, use instead
an unused pageq queue reference in the page structure to stash a pointer
to the MemGuard FIFO.  Using the page->object field caused problems
because when vm_map_protect() was called the second time to set
VM_PROT_DEFAULT back onto a set of pages in memguard_map, the protection
in the VM would be changed but the PMAP code would lazily not restore
the PG_RW bit on the underlying pages right away (see pmap_protect()).
So when a page fault finally occured and the VM noticed the faulting
address corresponds to a page that _does_ have write access now, it
would then call into PMAP to set back PG_RW (i386 case being discussed
here).  However, before it got to do that, an assertion on the object
lock not being owned would get triggered, as the object of the faulting
page would need to be locked but was overloaded by MemGuard.  This is
precisely why MemGuard cannot overload page->object.

Submitted by: Alan Cox (alc@)
2005-02-15 22:17:07 +00:00
Poul-Henning Kamp
7fbdc92113 sysctl node vm.stats can not be static (for ia64 reasons). 2005-02-11 16:34:14 +00:00
Bosko Milekic
0341256576 Implement support for buffers larger than PAGE_SIZE in MemGuard. Adds
a little bit of complexity but performance requirements lacking (this is
a debugging allocator after all), it's really not too bad (still
only 317 lines).

Also add an additional check to help catch really weird 3-threads-involved
races: make memguard_free() write to the first page handed back, always,
before it does anything else.

Note that there is still a problem in VM+PMAP (specifically with
vm_map_protect) w.r.t. MemGuard uses it, but this will be fixed shortly
and this change stands on its own.
2005-02-10 22:36:05 +00:00
Poul-Henning Kamp
39a79f0c01 Make three SYSCTL_NODEs static 2005-02-10 12:18:36 +00:00
Poul-Henning Kamp
253de0a143 Make npages static and const. 2005-02-10 12:18:17 +00:00
Suleiman Souhlal
81ae703462 Set the scheduling class of the zeroidle thread to PRI_IDLE.
Reviewed by:	jhb
Approved by:	grehan (mentor)
MFC after:	1 week
2005-02-04 06:18:31 +00:00
Alan Cox
8e99783b25 Update the text of an assertion to reflect changes made in revision 1.148.
Submitted by: tegge

Eliminate an unnecessary, temporary increment of the backing object's
reference count in vm_object_qcollapse().  Reviewed by: tegge
2005-01-30 21:29:47 +00:00
Poul-Henning Kamp
7146d6cb3e Move the contents of vop_stddestroyvobject() to the new vnode_pager
function vnode_destroy_vobject().

Make the new function zero the vp->v_object pointer so we can tell
if a call is missing.
2005-01-28 08:56:48 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
d07a6d3f61 Move the body of vop_stdcreatevobject() over to the vnode_pager under
the name Sande^H^H^H^H^Hvnode_create_vobject().

Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.

Call that new function from vop_stdcreatevobject().

Make vnode_pager_alloc() private now that its only user came home.
2005-01-24 21:21:59 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Jeff Roberson
ae51ff1127 - Remove GIANT_REQUIRED where giant is no longer required.
- Use VFS_LOCK_GIANT() rather than directly acquiring giant in places
   where giant is only held because vfs requires it.

Sponsored By:   Isilon Systems, Inc.
2005-01-24 10:48:29 +00:00
Alan Cox
75337a5677 Guard against address wrap in kernacc(). Otherwise, a program accessing a
bad address range through /dev/kmem can panic the machine.

Submitted by: Mark W. Krentel
Reported by: Kris Kennaway
MFC after: 1 week
2005-01-22 19:21:29 +00:00
Bosko Milekic
eca64e79b5 s/round_page/trunc_page/g
I meant trunc_page.  It's only a coincidence this hasn't caused
problems yet.

Pointed out by: Antoine Brodin <antoine.brodin@laposte.net>
2005-01-22 00:09:34 +00:00
Bosko Milekic
e4eb384b47 Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example.  If you are planning to use it, for now you must:

	1) Put "options DEBUG_MEMGUARD" in your kernel config.
	2) Edit src/sys/kern/kern_malloc.c manually, look for
	   "XXX CHANGEME" and replace the M_SUBPROC comparison with
	   the appropriate malloc type (this might require additional
	   but small/simple code modification if, say, the malloc type
	   is declared out of scope).
	3) Build and install your kernel.  Tune vm.memguard_divisor
	   boot-time tunable which is used to scale how much of kmem_map
	   you want to allott for MemGuard's use.  The default is 10,
	   so kmem_size/10.

ToDo:
	1) Bring in a memguard(9) man page.
	2) Better instrumentation (e.g., boot-time) of MemGuard taking
	   over malloc types.
	3) Teach UMA about MemGuard to allow MemGuard to override zone
	   allocations too.
	4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.
2005-01-21 18:09:17 +00:00
Alan Cox
986b43f845 Add checks to vm_map_findspace() to test for address wrap. The conditions
where this could occur are very rare, but possible.

Submitted by: Mark W. Krentel
MFC after: 2 weeks
2005-01-18 19:50:09 +00:00
Alan Cox
d936694f09 Consider three objects, O, BO, and BBO, where BO is O's backing object
and BBO is BO's backing object.  Now, suppose that O and BO are being
collapsed.  Furthermore, suppose that BO has been marked dead
(OBJ_DEAD) by vm_object_backing_scan() and that either
vm_object_backing_scan() has been forced to sleep due to encountering
a busy page or vm_object_collapse() has been forced to sleep due to
memory allocation in the swap pager.  If vm_object_deallocate() is
then called on BBO and BO is BBO's only shadow object,
vm_object_deallocate() will collapse BO and BBO.  In doing so, it adds
a necessary temporary reference to BO.  If this collapse also sleeps
and the prior collapse resumes first, the temporary reference will
cause vm_object_collapse to panic with the message "backing_object %p
was somehow re-referenced during collapse!"

Resolve this race by changing vm_object_deallocate() such that it
doesn't collapse BO and BBO if BO is marked dead.  Once O and BO are
collapsed, vm_object_collapse() will attempt to collapse O and BBO.
So, vm_object_deallocate() on BBO need do nothing.

Reported by: Peter Holm on 20050107
URL: http://www.holm.cc/stress/log/cons102.html

In collaboration with: tegge@
Candidate for RELENG_4 and RELENG_5
MFC after: 2 weeks
2005-01-15 21:12:47 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Bosko Milekic
c5c1b16ec5 While we want the recursion protection for the bucket zones so that
recursion from the VM is handled (and the calling code that allocates
buckets knows how to deal with it), we do not want to prevent allocation
from the slab header zones (slabzone and slabrefzone) if uk_recurse is
not zero for them.  The reason is that it could lead to NULL being
returned for the slab header allocations even in the M_WAITOK
case, and the caller can't handle that (this is also explained in a
comment with this commit).

The problem analysis is documented in our mailing lists:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current

(see entire thread for proper context).

Crash dump data provided by: Peter Holm <peter@holm.cc>
2005-01-11 03:33:09 +00:00
Stefan Farfeleder
1e183df21e ISO C requires at least one element in an initialiser list. 2005-01-10 20:30:04 +00:00
Alan Cox
5ba514bc89 Move the acquisition and release of the page queues lock outside of a loop
in vm_object_split() to avoid repeated acquisition and release.
2005-01-08 23:41:11 +00:00
Alan Cox
46fbc58202 Transfer responsibility for freeing the page taken from the cache
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache().  Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().

Reported by: Peter Holm and Kris Kennaway
2005-01-07 05:02:19 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
df2e33bf42 Revise the part of vm_pageout_scan() that moves pages from the cache
queue to the free queue.  With this change, if a page from the cache
queue belongs to a locked object, it is simply skipped over rather
than moved to the inactive queue.
2005-01-06 20:22:36 +00:00
Poul-Henning Kamp
4f8205e5d1 When allocating bio's in the swap_pager use M_WAITOK since the
alternative is much worse.
2005-01-03 13:28:56 +00:00
Alan Cox
0869d38ba6 Assert that page allocations during an interrupt specify
VM_ALLOC_INTERRUPT.

Assert that pages removed from the cache queue are not busy.
2004-12-31 19:50:45 +00:00
Alan Cox
7aa2190c8e Access to the page's busy field is (now) synchronized by the containing
object's lock.  Therefore, the assertion that the page queues lock is held
can be removed from vm_page_io_start().
2004-12-29 04:18:22 +00:00
Alan Cox
91f7a86064 Note that access to the page's busy count is synchronized by the containing
object's lock.
2004-12-27 05:27:59 +00:00
Alan Cox
40198b3c04 Assert that the vm object is locked on entry to vm_page_sleep_if_busy();
remove some unneeded code.
2004-12-26 21:46:44 +00:00
Bosko Milekic
7b8712053c Add my copyright and update Jeff's copyright on UMA source files,
as per his request.

Discussed with: Jeffrey Roberson
2004-12-26 00:35:12 +00:00
Poul-Henning Kamp
475e8cc394 fix comment 2004-12-25 21:30:41 +00:00
Alan Cox
a51b084059 Continue the transition from synchronizing access to the page's PG_BUSY
flag and busy field with the global page queues lock to synchronizing their
access with the containing object's lock.  Specifically, acquire the
containing object's lock before reading the page's PG_BUSY flag and busy
field in vm_fault().

Reviewed by: tegge@
2004-12-24 19:31:54 +00:00
Alan Cox
1f70d62298 Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
2004-12-23 20:16:11 +00:00
Alan Cox
98fe9a0ddf Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333
for a detailed explanation.)
2004-12-17 18:54:51 +00:00
Alan Cox
06c98c5dcc Enable debug.mpsafevm by default on alpha. 2004-12-17 17:17:36 +00:00
Alan Cox
85f5b24573 In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead.  To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep.  Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock.  (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
2004-12-15 19:55:05 +00:00
Alan Cox
90688d137c With the removal of kern/uipc_jumbo.c and sys/jumbo.h,
vm_object_allocate_wait() is not used.  Remove it.
2004-12-08 05:01:47 +00:00
Alan Cox
2ad036b657 Almost nine years ago, when support for 1TB files was introduced in
revision 1.55, the address parameter to vnode_pager_addr() was changed
from an unsigned 32-bit quantity to a signed 64-bit quantity.  However,
an out-of-range check on the address was not updated.  Consequently,
memory-mapped I/O on files greater than 2GB could cause a kernel panic.
Since the address is now a signed 64-bit quantity, the problem resolution
is simply to remove a cast.

Reviewed by: bde@ and tegge@
PR: 73010
MFC after: 1 week
2004-12-07 22:05:38 +00:00
Alan Cox
d8fed1d050 Correct a sanity check in vnode_pager_generic_putpages(). The cast used
to implement the sanity check should have been changed when we converted
the implementation of vm_pindex_t from 32 to 64 bits.  (Thus, RELENG_4 is
not affected.)  The consequence of this error would be a legimate write to
an extremely large file being treated as an errant attempt to write meta-
data.

Discussed with: tegge@
2004-12-05 21:48:11 +00:00
David Schultz
6004362e66 Don't include sys/user.h merely for its side-effect of recursively
including other headers.
2004-11-27 06:51:39 +00:00
Olivier Houchard
6fc96493ac Remove useless casts. 2004-11-26 15:04:26 +00:00
Xin LI
8e33bced3c Try to close a potential, but serious race in our VM subsystem.
Historically, our contigmalloc1() and contigmalloc2() assumes
that a page in PQ_CACHE can be unconditionally reused by busying
and freeing it.  Unfortunatelly, when object happens to be not
NULL, the code will set m->object to NULL and disregard the fact
that the page is actually in the VM page bucket, resulting in
page bucket hash table corruption and finally, a filesystem
corruption, or a 'page not in hash' panic.

This commit has borrowed the idea taken from DragonFlyBSD's fix
to the VM fix by Matthew Dillon[1].  This version of patch will
do the following checks:

	- When scanning pages in PQ_CACHE, check hold_count and
	  skip over pages that are held temporarily.
	- For pages in PQ_CACHE and selected as candidate of being
	  freed, check if it is busy at that time.

Note:  It seems that this is might be unrelated to kern/72539.

Obtained from:	DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1]
Reminded by:	Matt Dillon
Reworked by:	alc
MFC After:	1 week
2004-11-24 18:56:13 +00:00
David Schultz
9799b417d5 Disable U area swapping and remove the routines that create, destroy,
copy, and swap U areas.

Reviewed by:	arch@
2004-11-20 02:29:00 +00:00
Poul-Henning Kamp
9c83534dd8 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
Poul-Henning Kamp
5c6e573ffb Add pbgetbo()/pbrelbo() lighter weight versions of pbgetvp()/pbrelvp(). 2004-11-15 08:47:18 +00:00
Poul-Henning Kamp
287013d287 More kasserts. 2004-11-15 08:33:09 +00:00