Commit Graph

263 Commits

Author SHA1 Message Date
David Schultz
8bc61209d4 Fix the last known race in swapoff(), which could lead to a spurious panic:
	swapoff: failed to locate %d swap blocks

The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency.  I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient.  With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.
2004-11-06 07:17:50 +00:00
David Schultz
b3fed13e9d Close a race in swapoff(). Here are the gory details:
  In order to avoid livelock, swapoff() skips over objects with a
  nonzero pip count and makes another pass if necessary.  Since it is
  impossible to know which objects we care about, it would choose an
  arbitrary object with a nonzero pip count and wait for it before
  making another pass, the theory being that this object would finish
  paging about as quickly as the ones we care about.  Unfortunately,
  we may have slept since we acquired a reference to this object.
  Hack around this problem by tsleep()ing on the pointer anyway, but
  timeout after a fixed interval.  More elegant solutions are possible,
  but the ones I considered unnecessarily complicate this rare case.

Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:

- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
  the latter can be derived from the former.

- Replace swp_pager_find_dev() with something simpler.  There's no
  need to iterate over the entire list of swap devices just to determine
  if a given block is assigned to the one we're interested in.

- Expand the scope of the swhash_mtx in a couple of places so that it
  isn't released and reacquired once for every hash bucket.

- Don't drop the swhash_mtx while holding a reference to an object.
  We need to lock the object first.  Unfortunately, doing so would
  violate the established lock order, so use VM_OBJECT_TRYLOCK() and
  try again on a subsequent pass if the object is already locked.

- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
2004-11-05 05:36:56 +00:00
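The trylock-and-retry approach from the last lock-order item above can be sketched in userland C with POSIX mutexes. This is only an illustration under stated assumptions: struct obj, drain_pass(), and the blocks counter are hypothetical names, table_mtx stands in for swhash_mtx, and none of it is taken from swap_pager.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a vm object; not the kernel structure. */
struct obj {
	pthread_mutex_t mtx;	/* plays the role of the object lock */
	int blocks;		/* swap blocks held; guarded by the table lock here */
};

/*
 * Make one pass over the objects while holding the table lock (the
 * analogue of swhash_mtx).  The assumed lock order is object lock
 * first, table lock second, so blocking on an object lock here could
 * deadlock.  Use trylock instead and let the caller retry later.
 * Returns the number of objects skipped because their lock was busy.
 */
static size_t
drain_pass(pthread_mutex_t *table_mtx, struct obj *objs, size_t n)
{
	size_t i, skipped = 0;

	assert(table_mtx != NULL);
	pthread_mutex_lock(table_mtx);
	for (i = 0; i < n; i++) {
		if (objs[i].blocks == 0)
			continue;
		if (pthread_mutex_trylock(&objs[i].mtx) != 0) {
			skipped++;	/* busy: pick it up on a later pass */
			continue;
		}
		objs[i].blocks = 0;	/* stand-in for paging the blocks back in */
		pthread_mutex_unlock(&objs[i].mtx);
	}
	pthread_mutex_unlock(table_mtx);
	return (skipped);
}
```

A caller would repeat drain_pass() until it returns 0, pausing between passes so that the holders of the busy locks can make progress.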
Poul-Henning Kamp
c5d3d25e4f De-couple our I/O bio request from the embedded bio in buf by explicitly
copying the fields.
2004-11-04 08:38:07 +00:00
Poul-Henning Kamp
c569065139 Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which call through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
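The dispatch through buf->b_bufobj->b_ops->b_{write,strategy}() described above can be sketched as follows. The field names follow the commit message. The struct layouts, the counters, and the demo_* backend are hypothetical and exist only to make the sketch self-contained; the kernel's actual definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

struct buf;

/* Method vector that lives on the bufobj rather than on each buf. */
struct buf_ops {
	int	(*b_write)(struct buf *);
	void	(*b_strategy)(struct buf *);
};

struct bufobj {
	const struct buf_ops *b_ops;
};

struct buf {
	struct bufobj *b_bufobj;
	int b_nwrites;		/* hypothetical counter for the demo */
	int b_nstrategy;	/* hypothetical counter for the demo */
};

/* Inline wrappers: callers never name a specific backend. */
static inline int
bwrite(struct buf *bp)
{
	assert(bp->b_bufobj != NULL);
	return (bp->b_bufobj->b_ops->b_write(bp));
}

static inline void
bstrategy(struct buf *bp)
{
	assert(bp->b_bufobj != NULL);
	bp->b_bufobj->b_ops->b_strategy(bp);
}

/* Hypothetical backend that only counts calls. */
static int
demo_write(struct buf *bp)
{
	bp->b_nwrites++;
	return (0);
}

static void
demo_strategy(struct buf *bp)
{
	bp->b_nstrategy++;
}

static const struct buf_ops demo_ops = {
	.b_write = demo_write,
	.b_strategy = demo_strategy,
};
```

Keeping the vector on the bufobj means that a backend such as NFS can supply its own operations once per object instead of once per buffer.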
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj),
and VI_MTX() to BO_MTX().

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into the new bo_flag element of bufobj and call it BO_WWAIT.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions in all relevant places except in ffs_softdep.c where
the use of interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
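The write-count handling described above can be approximated in userland C. This is a sketch under assumptions, not the kernel code: it uses a POSIX mutex and condition variable in place of the bufobj lock and sleep/wakeup, it omits the BO_WWAIT flag entirely, and the field names are illustrative.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified bufobj: just a lock and a write count. */
struct bufobj {
	pthread_mutex_t bo_mtx;
	pthread_cond_t bo_cv;
	int bo_numoutput;	/* writes currently in progress */
};

/* Account for a write that is about to start. */
static void
bufobj_wref(struct bufobj *bo)
{
	pthread_mutex_lock(&bo->bo_mtx);
	bo->bo_numoutput++;
	pthread_mutex_unlock(&bo->bo_mtx);
}

/* Account for a finished write; wake waiters when none remain. */
static void
bufobj_wdrop(struct bufobj *bo)
{
	pthread_mutex_lock(&bo->bo_mtx);
	assert(bo->bo_numoutput > 0);
	if (--bo->bo_numoutput == 0)
		pthread_cond_broadcast(&bo->bo_cv);
	pthread_mutex_unlock(&bo->bo_mtx);
}

/* Block until every outstanding write has completed. */
static void
bufobj_wwait(struct bufobj *bo)
{
	pthread_mutex_lock(&bo->bo_mtx);
	while (bo->bo_numoutput > 0)
		pthread_cond_wait(&bo->bo_cv, &bo->bo_mtx);
	pthread_mutex_unlock(&bo->bo_mtx);
}
```

In this sketch bufobj_wdrop() always broadcasts when the count reaches zero; a flag such as BO_WWAIT would let it skip the wakeup when nobody is waiting.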
David Schultz
f6bcadc4fc Don't look for swap blocks in objects that aren't swap-backed.
I expect that this will fix the following panic, reported by Jun:
	swap_pager_isswapped: failed to locate all swap meta blocks

MT5 candidate
2004-09-24 16:04:20 +00:00
Poul-Henning Kamp
5721c9c76a Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00
Alan Cox
5285558ac2 - Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of
   kmem_alloc_pageable().  The difference between these is that an errant
   memory access to the zone will be detected sooner with
   kmem_alloc_nofault().

The following changes serve to eliminate the following lock-order
reversal reported by witness:

 1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931

There is no potential deadlock in this case.  However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects.  To remedy this, we make the following changes:

 - Add a mutex type argument to VM_OBJECT_LOCK_INIT().
 - Use the mutex type argument to assign distinct types to special
   vm objects such as the kernel object, kmem object, and UMA objects.
 - Define a static swap zone object for use by UMA.  (Only static
   objects are assigned a special mutex type.)
2004-07-22 19:44:49 +00:00
Bruce M Simpson
9bd86a9861 Properly brucify a string by outdenting it. 2004-07-06 02:27:30 +00:00
Bruce M Simpson
0e3fe6e3e6 In swap_pager_getpages(), bp->b_dev can be NULL, particularly for the
case of NFS mounted swap, so do not try to dereference it.

While we're here, brucify the printf() call which happens when we
time out on acquisition of vm_page_queue_mtx.

PR:		kern/67898
Submitted by:	bde (style)
2004-06-23 15:15:07 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Alan Cox
5a32489377 Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
Alan Cox
2c840b1f65 - Substitute bdone() and bwait() from vfs_bio.c for
   swap_pager_putpages()'s buffer completion code.  Note: the only
   difference between swp_pager_sync_iodone() and bdone(), aside from
   the locking in the latter, was the unnecessary clearing of B_ASYNC.
 - Remove an unnecessary pmap_page_protect() from
   swp_pager_async_iodone().

Reviewed by:	tegge
2004-02-23 03:15:13 +00:00
Poul-Henning Kamp
d2bae332d6 Remove the absolute count g_access_abs() function since experience has
shown that it is not useful.

Rename the relative count g_access_rel() function to g_access(), only
the name has changed.

Change all g_access_rel() calls in our CVS tree to call g_access() instead.

Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.
2004-02-12 22:42:11 +00:00
Alan Cox
c5aebf380c swp_pager_async_iodone() no longer requires Giant. Modify bufdone()
and swapgeom_done() to perform swp_pager_async_iodone() without Giant.

Reviewed by:	tegge
2004-02-07 08:54:50 +00:00
Poul-Henning Kamp
3e5b686160 Check error return from g_clone_bio(). (netchild@)
Add XXX comment about why this is still not optimal. (phk@)

Submitted by:	netchild@
2004-02-02 13:08:03 +00:00
Alan Cox
7dea2c2e3b 1. Statically initialize swap_pager_full and swap_pager_almost_full to the
   full state.  (When swap is added their state will change appropriately.)
2. Set swap_pager_full and swap_pager_almost_full to the full state when
   the last swap device is removed.
Combined, these changes eliminate nonsense messages from the kernel on
swap-less machines.

Item 2 submitted by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Prodding by:		phk
2004-01-24 21:31:06 +00:00
Alan Cox
2f7af3db57 Simplify the various pager allocation routines by computing the desired
object size once and assigning that value to a local variable.
2004-01-04 20:55:15 +00:00
Alan Cox
e793b7797d Reduce the scope of Giant in swap_pager_alloc(). 2004-01-03 20:02:17 +00:00
Alan Cox
bd228075c7 Remove swap_pager_un_object_list; it is unused. 2003-12-29 04:21:44 +00:00
Alan Cox
c7c8dd7e80 - Modify swap_pager_copy() and its callers such that the source and
   destination objects are locked on entry and exit.  Add comments to
   the callers noting that the locks can be released by swap_pager_copy().
 - Remove several instances of GIANT_REQUIRED.
2003-11-01 08:57:26 +00:00
Alan Cox
2928cef7e1 - Synchronize access to the swdevt's sw_flags with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.
2003-10-31 05:18:45 +00:00
Alan Cox
7645e88596 - Synchronize access to the swdevt's sw_blist with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.
2003-10-30 09:12:43 +00:00
Alan Cox
d05bc12976 - Synchronize access to swdevhd using sw_dev_mtx.
- Use swp_sizecheck() rather than assignment to swap_pager_full in
   swaponsomething().
2003-10-30 07:11:06 +00:00
Alan Cox
0676a140b2 - Synchronize updates to nswapdev using sw_dev_mtx. 2003-10-29 07:51:41 +00:00
Alan Cox
2d9974c1e8 - Avoid a race in swaponsomething(): Calculate the new swdevt's first and
   end swblk and insert this new swdevt into the list of swap devices
   in the same critical section.
2003-10-29 05:42:28 +00:00
Alan Cox
d536c58f53 - Complete the synchronization of accesses to the swblock hash table. 2003-10-27 05:58:15 +00:00
Alan Cox
7827d9b0fe - Introduce and use a mutex synchronizing access to the swblock hash table. 2003-10-26 19:55:35 +00:00
Alan Cox
ee3dc7d7fe - Add some of the required vm object locking, including assertions where
   the vm object lock is required and already held.
2003-10-25 23:42:17 +00:00
Alan Cox
2e3b314d3a - Push down Giant from vm_pageout() to vm_pageout_scan(), freeing
   vm_pageout_page_stats() from Giant.
 - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
   vm object to be locked on entry.  (All of the pager routines now expect
   this.)
2003-10-24 06:43:04 +00:00
Poul-Henning Kamp
2c18019f14 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
Poul-Henning Kamp
9fbf91c0dd Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY().
Remove stale comment about B_PHYS.
2003-10-18 11:11:05 +00:00
Poul-Henning Kamp
afeb65e61d Don't open with exclusive bit, swapon(8) wants to trash our swapdev.
Add XXX comment with a rating of this concept.
2003-09-02 05:53:44 +00:00
Poul-Henning Kamp
dee34ca4fc Add a close() method to a swapdev.
Add a GEOM based backend.

Remove the device/VOP_SPECSTRATEGY() based backend.
2003-08-30 16:44:26 +00:00
Poul-Henning Kamp
20da9c2eaf Protect the swapdevice tailq with a mutex.
Store the udev_t we will report to userland in the swdevt.
2003-08-30 16:10:28 +00:00
Poul-Henning Kamp
59efee01a3 Continue the objectification of the swapdev backends:
Remove the vnode and dev_t fields and replace them with a void *.

Introduce separate strategy functions for devices and regular (NFS)
vnodes.

For devices we don't need the vnode v_numoutput stuff.

Add a generic swaponsomething() function to add a swapdevice and
split the remainder of swaponvp() into swaponvp() and swapondev()
which calls this backend.
2003-08-30 11:33:25 +00:00
Poul-Henning Kamp
4b03903a46 Make the strategy function a method of the individual swapdev. 2003-08-30 09:42:00 +00:00
Poul-Henning Kamp
2f249180f5 Consistently use modern function definitions. 2003-08-30 08:32:42 +00:00
Poul-Henning Kamp
395714feb7 Eliminate unnecessary udev_t variable: we can derive it from the dev_t
when we need it.
2003-08-15 13:14:25 +00:00
Poul-Henning Kamp
89dc784fa3 Make swaponvp() static to the swap_pager. 2003-08-15 12:04:29 +00:00
Poul-Henning Kamp
ef3c5abdba Make the first two pages magic to protect the BSD labels rather than
only one.
2003-08-06 14:13:38 +00:00
Poul-Henning Kamp
751221fd32 Staticize swap_pager_putpages().
Eliminate a lot of checks that make sure requests are not cross-device,
which are unnecessary with the new layout.  We know a sequential request
cannot possibly be cross-device because there is a reserved page between
the devices.

Remove a couple of comments which no longer are relevant.
2003-08-06 12:08:27 +00:00
Poul-Henning Kamp
5e04322a6e Explicitly set B_PAGING 2003-08-06 09:22:47 +00:00
Poul-Henning Kamp
c37a77ee86 Rip out the totally bogus vnode swapdev_vp with extreme prejudice.
Don't mark buffers with B_KEEPGIANT, we don't drop giant in strategy
at this point in time.
2003-08-06 06:53:31 +00:00
Poul-Henning Kamp
e04e4bacf6 Use sparse struct initialization for struct pagerops.
Mark our buffers B_KEEPGIANT before sending them downstream.

Remove swap_pager_strategy implementation.
2003-08-05 06:54:56 +00:00
Poul-Henning Kamp
665c0caf03 Put an uncovered page between the swap devices, that way we can be sure
to not get any cross-device I/O requests.  (The unallocated first page
protecting BSD labels already gave us this, but that hack may go away
at some point in time).

Remove the check for cross-device I/O requests in swap_pager_strategy.

Move the repeated statistics updating into flushchainbuf().
2003-08-04 08:22:49 +00:00
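The layout with one uncovered page between devices can be sketched as follows. This is an illustration under assumptions: struct swdev, layout_devs(), and find_dev() are hypothetical names, block numbers are plain longs, and the reserved first page that protects BSD labels is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef long blk_t;

/* Hypothetical swap device covering the block range [first, end). */
struct swdev {
	blk_t first;
	blk_t end;
};

/*
 * Assign each device a range in one shared block address space,
 * leaving a single block that belongs to no device between
 * neighbours.
 */
static void
layout_devs(struct swdev *devs, const blk_t *nblks, size_t n)
{
	blk_t next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(nblks[i] > 0);
		devs[i].first = next;
		devs[i].end = next + nblks[i];
		next = devs[i].end + 1;	/* skip one block: the uncovered page */
	}
}

/* Return the index of the device holding blk, or -1 for a gap. */
static int
find_dev(const struct swdev *devs, size_t n, blk_t blk)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (blk >= devs[i].first && blk < devs[i].end)
			return ((int)i);
	return (-1);
}
```

Because the gap block is never allocated, any run of consecutive allocated blocks lies entirely within one device, so a per-request cross-device check is redundant.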
Poul-Henning Kamp
12692209a6 Name swap_pager_find_dev() more correctly swp_pager_find_dev().
Use ->bio_children to count child buffers, rather than abuse the
bio_caller1 pointer.

Expand the relevant bits of waitchainbuf() inline; this clarifies
the code a little bit.
2003-08-03 21:22:42 +00:00