34824 Commits

Author SHA1 Message Date
Matthew Dillon
a489f7614b Reorganized some of the low memory testing code to make it more useful.
Removed call to vm_object_collapse(), which can block.  This was being
    called without the pageout code holding any sort of reference on the
    vm_object or vm_page_t structures being manipulated.  Since this code
    can block, it was possible for other kernel code to shred the state
    the pageout code was assuming remained intact.

    Fixed potential blocking condition in vm_pageout_page_free() ( which
    could cause a deadlock in a low-memory situation ).

    Currently there is a hack in-place to deal with clean filesystem meta-data
    polluting the inactive page queue.  John doesn't like the hack, and neither
    do I.

    Revamped and commented a portion of the pageout loop.

    Added protection against potential memory deadlocks with OBJT_VNODE
    when using VOP_ISLOCKED().  The problem is that vp->v_data can be NULL
    which causes VOP_ISLOCKED() to return a less informed answer.

    remove vm_pager_sync() -- none of the pagers use it any more ( the old
    swapper used to.  The new one does not ).
1999-01-21 10:12:54 +00:00
Matthew Dillon
41dbefba28 The TAILQ hashq has been turned into a singly-linked=list link,
reducing the size of vm_page_t.

    SWAPBLK_NONE and SWAPBLK_MASK are defined here.  These actually are
    more generalized then their names imply, but their placement is somewhat
    of a legacy issue from a prior test version of this code that put
    the swapblk in the vm_page_t structure.  That test code was eventually
    thrown away.  The legacy remains.

    Added vm_page_flash() inline.  Similar to vm_page_wakeup() except that
    it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ).
    Used by a number of routines to wakeup waiters.

    Collapsed some of the code in inline calls to make other inline calls.
    GCC will optimize this well and it reduces duplication.

    vm_page_free() and vm_page_free_zero() inlines added to convert to
    the proper vm_page_free_toq() call.

    vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has
    been removed ).  This implements a much more optimizable page-waiting
    function.
1999-01-21 10:06:24 +00:00
Matthew Dillon
060282de8a The hash table used to be a table of doubly-link list headers ( two
pointers per entry ).  The table has been changed to a singly linked
    list of vm_page_t pointers.  The table has been doubled in size, but
    the entries only take half the space so a net-zero change in memory use.

    The hash function has been changed, hopefully for the better.  The
    combination of the larger hash table size of changed function should
    keep the chain length down to a reasonable number (0-3, average 1).

    vm_object->page_hint has been removed.  This 'optimization' was not
    only never needed, but costs as much as a hash chain link to implement.
    While having page_hint in vm_object might result in better locality
    of reference, the cost is not worth the space in vm_object or the
    extra instructions in my view.

    vm_page_alloc*() functions have been inlined and call a generalized
    non-inlined vm_page_alloc_toq() which combines the standard alloc
    and zero-page alloc functions together, reducing code size and the L1
    cache footprint.  Some reordering has been done... not much.  The
    delinking code should be faster ( because unlinking a doubly-linked list
    requires four memory ops and unlinking a singly linked list only requires
    two ), and we get a hash consistancy check for free.

    vm_page_rename() now automatically sets the page's dirty bits.

    vm_page_alloc() does not try to manually inline freeing a cache page.
    Instead, it now properly calls vm_page_free(m) ... vm_page_free() is
    really too complex to manually inline.

    vm_await(), supporting asleep(), has been added.
1999-01-21 10:01:49 +00:00
Matthew Dillon
48f1335479 The vm_object structure is now somewhat smaller due to the removal
of most of the swap-pager-specific fields, the removal of the id,
    and the removal of paging_offset.

    A new inline, vm_object_pip_wakeupn() has been added to subtract an
    arbitrary number n from the paging_in_progress count and then wakeup
    waiters as necessary.  n may be 0, resulting in a 'flash'.
1999-01-21 09:51:21 +00:00
Matthew Dillon
7bc9e80ecc object->id was badly implemented. It has simply been removed.
object->paging_offset has been removed - it was used to optimize a
    single OBJT_SWAP collapse case yet introduced massive confusion throughout
    vm_object.c.  The optimization was inconsequential except for the
    claim that it didn't have to allocate any memory.  The optimization
    has been removed.

    madvise() has been fixed.  The old madvise() could be made to operate
    on shared objects which is a big no-no.  The new one is much more careful
    in what it modifies.  MADV_FREE was totally broken and has now been fixed.

    vm_page_rename() now automatically dirties a page, so explicit dirtying
    of the page prior to calling vm_page_rename() has been removed.
1999-01-21 09:46:55 +00:00
Matthew Dillon
e4ba1db60a Objects associated with raw devices are no longer counted in the VM stats
total because they may contain absurd numbers ( like the size of all
    of physical memory if you mmap() /dev/mem ).
1999-01-21 09:41:52 +00:00
Matthew Dillon
81522c62fa General cleanup related to the new pager. We no longer have to worry
about conversions of objects to OBJT_SWAP, it is done automatically
    now.

    Replaced manually inserted code with inline calls for busy waiting on
    pages, which also incidently fixes a potential PG_BUSY race due to
    the code not running at splvm().

    vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
1999-01-21 09:40:48 +00:00
Matthew Dillon
1d58b2bc7d Potential bug fix, do not just clear PG_BUSY... call vm_page_wakeup()
instead to properly handle any waiters.

    Added comments, added support for M_ASLEEP.  Generally treat M_ flags
    as flags instead of constants to compare against.
1999-01-21 09:38:20 +00:00
Matthew Dillon
6de6079300 Removed low-memory blockages at fork. This is the wrong place to put
this sort of test.  We need to fix the low-memory handling in general.
1999-01-21 09:36:23 +00:00
Matthew Dillon
4c23ae0916 Mainly cleanup. Removed some inappropriate low-memory handling code
and added lots of comments.  Add tie-in to vm_pager ( and thus the
    new swapper ) to deallocate backing swap for dirtied pages on the fly.
1999-01-21 09:35:38 +00:00
Matthew Dillon
9f6fed9017 The default_pager's interaction with the swap_pager has been reorganized,
and the swap_pager has been completely replaced.

    The new swap pager uses the new blist radix-tree based bitmap allocator
    for low level swap allocation and deallocation.   The new allocator
    is effectively O(5) while the old one was O(N), and the new allocator
    allocates all required memory at init time rather then at allocate
    memory on the fly at run time.

    Swap metadata is allocated in clusters and stored in a hash table,
    eliminating linearly allocated structures.

    Many, many features have been rewritten or added.  Swap space is now
    reallocated on the fly providing a poor-mans auto defragmentation of
    swap space.  Swap space that is no longer needed is freed on a timely
    basis so no garbage collection is necessary.

    Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
    thing with it, so NFS-based paging now has around 10x the performance
    as it did before ( previously NFS enforced synchronous I/O for paging ).
1999-01-21 09:33:07 +00:00
Matthew Dillon
50c7d0d513 Added support for VOP_FREEBLKS(), reducing MFS's impact on swap and
increasing performance by deallocating at least some of the backing
    store when files are removed.

    Protect mfsp->buf_queue access at splbio().
1999-01-21 09:27:03 +00:00
Matthew Dillon
3fe2487dfc Access to mfsp->buf_queue must be protected at splbio(). Other minor
adjustments also made, such as passing mfsp to mfs_doio() directly.
1999-01-21 09:24:46 +00:00
Eivind Eklund
053a2b61f7 Move EXT2FS to be more visible, and give it a description. Also make
the text from my last commit somewhat better.
1999-01-21 09:24:28 +00:00
Matthew Dillon
fa70685a6c Renamed M_KERNEL to a more appropriate M_USE_RESERVE. 1999-01-21 09:23:21 +00:00
Matthew Dillon
8618c644e4 The main operational changes are in getblk()'s handling of the
B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS.
    Also, a number of cases where manually inserted code has been removed
    and replaced with an inline function call giving us better functional
    isolation in the source.
1999-01-21 09:19:33 +00:00
Matthew Dillon
39ebd17269 The code that reclaims descriptors from in-transit unix domain
descriptor-passing messages was calling sorflush() without checking
    to see if the descriptor was actually a socket.  This can cause a
    crash by exiting programs that use the mechanism under certain
    circumstances.
1999-01-21 09:02:18 +00:00
Matthew Dillon
0069f505eb Fixed a potential bug ( but maybe not ), where sendfile() clears PG_BUSY
on a page without testing for waiters.  Also collapsed busy wait into
    new vm_page_sleep_busy() inline ( see vm/vm_page.h )
1999-01-21 09:00:26 +00:00
Matthew Dillon
3701b35988 This module was used only by the old swapper and has been #if'd out,
and will be eventually removed if no other use is found for it.
1999-01-21 08:58:41 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Matthew Dillon
7090df5aed Add new blist module - radix tree based bitmap allocator with
size hinting.  Will be used by the new swapper.
1999-01-21 08:11:06 +00:00
Matthew Dillon
eedc343625 Update pstat -s to handle both old and new swapper.
Add pstat -ss to dump new swapper's radix tree.
1999-01-21 08:08:55 +00:00
Jordan K. Hubbard
391a151a77 This is now 4.0-current 1999-01-21 03:07:33 +00:00
Greg Lehey
99af619cb2 Update to reflect changes in kernel module
Remove references to LKMs
Change descriptions on read command (HEADS UP: this command has changed
  in an incompatible manner)
1999-01-21 00:55:28 +00:00
Greg Lehey
52e80fad69 Update to reflect major changes in vinum kernel module 1999-01-21 00:45:11 +00:00
Greg Lehey
0834a1f2db Update to reflect changes in kernel lkm 1999-01-21 00:44:11 +00:00
Greg Lehey
6ab8fef8a8 Remove -DRAID5 from CFLAGS 1999-01-21 00:43:00 +00:00
Greg Lehey
33953e5e97 Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Add field plexsdno to sd struct

Add flag VF_NEWBORN to drive, sd, plex and volume structs, indicating
that the object has just been created.

Add object types for raw (unattached) plexes and subdisks

Remove definitions of VOLNO, PLEXNO and SDNO (now functions Volno,
Plexno and Sdno)

Move revive parameters from struct plex to struct sd.

struct plex:
  maintain a count of the number of inaccessible subdisks.
  remove defective and unmapped regions.

Debug flags: make an enum (previously #define)

Set default revive block size to 64kB (was 32 kB)
1999-01-21 00:41:58 +00:00
Greg Lehey
5d6f3129b1 Retain some of the RAID5 texts even in the non-RAID5 versions.
Previously, accidentally starting the wrong version could corrupt
  the RAID5 configuration.

  Add functions Volno, Plexno and Sdno to replace the old defines
  VOLNO, PLEXNO and SDNO.
1999-01-21 00:41:31 +00:00
Greg Lehey
8a2cba6003 Remove states:
plex_reviving (now we revive subdisks)
  drive_coming_up (never happened)

Add state sd_reviving.
1999-01-21 00:40:50 +00:00
Greg Lehey
6d930639c3 Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Serious rewrite.  No longer call set_<foo>_state to set the state
based only on other objects; instead, add functions
update_<foo>_state, which determine what the state should be by
themselves.  This allows the set_<foo>_state functions to shrink
enough to be almost intelligible.

Remove flags setstate_recurse and setstate_recursing.

Remove plex defective regions and unmapped regions, which were
maintained but not used.

Change code to allow daemon to perform operations formerly kludged
into an interrupt context.  Remove the DIRTYCONFIG kludge.
1999-01-21 00:40:32 +00:00
Greg Lehey
2178c45997 Change the style of revive: revive subdisks instead of plexes. This
makes it easier to keep track of which parts of a plex need reviving.

Use sdio, not vinumstrategy, to write to the subdisk.
1999-01-21 00:40:03 +00:00
Greg Lehey
7d077233f8 Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Remove #ifdefs for FreeBSD 2.c

vinumstrategy:
  Support anonymous (`raw') subdisks and plexes.
  Change code to allow daemon to perform operations formerly kludged
  into an interrupt context.  Remove the DIRTYCONFIG kludge.

No longer set B_ORDERED for reviving subdisks.  I suspect this
wouldn't work correctly, and it should be done in a different manner
in vinumrevive.c

sdio: set subdisk state correctly on error
      start to remove code that doesn't make any sense any more.
1999-01-21 00:39:36 +00:00
Greg Lehey
09c265ba75 Add keywords help, makedev, quit and setdaemon 1999-01-21 00:39:01 +00:00
Greg Lehey
cb4547a333 MMalloc: remove directory part of name correctly 1999-01-21 00:37:56 +00:00
Greg Lehey
ba40e00009 Create functions lockdrive and unlockdrive to handle drive locking 1999-01-21 00:37:38 +00:00
Greg Lehey
216e9f2046 Add keywords makedev, setdaemon and help. 1999-01-21 00:37:18 +00:00
Greg Lehey
2f4e1ffa11 Remove ioctls VINUM_GETUNMAPPED and VINUM_GETDEFECTIVE (we no longer
have regions).

Add definitions for VINUM_DAEMON, VINUM_FINDDAEMON and VINUM_DAEMON
1999-01-21 00:36:21 +00:00
Greg Lehey
c3dff4f036 Include Peter Wemm's renaming and restructuring
Remove #ifdefs for FreeBSD 2.c

Change from lkm to kld

correct type of `flags' in calls to set_drive_state.

set_drive_parms: handle anonymous drives correctly (remove them)

drive VOP functions: use the PID of the original opener to fool the
lock manager.

open_drive: be quiet about failures (they're normal when scanning the
partitions).

close_drive: lock drive before closing.

remove_drive: lock drive before deallocating.

read_drive_label: set drive up when all is OK

check_drive:
  Complete rewrite.  Offload most of the code to the new
  vinum_scandisk

format_config:
  use snprintf and %qd options to make much less emetic.
  Remove old supporting functions.

vinum_scandisk:
  Moved here from vinum.c
  Almost complete rewrite, incorporating much of what was check_drive.
  We still don't have a general way to find the drives on a system, so
  get the user to supply the names via the `read' command.  For each
  device, try each possible compatibility slice name (there's a danger
  of finding both /dev/da1h and /dev/da0s1h otherwise).  Sort the
  partitions found in reverse order of last update time and read them
  in, setting the `update' parameter to parse_config and descendents.

save_config: rename to daemon_save_config, since the function is now
called by the daemon.  Create a new function save_config which queues
the request with the daemon.

daemon_save_config:  some mods to allow for the unfamiliar
environment.
1999-01-21 00:35:35 +00:00
Greg Lehey
a01fbbe4ac Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Recover from I/O errors by passing the request to the daemon, except
for plex I/O, which is not fault tolerant.
1999-01-21 00:34:14 +00:00
Greg Lehey
3a85d3f98a Remove old cruft 1999-01-21 00:33:47 +00:00
Greg Lehey
76714993c6 Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Remove BROKEN_GDB kludge (it's not needed with klds)

Add code for interfacing with daemon

Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.

Remove code for 2.x support.

Change messages to omit obvious words like 'plex' and 'subdisk.

give_plex_to_volume: invalidate subdisks being given to a plex which
is part of a volume with other plexes.

give_sd_to_plex: keep track of plex size in all cases

lock drives before closing them, to keep the daemon from getting
confused.

config_drive: handle partition type errors more gracefully

config_subdisk: set subdisk state correctly

find_drive, find_drive_by_dev, find_subdisk, find_plex, find_volume:
  set VF_NEWBORN flag when a new object is created

config_drive:
  Handle partition_status returns more cleverly.
  Replace the device name in some cases where it got overwritten.

config_subdisk:
  add parameter `update'.  If the object already exists, exit without
  any changes.
  Set state correctly.

config_plex, config_volume:
  add parameter `update'.  If the object already exists, exit without
  any changes.

parse_config:
  move read function to vinum_scandisk.
  add parameter `update' to pass to config_<object>.

remove_<object>_entry:
  print a message when the object is removed.

update_plex_config:
  Start defusing this function, which will go away some time.
  Remove calls to update_volume_config.
  Make size 64 bits
1999-01-21 00:32:54 +00:00
Greg Lehey
2ef3e3ddbc Include Peter Wemm's renaming and restructuring
Change from lkm to kld
1999-01-21 00:31:59 +00:00
Greg Lehey
eb1cd79843 Code for vinum daemon. 1999-01-21 00:31:31 +00:00
Greg Lehey
fc4e325173 Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Remove BROKEN_GDB kludge (it's not needed with klds)

Add code for interfacing with daemon

Modify manner of determining when module is idle

Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.

Remove code for 2.x support.

Move vinum_scandisk to vinumio.c

Remove myproc kludge

Keep track of open volumes by flag, not by pid (the pids caused some
problems with the lock manager).

free_vinum:
  Remove unmapped and defective regions from plexes.
  Wait for daemon to stop before returning

vinumopen:
  Don't refuse an open if the volume is already open.
1999-01-21 00:30:52 +00:00
Greg Lehey
efd86c9a8c Reflect changes in vinumstate.h 1999-01-21 00:30:24 +00:00
Greg Lehey
7230cf7d2a Include Peter Wemm's renaming and restructuring
Change from lkm to kld

Add structures for daemon

Remove old cruft

Increase RQINFO_SIZE to 128
1999-01-21 00:29:44 +00:00
Greg Lehey
e743f211c1 Include the copyright notice in the script, making the file COPYRIGHT
unnecessary.
1999-01-21 00:29:20 +00:00
Julian Elischer
ea5f0893fd Minor rearranging of code to allow simple protocol domains to be
added as KLDs.
1999-01-21 00:26:41 +00:00
Greg Lehey
447a8172e9 Add source file vinumdaemon.c 1999-01-21 00:25:47 +00:00