Commit Graph

1056 Commits

Author SHA1 Message Date
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then syncs the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot, so for brevity and timeliness
they are not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
alfred
4036af0a9c #elsif -> #elif
Noticed by: green
2000-07-11 09:41:29 +00:00
jhb
b6e74b58eb Support for unsigned integer and long sysctl variables. Update the
SYSCTL_LONG macro to be consistent with other integer sysctl variables
and require an initial value instead of assuming 0.  Update several
sysctl variables to use the unsigned types.

PR:		15251
Submitted by:	Kelly Yancey <kbyanc@posi.net>
2000-07-05 07:46:41 +00:00
phk
e5de271d47 Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
jhb
183fce400a Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you
set equal to the number of kilobytes in your cache.  The old options are
still supported for backwards compatibility.

Submitted by:	Kelly Yancey <kbyanc@posi.net>
2000-07-04 08:55:18 +00:00
mckusick
2f0e9591fa Simplify and rationalise the management of the vnode free list
(preparing the code to add snapshots).
2000-07-04 04:32:40 +00:00
phk
61ff05be25 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
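The parenthesized form works when the macro expands to a bare parameter list, so the parentheses appear at each definition site where simple pattern-matching tools expect them. A minimal sketch, assuming that kind of expansion; `HANDLER_ARGS` and `example_handler` are hypothetical names, not the real sysctl(9) definitions:

```c
#include <stddef.h>

/*
 * Hypothetical stand-in: a macro that expands to a bare parameter list,
 * with no surrounding parentheses of its own.
 */
#define HANDLER_ARGS int arg1, const char *arg2

/*
 * Because the parentheses are written at the definition site, a tool
 * that looks for "identifier(" can still find this function.
 */
static int
example_handler(HANDLER_ARGS)
{
	return arg1 + (arg2 != NULL ? 1 : 0);
}
```

For example, `example_handler(41, "x")` returns 42.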
markm
53a44ce9d3 Nifty idea from Jeroen van Gelderen; don't call a routine to check if
we are using the /dev/zero device, just check a flag (supplied by
/dev/zero).
Reviewed by:	dfr
2000-06-25 09:44:32 +00:00
hsu
ce8639fd44 Add missing increment of allocation counter.
2000-06-05 06:34:41 +00:00
dillon
82627e96a0 This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it.  It implements a new
    PG_UNMANAGED flag that has slightly different characteristics
    from PG_FICTITIOUS.

    A new sysctl, kern.ipc.shm_use_phys has been added to enable the
    use of physically-backed sysv shared memory rather than swap-backed.
    Physically backed shm segments are not tracked with PV entries,
    allowing programs which use a large shm segment as a rendezvous
    point to operate without eating an insane amount of KVM in the
    PV entry management.  Read: Oracle.

    Peter's OBJT_PHYS object will also allow us to eventually implement
    page-table sharing and/or 4MB physical page support for such segments.
    We're half way there.
2000-05-29 22:40:54 +00:00
dfr
14048face6 Brucify the pmap_enter_temporary() changes.
2000-05-29 19:21:01 +00:00
dillon
7832cdab03 Fix bug in vm_pageout_page_stats() that always resulted in a full
scan of the active queue.  This fix is not expected to have any
    noticeable impact on performance.

Noticed by: Rik van Riel <riel@conectiva.com.br>
2000-05-29 02:31:55 +00:00
dfr
3d3263476e Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon
2000-05-28 15:49:55 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
peter
807a551902 Checkpoint of a new physical memory backed object type, that does not
have pv_entries.  This is intended for very special circumstances,
eg: a certain database that has a 1GB shm segment mapped into 300
processes.  That would consume 2GB of kvm just to hold the pv_entries
alone.  This would not be used on systems unless the physical ram was
available, as it's not pageable.

This is a work-in-progress, but is a useful and functional checkpoint.
Matt has got some more fixes for it that will be committed soon.

Reviewed by:	dillon
2000-05-21 13:41:29 +00:00
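The 2GB figure can be reproduced with a back-of-the-envelope calculation. Assuming 4 KB pages and roughly 28 bytes per pv_entry (an assumed i386-era size, not stated in the commit), 1 GB / 4 KB = 262,144 pages; times 300 processes gives 78,643,200 entries, or about 2.05 GiB. A minimal sketch of that arithmetic; `pv_entry_bytes` is a hypothetical helper:

```c
#include <stdint.h>

/*
 * Rough estimate of the kernel memory needed for per-mapping pv_entries
 * when one shared segment is mapped by many processes.  The page size and
 * per-entry size are caller-supplied assumptions.
 */
static uint64_t
pv_entry_bytes(uint64_t segment_bytes, uint64_t nprocs, uint64_t page_bytes,
    uint64_t entry_bytes)
{
	uint64_t pages = segment_bytes / page_bytes;	/* pages in the segment */
	uint64_t entries = pages * nprocs;		/* one entry per page per process */

	return entries * entry_bytes;
}
```

With those assumptions, `pv_entry_bytes(1ULL << 30, 300, 4096, 28)` returns 2,202,009,600 bytes.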
peter
ee5cd6988f Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
its shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a separate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a separate set of
structures that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
dillon
5303cc9523 Fixed bug in madvise() / MADV_WILLNEED. When the request is offset
from the base of the first map_entry the call to pmap_object_init_pt()
    uses the wrong start VA.  MFC to follow.

PR: i386/18095
2000-05-14 18:46:40 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Disk drivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
c7feb17572 Convert the vm_pager_strategy() interface to take a struct bio instead of
a struct buf.  Don't try to examine B_ASYNC, it is a layering violation
to do so.  The only current user of this interface is vn(4) which, since
it emulates a disk interface, operates on struct bio already.
2000-05-03 07:47:46 +00:00
phk
54f7afc04f Move and staticize the bufchain functions so they become local to the
only piece of code using them.  This will ease a rewrite of them.
2000-05-01 19:38:51 +00:00
phk
10914aa708 Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
wollman
16604ab260 Implement POSIX.1b shared memory objects. In this implementation,
shared memory objects are regular files; the shm_open(3) routine
uses fcntl(2) to set a flag on the descriptor which tells mmap(2)
to automatically apply MAP_NOSYNC.

Not objected to by: bde, dillon, dufault, jasone
2000-04-22 15:22:31 +00:00
alc
5be57cb29c vm_object_shadow: Remove an incorrect assertion. In obscure circumstances
vm_object_shadow can be called on an object with ref_count > 1 and
OBJ_ONEMAPPING set.  This isn't really a problem for vm_object_shadow.
2000-04-19 16:32:04 +00:00
phk
75e82c815e Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.
2000-04-18 15:15:39 +00:00
phk
aaaef0b54e Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
msmith
9942d38efa Fix _zget() so that it checks the return from kmem_alloc(), to avoid
attempting to bzero NULL when the kernel map fills up.  _zget() will
now return NULL as it seems it was originally intended to do.
2000-04-04 21:00:39 +00:00
phk
8ee11d587f Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
dillon
5ccef75e02 Add necessary spl protection for swapper. The problem was located by
Alfred while testing his SPLASSERT stuff.   This is not a complete fix,
    more protections are probably needed.
2000-03-27 21:33:32 +00:00
charnier
5c16c2a8f1 Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
charnier
686df89909 Spelling
2000-03-26 15:20:23 +00:00
phk
1b817afc6d Fix one place which knew that B_WRITE was zero.
Fix a stylistic mistake of mine while here.

Found by:	Stephen Hocking <shocking@prth.pgs.com>
2000-03-22 08:40:13 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
phk
a246e10f55 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch was machine generated (thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
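Requiring exactly one bit also catches the old zero-valued B_WRITE problem: a command of 0 is rejected instead of silently acting as a write. A minimal sketch of such a check; the `IOCMD_*` names and `iocmd_is_valid` are hypothetical, not the actual b_iocmd constants:

```c
#include <stdbool.h>

/* Hypothetical command bits standing in for a per-buffer I/O command. */
enum {
	IOCMD_READ  = 0x01,
	IOCMD_WRITE = 0x02,
	IOCMD_FREE  = 0x04
};

/*
 * A value has exactly one bit set when it is non-zero and clearing its
 * lowest set bit (cmd & (cmd - 1)) leaves nothing behind.
 */
static bool
iocmd_is_valid(unsigned int cmd)
{
	return cmd != 0 && (cmd & (cmd - 1)) == 0;
}
```

`iocmd_is_valid(IOCMD_WRITE)` is true, while both `0` and `IOCMD_READ | IOCMD_WRITE` are rejected.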
phk
6b3385b773 Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.
2000-03-16 08:51:55 +00:00
phk
bb6e8e0c2e Remove unused 3rd argument from vsunlock() which abused B_WRITE.
2000-03-13 10:47:24 +00:00
ps
c3800346ab Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by:	dillon,jdp,alfred
Approved by:	jkh
2000-02-28 04:10:35 +00:00
dillon
7a2987cf94 Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
    of thousands of vm_map_entry structures.

    Add panic to make null-pointer dereference crash a little more verbose.

    Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
    of mmap()'d spaces (discrete vm_map_entry's in the process).  The value
    defaults to around 9000 for a 128MB machine.  The test is scaled for the
    number of processes sharing a vmspace (aka linux threads).  Setting
    the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
2000-02-16 21:11:33 +00:00
dillon
fd8bf79164 The swapdev_vp changes made to rip out the swap specfs interaction
also broke diskless swapping.  Moving the swapdev_vp initialization
    to more commonly run code solves the problem.

PR: kern/16165
Additional testing by: David Gilbert <dgilbert@velocet.ca>
2000-01-25 17:49:12 +00:00
dillon
cacb1b6d5f Fix a deadlock between msync(..., MS_INVALIDATE) and vm_fault. The
invalidation code cannot wait for paging to complete while holding a
    vnode lock, so we don't wait.  Instead we simply allow the lower level
    code to block on any busy pages it encounters.  I think Yahoo
    may be the only entity in the entire world that actually uses this
    msync feature :-).

Bug reported by:  Paul Saab <paul@mu.org>
2000-01-21 20:17:01 +00:00
phk
ae0c1ec8f7 Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
guido
ce046933de Use MAP_NOSYNC for vnodes without any links in their filesystem.
This is necessary for vmware: it does not use an anonymous mmap for
the memory of the virtual system. Instead it creates a temp file and
unlinks it. For a 50 MB file, this results in a lot of syncing
every 30 seconds.

Reviewed by:	Matthew Dillon <dillon@backplane.com>
2000-01-03 19:13:53 +00:00
peter
d53e4c1d80 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistent with the other
BSDs, which made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
peter
38a9421a1e Fix the swap backed vn case - this was broken by my rev 1.128 to
swap_pager.c and related commits.

Essentially swap_pager.c is backed out to before the changes, but
swapdev_vp is converted into a real vnode with just VOP_STRATEGY().
It no longer abuses specfs vnops and no longer needs a dev_t and
/dev/drum (or /dev/swapdev) for the intermediate layer.

This essentially restores the vnode interface as the interface to the
bottom of the swap pager, and vm_swap.c provides a clean vnode interface.

This will need to be revisited when we swap to files (vnodes) - which
is the other reason for keeping the vnode interface between the swap pager
and the swap devices.

OK'ed by:	dillon
1999-12-28 07:30:55 +00:00
eivind
87724eb673 Introduce NDFREE (and remove VOP_ABORTOP)
1999-12-15 23:02:35 +00:00
dillon
b66fb2c648 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incurring any significant filesystem overhead, putting it in
    the same performance category as SysV shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
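A minimal userland sketch of sharing memory through a file-backed mapping with this flag, assuming a POSIX system. It maps an unlinked temporary file (the vmware pattern from the commit above) and adds MAP_NOSYNC only where the platform defines it. `map_unlinked_nosync` and the `/tmp` path template are hypothetical. Since FreeBSD later applies MAP_NOSYNC automatically to vnodes with no links, the explicit flag here mainly documents intent:

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a shared read/write region backed by an unlinked temporary file.
 * Returns NULL on failure.
 */
static void *
map_unlinked_nosync(size_t len)
{
	char path[] = "/tmp/nosync.XXXXXX";
	int fd = mkstemp(path);

	if (fd == -1)
		return NULL;
	unlink(path);		/* the file now has no links in the filesystem */
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return NULL;
	}

	int flags = MAP_SHARED;
#ifdef MAP_NOSYNC
	flags |= MAP_NOSYNC;	/* FreeBSD: no periodic flushing of dirty pages */
#endif
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);

	close(fd);		/* the mapping keeps the file's pages alive */
	return p == MAP_FAILED ? NULL : p;
}
```

A child created with fork(2) after the call shares the region, since the mapping is MAP_SHARED.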
eivind
287836faea Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() get a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
luoqi
5c9244cd12 User ldt sharing.
1999-12-06 04:53:08 +00:00
phk
9809b71a89 Report swapdevices as cdevs rather than bdevs.
Remove unused dev2budev() function.
1999-11-29 21:37:18 +00:00
alc
7f96711685 Remove nonsensical vm_map_{clear,set}_recursive() calls
from vm_map_pageable().  At the point where they are called, vm_map_pageable()
holds a read (or shared) lock on the map.  The purpose
of vm_map_{clear,set}_recursive() is to disable/enable repeated
write (or exclusive) lock requests by the same process.
1999-11-25 20:21:52 +00:00
alc
5d1cd5631b Correct the following error: vm_map_pageable() on a COW'ed (post-fork)
vm_map always failed because vm_map_lookup() looked at
 "vm_map_entry->wired_count" instead of "(vm_map_entry->eflags &
 MAP_ENTRY_USER_WIRED)".  The effect was that many page
 wiring operations by sysctl were (silently) failing.
1999-11-23 06:51:28 +00:00
phk
84a3f8a8d2 Isolate the swapdev_vp "not quite" vnode in the only source file which
needs it now that /dev/drum is gone.

Reviewed by: eivind, peter
1999-11-22 15:27:09 +00:00
peter
bae4ed31fd Remove the non-functional "swap device" userland front-end to the
multiplexed underlying swap devices (/dev/drum).  The only thing it did
was to allow root to open /dev/drum, but not do anything with it.
Various utilities used to grovel around in here, but Matt has written
a much nicer (and clean) front-end to this for libkvm, and nothing uses
the old system any more.

The VM system was calling VOP_STRATEGY() on the vp of the first underlying
swap device (not the /dev/drum one, the first real device), and using
the VOP system to indirectly (and only) call swstrategy() to choose
an underlying device and enqueue it on that device.  I have changed it
to avoid diverting through the VOP system and to call the only possible
target directly, saving a little bit of time and some complexity.

In all, nothing much changes, except some scaffolding to support the
roundabout way of calling swstrategy() is gone.

Matt gave me the ok to do this some time ago, and I apologize for taking
so long to get around to it.
1999-11-18 06:55:40 +00:00
alc
f0cd9c6361 Two changes: (1) Use vm_page_unqueue_nowakeup in vm_page_alloc
instead of duplicating the code.  (2) If a wired page is passed
to vm_page_free_toq, panic instead of printing a friendly warning.
(If we don't panic here, we'll just panic later in vm_page_unwire
obscuring the problem.)
1999-11-10 05:23:19 +00:00
alc
32d7a664c3 Remove unused declarations.
1999-11-08 00:53:34 +00:00
alc
2ad13ddedc Remove unused #include's.
Submitted by:	phk
1999-11-07 20:03:54 +00:00
alc
791f67cc7b The functions declared by this header file no longer exist.
Submitted by:	phk (in part)
1999-11-07 06:46:48 +00:00
alc
9ef4299380 Reverse the sense of the test in the KASSERT's from the last commit.
1999-10-30 09:09:02 +00:00
alc
a301513e80 The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by:	dillon
1999-10-30 07:37:14 +00:00
phk
8d8f53dcdc Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting from the confusion caused by this.

Submitted by:   Tor.Egge@fast.no
Reviewed by:    alc, ken (partly)
1999-10-30 06:32:05 +00:00
phk
8e3c3eafed useracc() the prequel:
Merge the contents (less some trivial comments bordering on the silly)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
alc
baa44292b4 Remove the last vestiges of "vm_map_t phys_map". It's been unused
since i386/i386/machdep.c rev 1.45 (or 1994 :-) ).
1999-10-29 05:17:20 +00:00
alc
2c86d1cce1 Shrink "struct vm_object" by not spending a full 32 bits
on "objtype_t".
1999-10-27 17:47:24 +00:00
phk
cec2d553b6 Fix a panic(8) implementation:
hexdump -C < /dev/drum
by simply refusing to do I/O from userland.
a panic.  I'm not sure we even need /dev/drum anymore, it seems
to have been broken for a long time thi
1999-10-08 19:10:18 +00:00
phk
ff20b1e15d Introduce swopen to prevent block device opens and insist on minor==0.
1999-10-04 13:09:30 +00:00
phk
1c8b44f4ce Give the swap device a D_DISK flag against my better judgement.
TODO: add an open routine which fails for bdev opens.
1999-10-04 12:27:58 +00:00
dt
b99bff68d4 Plug an accounting leak: count pages in ZONE_INTERRUPT zones as wired.
1999-09-30 07:35:50 +00:00
phk
e9e0512210 Remove five now unused fields from struct cdevsw. They should never
have been there in the first place.  A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags
1999-09-25 18:24:47 +00:00
dillon
37bee3bb3f Clean up madvise code, add a few more sanity checks.
Reviewed by:	Alan Cox <alc@cs.rice.edu>,  dg@root.com
1999-09-21 05:00:48 +00:00
dillon
493548f6b6 Final commit to remove vnode->v_lastr. vm_fault now handles read
clustering issues (replacing code that used to be in
    ufs/ufs/ufs_readwrite.c).  vm_fault also now uses the new VM page counter
    inlines.

    This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr
    for VM, and fp->f_nextread and fp->f_seqcount (which have been in the
    tree for a while).  Determination of the I/O strategy (sequential, random,
    and so forth) is now handled on a descriptor-by-descriptor basis for
    base I/O calls, and on a memory-region-by-memory-region and
    process-by-process basis for VM faults.

Reviewed by:	David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
1999-09-21 00:36:16 +00:00
dillon
e8761b4e05 Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundary in the page table.  pmap_kextract()
    is not designed for access to the user space portion of the page
    table and cannot handle the null-page-directory-entry case.

    The fix is to have vm_fault_quick() return a success or failure which
    is then used to avoid calling pmap_kextract().
1999-09-20 19:08:48 +00:00
dillon
4c1e280de7 Remove inappropriate VOP_FSYNC from vm_object_page_clean(). The fsync
syncs the entire underlying file rather than just the requested range,
    resulting in huge inefficiencies when the VM system is articulated in
    a certain way.  The VOP_FSYNC was also found to massively reduce NFS
    performance in certain cases.

    Change MADV_DONTNEED and MADV_FREE to call vm_page_dontneed() instead
    of vm_page_deactivate().  Using vm_page_deactivate() causes all
    inactive and cache pages to be recycled before the dontneed/free page
    is recycled, effectively flushing our entire VM inactive & cache
    queues continuously even if only a few pages are being actively MADV
    free'd and reused (such as occurs with a sequential scan of a
    memory-mapped file).

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:48:36 +00:00
dillon
d1666d7f52 Add 'lastr' field to vm_map_entry in preparation for its removal
from the vnode.  (The changeover is undergoing final testing and
    will be committed soon).

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:40:17 +00:00
dillon
124cdf5346 The vnode pager (used when you do file-backed mmaps) must use the
underlying physical sector size when aligning I/O transfer sizes.
    It cannot assume 512 bytes.

    We assume the underlying sector size is a power of 2.  If it isn't,
    mmap() will break badly anyway (in the same way mmap broke with NFS
    when NFS tried to cache piecemeal write ranges in buffers, before
    we enforced read-buffer-before-write-piecemeal for NFS).

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:17:59 +00:00
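The power-of-2 assumption is what makes cheap alignment arithmetic possible: rounding up to a multiple of the sector size reduces to an add and a mask. A minimal sketch; `round_to_sector` is a hypothetical helper, not the vnode pager's code:

```c
#include <stdint.h>

/*
 * Round a byte count up to a multiple of the device sector size.  The
 * mask is only correct when secsize is a power of two, the same
 * assumption the commit makes.
 */
static uint64_t
round_to_sector(uint64_t bytes, uint64_t secsize)
{
	return (bytes + secsize - 1) & ~(secsize - 1);
}
```

For example, a 513-byte transfer rounds to 2048 bytes on a 2048-byte-sector device instead of to 1024 under a fixed 512-byte assumption.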
dillon
7a0052d268 Fix a number of spl bugs related to reserving and freeing swap space.
Swap space can be freed from an interrupt and so swap reservation and
    freeing must occur at splvm.

    Add swap_pager_reserve() code to support a new swap pre-reservation
    capability for the VN device.

    Generally cleanup the swap code by simplifying the swp_pager_meta_build()
    static function and consolidating the SWAPBLK_NONE test from a bit test
    to an absolute compare.  The bit test was left over from a rejected
    swap allocation scheme that was not ultimately committed.  A few other
    minor cleanups were also made.

    Reorganize the swap strategy code, again for VN support, to not
    reallocate swap when writing as this messes up pre-reservation and
    can fragment I/O unnecessarily as VN-based disk is messed around with.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:09:24 +00:00
dillon
6c2a557edd Add required BUF_KERNPROC to flushchainbuf() to disassociate the
current process from the exclusive lock prior to initiating I/O.

    This fixes a panic related to swap-backed VN disks.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:03:27 +00:00
dillon
4cb1921c9b Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the
    VM code with inlines to aid in readability and to reduce fragility
    in the code where modules depend on the same test being performed
    to properly sleep and wakeup.

    Split out a portion of the page deactivation code into an inline
    in vm_page.c to support vm_page_dontneed().

    add vm_page_dontneed(), which handles the madvise MADV_DONTNEED
    feature in a related commit coming up for vm_map.c/vm_object.c.  This
    code prevents degenerate cases where an essentially active page may
    be rotated through a subset of the paging lists, resulting in premature
    disposal.
1999-09-17 04:56:40 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$
1999-08-28 01:08:13 +00:00
phk
591c94d4c6 Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.

        Drop the "fast recycling" optimization of vnodes (including
        the returning of a preexisting but stale vnode from checkalias).
        It doesn't buy us anything now that we don't hardlimit
        vnodes anymore.

        Rename checkalias2() and checkalias() to addalias() and
        addaliasu() - which takes dev_t and udev_t arg respectively.

        Make the revoke syscalls use vcount() instead of VALIASED.

        Remove VALIASED flag, we don't need it now and it is faster
        to traverse the much shorter lists than to maintain the
        flag.

        vfs_mountedon() can check the dev_t directly, all the vnodes
        point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED.
1999-08-26 14:53:31 +00:00
green
a788068fed When the SYSINIT() was removed, it was replaced with a make_dev on-demand
creation of /dev/drum via calling swapon. However, the make_dev has a
bogus (insofar that it hasn't been added yet) cdevsw, so later we end
up crashing with a null pointer dereference on the swap vp's specinfo.
The specinfo points to a dev_t with a major of 254 (uninitialized), and
we get a crash on its d_strategy being called.

The simple solution to this is to call cdevsw_add before the make_dev
is ever used. This fixes the panic which occurred upon swapping.
1999-08-24 05:58:35 +00:00
bde
aff861d544 Use devtoname to print dev_t's instead of casting them to u_long for
misprinting with %lx.

Cast pointers to intptr_t instead of casting them to long.  Cosmetic.
1999-08-23 23:55:03 +00:00
phk
663cbe4fc2 Convert DEVFS hooks in (most) drivers to make_dev().
Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this exercise is that a lot fewer files depend on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().
1999-08-23 20:59:21 +00:00
alc
42583848d0 Correct the inconsistent formatting in struct vm_map.
Addendum to rev 1.47: submitted by dillon.
1999-08-23 18:16:05 +00:00
alc
980da3c115 struct vm_map:
The lock structure cannot be the first element of the vm_map
	because this can result in livelock between two or more system
	processes trying to kmem_alloc_wait.
1999-08-23 18:08:34 +00:00
alc
4039114e6f Remove two unused variable declarations.
1999-08-22 00:01:46 +00:00
alc
e09a564be4 vm_page_alloc and contigmalloc1:
Verify that free pages are not dirty.

Submitted by:	dillon
1999-08-20 06:32:00 +00:00
peter
9ad9e224d9 Update for run queue code.
1999-08-19 00:15:27 +00:00
mjacob
c3587e5dcb Fix breakage - an extra brace got inserted where DIAGNOSTIC was defined
but MAP_LOCK_DIAGNOSTIC wasn't.
1999-08-18 03:56:57 +00:00
green
30c109da4d Unbreak the nfs KLD_MODULE. It needs a bit more of vm_page.h than was
exported (notably vm_page_undirty()). Also, let vm_page_dirty() work
in a KLD.
1999-08-17 22:48:10 +00:00
alc
eacdecd413 vm_page_free_toq:
Update the comment to reflect the demise of PQ_ZERO and
	remove a (now) useless test.
1999-08-17 18:09:01 +00:00
alc
fb63bd8ded Correct an accidental omission of one "vm_page_undirty" replacement
from the previous commit.
1999-08-17 05:56:00 +00:00
alc
d64b069518 vm_page_free_toq:
Clear the dirty bit mask (vm_page_undirty) before adding the page
	to the free page queue.

Submitted by:	dillon
1999-08-17 05:08:39 +00:00
alc
075745f2e2 Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by:	dillon
1999-08-17 04:02:34 +00:00
alc
33f15d9b59 vm_pageout_clean:
Remove dead code.

Submitted by:	dillon
1999-08-17 00:07:35 +00:00
alc
708e372c9a vm_map_lock*:
Remove semicolons or add "do { } while (0)" as necessary
	to enable the use of these macros in arbitrary statements.
	(There are no functional changes.)

Submitted by:	dillon
1999-08-16 18:21:09 +00:00
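The `do { } while (0)` wrapper turns a multi-statement macro into a single statement that still takes a trailing semicolon, so it works under an unbraced `if`/`else`. A minimal sketch with hypothetical names (`struct fake_map`, `FAKE_MAP_LOCK`), not the real vm_map_lock* definitions; a counter stands in for the lock:

```c
/* Hypothetical map with a counter standing in for a real lock. */
struct fake_map {
	int lock_count;
	int timestamp;
};

/*
 * Two statements wrapped in do { } while (0): the expansion is one
 * statement, and the caller supplies the terminating semicolon.
 */
#define FAKE_MAP_LOCK(map) do {			\
	(map)->lock_count++;			\
	(map)->timestamp++;			\
} while (0)

static void
lock_if_needed(struct fake_map *map, int needed)
{
	if (needed)
		FAKE_MAP_LOCK(map);	/* a bare { ...; } here would orphan the else */
	else
		map->timestamp += 10;
}
```

If the macro body were a plain braced block, the `;` after `FAKE_MAP_LOCK(map)` would end the `if` and the following `else` would be a syntax error.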
alc
a244527220 Remove the declarations for "vm_map_t io_map". It's been unused
since i386/i386/machdep rev 1.310, i.e., the demise of BOUNCE_BUFFERS.
1999-08-15 23:55:46 +00:00
alc
d7ab53a49a Remove the declarations for "vm_map_t u_map". It's been unused
since i386/i386/pmap rev 1.190.  (The alpha never used it.)
1999-08-15 21:55:20 +00:00
alc
816a1752a0 contigmalloc1 (currently) depends on PQ_FREE and PQ_CACHE not being 0
to tell a valid "struct vm_page" from an invalid one in the vm_page_array.
This isn't a very robust method.
1999-08-15 05:36:43 +00:00
mjacob
9c963d1884 Add back in old definitions if we're compiling for alpha.
1999-08-15 01:16:53 +00:00
alc
70b651dfc3 Don't create a "struct vpgqueues" for PQ_NONE.
1999-08-14 06:25:54 +00:00
alc
7fd2f69ac5 vm_map_madvise:
A complete rewrite by dillon and myself to separate
	the implementation of behaviors that affect the vm_map_entry
	from those that affect the vm_object.

	A result of this change is that madvise(..., MADV_FREE);
	is much cheaper.
1999-08-13 17:45:34 +00:00
phk
7b7ae40370 The bdevsw() and cdevsw() are now identical, so kill the former.
1999-08-13 10:29:38 +00:00
alc
17f88167e4 Make the default page coloring parameters match a (non-Xeon) Pentium II/III.
This setting is also acceptable for Celerons and Pentium Pros
with less than 1MB L2 caches.

Note: PQ_L2_SIZE is a misnomer.  The correct number of colors is
a function of the cache's degree of associativity as well as its size.

Submitted by:	bde and alc
1999-08-12 21:16:53 +00:00
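
The note about associativity can be made concrete. For a physically indexed cache, the number of colors is the number of page-sized slots in one way of the cache: cache size / (page size × associativity). The figures in the check below (a 512 KB, 4-way L2 with 4 KB pages) are illustrative assumptions, not values taken from the kernel.

```c
#include <assert.h>

/*
 * Number of page colors for a physically indexed cache:
 *   colors = cache_size / (page_size * associativity)
 * Using cache size alone overcounts by the associativity.
 */
static unsigned int
page_colors(unsigned long cache_size, unsigned long page_size,
    unsigned int assoc)
{
    assert(page_size != 0 && assoc != 0);
    return (unsigned int)(cache_size / (page_size * assoc));
}

/* A page's color: which cache slot its physical page number maps to. */
static unsigned int
page_color(unsigned long phys_addr, unsigned long page_size,
    unsigned int ncolors)
{
    return (unsigned int)((phys_addr / page_size) % ncolors);
}
```

With these figures the cache has 32 colors. Counting by size alone gives 128.
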
alc
0740e2b138 vm_object_madvise:
Update the comments to match the implementation.

Submitted by:	dillon
1999-08-12 08:22:57 +00:00
alc
e3c495d003 vm_object_madvise:
Support MADV_DONTNEED and MADV_WILLNEED on object types
	besides OBJT_DEFAULT and OBJT_SWAP.

Submitted by:	dillon
1999-08-12 06:33:56 +00:00
alc
157bb2131d contigmalloc1:
If a page is found in the wrong queue, panic instead
	of silently ignoring the problem.
1999-08-11 05:12:00 +00:00
peter
e657118d0b Add a contigfree() as a corollary to contigmalloc() as it's not clear
which free routine to use and people are tempted to use free() (which
doesn't work)
1999-08-10 22:21:13 +00:00
alc
0524a59178 vm_map_madvise:
Now that behaviors are stored in the vm_map_entry rather than
	the vm_object, it's no longer necessary to instantiate a vm_object
	just to hold the behavior.

Reviewed by:	dillon
1999-08-10 04:50:20 +00:00
phk
ee871b6440 Merge the cons.c and cons.h to the best of my ability. alpha may or
may not compile, I can't test it.
1999-08-09 10:35:05 +00:00
phk
e938d317d5 Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
alc
33da09bf48 Move the memory access behavior information provided by madvise
from the vm_object to the vm_map.

Submitted by:	dillon
1999-08-01 06:05:09 +00:00
alc
5180aa1c5c Change the type of vpgqueues::lcnt from "int *" to "int". The indirection
served no purpose.
1999-07-31 18:31:00 +00:00
alc
5c6321aca5 vm_page_queue_init:
Remove the initialization of PQ_NONE's cnt and lcnt.  They aren't
	used.

vm_page_insert:
	Remove an unnecessary dereference.

vm_page_wire:
	Remove the one and only (and thus pointless) reference
	to PQ_NONE's lcnt.
1999-07-31 04:19:49 +00:00
alc
9cda945475 Reduce the number of "magic constants" used for page coloring
by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same
thing at different places in the kernel.  Drop PQ_PRIME3.
1999-07-22 06:04:17 +00:00
alc
279eafd09f Fix the following problem:
When creating new processes (or performing exec), the new page
directory is initialized too early.  The kernel might grow before
p_vmspace is initialized for the new process.  Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR:		kern/12378
Submitted by:	tegge
Reviewed by:	dfr
1999-07-21 18:02:27 +00:00
green
bf26dc7074 Make a dev2budev() function, and use it. This refixes pstat (working, broken,
working, broken, working) and savecore (working, working, broken, working,
working).

Sorta Reviewed by:	phk
1999-07-20 21:29:13 +00:00
alc
db0a7d40ad Convert a "page not busy" warning to an assertion.
Submitted by:	dillon@backplane.com
1999-07-20 05:46:56 +00:00
phk
09327a40f9 Add a field to struct swdevt to avoid a bogus udev2dev() call. 1999-07-17 19:59:55 +00:00
phk
6c373ff516 I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name.  s/umakedev/makeudev/g
1999-07-17 18:43:50 +00:00
alc
ab1033c835 Remove vm_object::last_read. It is used by the old swap pager, but
not by the new one, i.e., vm/swap_pager.c rev 1.108.

Reviewed by:	dillon@backplane.com
1999-07-16 05:11:37 +00:00
alc
c7285cff59 Cleanup OBJ_ONEMAPPING management.
vm_map.c:
	Don't set OBJ_ONEMAPPING on arbitrary vm objects.  Only default
	and swap type vm objects should have it set.  vm_object_deallocate
	already handles these cases.

vm_object.c:
	If OBJ_ONEMAPPING isn't already clear in vm_object_shadow,
	we are in trouble.  Instead of clearing it, make it
	an assertion that it is already clear.
1999-07-11 18:30:32 +00:00
alc
b0f696a574 Change the data type used to represent page color in the vm_object
to be the same as that used in the vm_page.  (This change also
shrinks the vm_object.)
1999-07-10 18:29:18 +00:00
alc
bcd454da95 Remove unused function prototypes. 1999-07-10 18:16:08 +00:00
ache
41c66e4188 add unused argument to udev2dev() to make kernel compiled 1999-07-07 09:12:44 +00:00
msmith
2d9dfcac47 Reinstate the previous fix for the broken export of a dev_t in sw_dev, convert
back to a dev_t when the value is actually used.
1999-07-07 04:07:03 +00:00
green
0d33984f04 Back out previous commit. It was wrong, and caused panics. 1999-07-07 03:03:59 +00:00
msmith
0ca35a4fe1 swdevt should contain a udev_t not a devt. This resulted in bogus
swap device name reporting.

Submitted by:	Bill Swingle <unfurl@freebsd.org>
1999-07-06 23:51:02 +00:00
mckay
4eda78adf2 Reformat previous fix to remove an uglier than average goto.
Looked OK to:	dg
1999-07-05 12:50:54 +00:00
mckusick
9d4f0d78fa The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA.  With this patch clean
and dirty buffers have been separated.  Empty buffers with KVM
assignments have been separated from truly empty buffers.  getnewbuf()
has been rewritten and now operates in a 100% optimal fashion.  That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).
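
A minimal sketch of how a released buffer could be routed to one of the four queues named above. The `struct buf` fields and the routing function are hypothetical simplifications; only the queue names come from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* The four queues named in the commit message. */
enum bufqueue { QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTYKVA, QUEUE_EMPTY };

/* Hypothetical, minimal buffer descriptor. */
struct buf {
    bool has_data;   /* buffer holds file data */
    bool dirty;      /* data not yet written back */
    bool has_kva;    /* kernel virtual address space is assigned */
};

/* Decide which queue a released buffer belongs on. Clean and dirty
 * buffers are kept apart, as are empty buffers with and without KVA. */
static enum bufqueue
buf_queue_for(const struct buf *bp)
{
    if (bp->has_data)
        return bp->dirty ? QUEUE_DIRTY : QUEUE_CLEAN;
    return bp->has_kva ? QUEUE_EMPTYKVA : QUEUE_EMPTY;
}
```
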

Buffer flushing has been reorganized.  Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur.  This resulted in processes blocking on conditions
unrelated to what they were doing.  This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits.  We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.
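
The high/low watermark behaviour can be sketched as follows. This sketch assumes a simple rule: do nothing until the dirty count exceeds the high mark, then flush down to the low mark. The struct and helper names are hypothetical, and the daemon's adaptive rate control is not modeled.

```c
#include <assert.h>

/* Hypothetical counters standing in for buffer cache state. */
struct bufstats {
    int numdirtybuffers;
    int lodirtybuffers;   /* cf. vfs.lodirtybuffers */
    int hidirtybuffers;   /* cf. vfs.hidirtybuffers */
};

/* Stand-in for writing one dirty buffer to disk. */
static void
flush_one(struct bufstats *s)
{
    s->numdirtybuffers--;
}

/*
 * One pass of a buf_daemon-like loop. The gap between the two marks
 * keeps the daemon from waking up for every newly dirtied buffer.
 * Returns the number of buffers flushed.
 */
static int
buf_daemon_pass(struct bufstats *s)
{
    int flushed = 0;

    if (s->numdirtybuffers <= s->hidirtybuffers)
        return 0;
    while (s->numdirtybuffers > s->lodirtybuffers) {
        flush_one(s);
        flushed++;
    }
    return flushed;
}
```
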

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat.  The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed.  This
algorithm has been changed.  reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list.  The new algorithm is deterministic but
not perfect.  The new algorithm greatly reduces problems that previously
occurred when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit.  This bit allows processes working with filesystem
buffers to use available emergency reserves.  Normal processes do not set
this bit and are not allowed to dig into emergency reserves.  The purpose
of this bit is to avoid low-memory deadlocks.
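
A minimal sketch of the reserve check. The flag value and the `struct bufpool` are hypothetical; only the name P_BUFEXHAUST comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define P_BUFEXHAUST 0x1   /* hypothetical value, for illustration only */

/* Hypothetical pool of free buffers with an emergency reserve. */
struct bufpool {
    int nfree;
    int reserve;   /* buffers only P_BUFEXHAUST processes may take */
};

/*
 * Try to take a buffer. Ordinary callers must leave the reserve intact.
 * A caller flagged P_BUFEXHAUST (e.g. one that is itself flushing
 * buffers) may use the reserve, so it can make progress instead of
 * deadlocking while it waits for memory.
 */
static bool
buf_take(struct bufpool *pool, int p_flag)
{
    int floor = (p_flag & P_BUFEXHAUST) ? 0 : pool->reserve;

    if (pool->nfree <= floor)
        return false;
    pool->nfree--;
    return true;
}
```
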

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-04 00:25:38 +00:00
peter
18cf795f27 Fix some int/long printf problems for the Alpha 1999-07-01 19:53:43 +00:00
peter
6cb5fe6c6b Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface.  kproc_start is still available
as a SYSINIT() hook.  This allowed simplification of chunks of the
sysinit code in the process.  This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process.  It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit.  This means that nfsiod
doesn't need to be in /sbin and is always "available".  This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1999-07-01 13:21:46 +00:00
peter
5d3d376190 Kirk missed a required BUF_KERNPROC(). Even though this is a non-async
transfer, the b_iodone hook causes biodone() to release it from interrupt
context.
1999-06-27 22:08:38 +00:00
peter
6d9ab211eb Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly
splbio()/splx()) are #included in time.
1999-06-27 11:44:22 +00:00
peter
e77d71685c There isn't much point waking up a daemon that hasn't existed since
softupdates came in.  Try calling speedup_syncer() instead.
1999-06-26 14:56:58 +00:00
mckusick
5b58f2f951 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
alc
f972513d8f Remove (1) "extern" declarations for variables that were previously
made "static" and (2) initialized but unused variables.
1999-06-22 07:18:20 +00:00
alc
72c18c01ff Remove vm_object::cache_count and vm_object::wired_count. They are
not used.  (Nor is there any planned use by John who introduced them.)

Reviewed by:	"John S. Dyson" <toor@dyson.iquest.net>
1999-06-20 21:47:02 +00:00
alc
aa61764b70 Set cnt.v_page_size to PAGE_SIZE rather than DEFAULT_PAGE_SIZE so that
"vmstat -s" reports the correct value on the Alpha.

Submitted by:	Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
1999-06-20 04:55:29 +00:00
alc
c9ce3ad902 Remove some unused function and variable declarations. 1999-06-19 18:42:53 +00:00
alc
90cdb68b0a vm_map_growstack uses vmspace::vm_ssize as though it contained
the stack size in bytes when in fact it is the stack size in pages.
1999-06-17 21:29:38 +00:00
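
The fix amounts to converting pages to bytes before comparing against byte quantities. The kernel has shift-based macros such as ctob() for this. The sketch below assumes 4 KB pages and a hypothetical `struct vmspace` reduced to the one field at issue.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12          /* assumes 4 KB pages */

/* Hypothetical: only the field at issue. */
struct vmspace {
    size_t vm_ssize;   /* stack size in PAGES, not bytes */
};

/* Convert the page count to bytes before any byte comparison. */
static size_t
stack_bytes(const struct vmspace *vm)
{
    return vm->vm_ssize << PAGE_SHIFT;
}

/* Would growing the stack by grow_bytes exceed a byte limit? */
static int
stack_would_exceed(const struct vmspace *vm, size_t grow_bytes,
    size_t limit_bytes)
{
    return stack_bytes(vm) + grow_bytes > limit_bytes;
}
```

If vm_ssize (16) were treated as a byte count, a 16-page stack would look about 4096 times smaller than it is.
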
alc
2f13d2c4a2 vm_map_insert sometimes extends an existing vm_map entry, rather than
creating a new entry.  vm_map_stack and vm_map_growstack can panic when
a new entry isn't created.  Fixed vm_map_stack and vm_map_growstack.

Also, when extending the stack, always set the protection to VM_PROT_ALL.
1999-06-17 05:49:00 +00:00
alc
9fb63b7080 Move vm_map_stack and vm_map_growstack after the definition
of the vm_map_clip_end macro.  (The next commit will modify
vm_map_stack and vm_map_growstack to use vm_map_clip_end.)
1999-06-17 00:39:26 +00:00
alc
bdb6bd3066 Remove some unused declarations and duplicate initialization. 1999-06-17 00:27:39 +00:00
alc
15096c63d6 vm_map_protect:
The wrong vm_map_entry is used to determine if writes must not be
	allowed due to COW.
1999-06-12 23:10:38 +00:00
dt
309c487efd Add a function kmem_alloc_nofault() - same as kmem_alloc_pageable(), but
create a nofault entry. It will be used to allocate kmem for upages.

(I am not too happy with all this, but it's better than nothing).
1999-06-08 17:03:28 +00:00
alc
3d024af49f vm_mmap:
Ensure that device mappings get MAP_PREFAULT(_PARTIAL) set,
	so that 4M page mappings are used when possible.

Reviewed by:	Luoqi Chen <luoqi@watermarkgroup.com>
1999-06-05 18:21:53 +00:00
phk
1efee9b283 Shorten a detour around dev_t to get a udev_t created. 1999-06-01 17:11:27 +00:00
phk
6a5dc97620 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print a message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
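
The registration scheme described above can be sketched as follows. The driver supplies its own major number in the switch structure, and a lookup that returns NULL replaces checks against nblkdev/nchrdev. The table size and the stripped-down `struct cdevsw` are hypothetical, and this devsw() takes a major number rather than a dev_t.

```c
#include <assert.h>
#include <stddef.h>

#define NUMCDEV 256   /* hypothetical table size */

/* Hypothetical, minimal switch entry: a name and the major number. */
struct cdevsw {
    const char *d_name;
    int d_maj;
};

static struct cdevsw *cdevsw_table[NUMCDEV];

/* Register a driver; the major number comes from the structure itself. */
static int
cdevsw_add(struct cdevsw *sw)
{
    if (sw->d_maj < 0 || sw->d_maj >= NUMCDEV)
        return -1;   /* bogus d_maj; per the commit, the real code prints a message */
    cdevsw_table[sw->d_maj] = sw;
    return 0;
}

/* Look up by major number; NULL means "no such device". */
static struct cdevsw *
devsw(int maj)
{
    if (maj < 0 || maj >= NUMCDEV)
        return NULL;
    return cdevsw_table[maj];
}
```
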
phk
7e4a9dced9 This commit should be an extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".

        Initialize the d_maj and d_bmaj fields.

        The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format.  Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.
1999-05-30 16:53:49 +00:00
alc
a20319a2ae Addendum to 1.155. Verify the existence of the object before checking
its reference count.
1999-05-30 01:12:19 +00:00
alc
2bb1b6a2ae Avoid the creation of unnecessary shadow objects. 1999-05-28 03:39:44 +00:00
alc
016bcfa9ed vm_map_insert:
General cleanup.  Eliminate coalescing checks that are duplicated
	by vm_object_coalesce.
1999-05-18 05:38:48 +00:00
alc
936a553033 Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by:	David Greenman <dg@root.com>
1999-05-17 00:53:56 +00:00
alc
83a9863079 Remove prototypes for functions that don't exist anymore (vm_map.h).
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
	vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.
1999-05-16 05:07:34 +00:00
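
Change 1 above is a half-open interval overlap test. A minimal sketch follows; the helper name is hypothetical. In vm_object_coalesce terms, one range would be the object's existing pages and the other the new range. vm_object_page_remove is only worth calling when the two ranges overlap.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long vm_pindex_t;

/* Do the half-open page ranges [a_start, a_end) and [b_start, b_end)
 * share at least one page? */
static bool
ranges_overlap(vm_pindex_t a_start, vm_pindex_t a_end,
    vm_pindex_t b_start, vm_pindex_t b_end)
{
    return a_start < b_end && b_start < a_end;
}
```
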
dt
c936e75142 Fix confusion of size of transfer with size of the pager.
PR:		11658
Broken in:	1.89 (1998/03/07)
1999-05-15 23:42:39 +00:00
alc
e79d7db00b Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa.  Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by:	David Greenman <dg@root.com>
1999-05-14 23:09:34 +00:00
bde
bdbf14bb39 Casting handles from void * to uintptr_t on the way to dev_t became
especially bogus when dev_t became a pointer.
1999-05-13 12:55:37 +00:00
luoqi
923ece6a8c Device pager's handle is dev_t not udev_t. 1999-05-13 04:02:07 +00:00
phk
760193e96e Fix a udev_t/dev_t mismatch which prevented paging from working. 1999-05-12 11:05:23 +00:00
phk
7e26ca1d1a Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to an actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few housekeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I believe this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
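
The "major|minor" bit pattern that udev_t carries can be sketched as follows. The sketch assumes the 4.4BSD-style packing of that era: major number in bits 8 to 15, minor number in all remaining bits. It uses the later name makeudev() (see the rename above). The typedef width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t udev_t;   /* assumed width; the "major|minor" bit pattern */

/* The major number lives in bits 8..15. */
static int
umajor(udev_t x)
{
    return (int)((x >> 8) & 0xff);
}

/* The minor number is every other bit, so minors can exceed 255. */
static int
uminor(udev_t x)
{
    return (int)(x & 0xffff00ffu);
}

/* Pack a major and a minor number into a udev_t. */
static udev_t
makeudev(int maj, int min)
{
    return ((udev_t)maj << 8) | (udev_t)min;
}
```
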
phk
91e7a75ba8 No point in swapdev being a static global when used only locally. 1999-05-09 17:28:00 +00:00
phk
500e41bd71 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
phk
693dd58bb3 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
phk
7f79e0b14a Introduce two functions: physread() and physwrite() and use these directly
in *devsw[] rather than the 46 local copies of the same functions.

(grog will do the same for vinum when he has time)
1999-05-07 07:03:47 +00:00
peter
d247469429 Add brackets to silence egcs and help clarity. 1999-05-06 22:06:45 +00:00
phk
f57a01ebfc remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
luoqi
148df5d8ab Don't ignore mmap() address hint below the text section. 1999-05-06 00:46:19 +00:00
billf
dd35516544 Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
alc
5cb08a2652 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.
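
The read-before-write decision can be sketched as a predicate. The helper is hypothetical and covers only two cases: a write that covers the entire buffer, and the trivial append case of a buffer that lies wholly past EOF. The actual append optimization is more involved and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: must the client read the buffer's current
 * contents from the server before applying a write to it?
 *   bufoff/bufsize: byte range the buffer covers in the file
 *   off/len:        byte range being written
 *   filesize:       current size of the file
 */
static bool
needs_read_before_write(long bufoff, long bufsize, long off, long len,
    long filesize)
{
    /* The write covers the whole buffer: no old bytes survive. */
    if (off <= bufoff && off + len >= bufoff + bufsize)
        return false;
    /* The buffer lies entirely past EOF: there is nothing to read. */
    if (bufoff >= filesize)
        return false;
    /* A partial write over existing data: fetch the data first. */
    return true;
}
```
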

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most notably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistency issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vice versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
dt
ba8c622703 s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)
1999-04-28 10:54:24 +00:00
phk
16e3fbd2c1 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
dt
9efe75f7a6 Make pmap_collect() an official pmap interface. 1999-04-23 20:29:58 +00:00
peter
a74bdeb7d1 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
peter
dfdcc62332 Move the declaration of faultin() from the vm headers to proc.h, since
it is now referenced from a macro there (PHOLD()).
1999-04-13 19:17:15 +00:00
eivind
29478a96d3 Staticize 1999-04-11 02:16:27 +00:00
dt
80578d3e92 Convert usage of vm_page_bits() to the new convention ("Inputs are required
to range within a page").
1999-04-10 20:52:11 +00:00
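
A sketch of a vm_page_bits()-style helper under the "inputs must range within a page" convention. It maps a byte range inside one page to a bitmask of the DEV_BSIZE chunks it touches. The sketch assumes 4 KB pages and 512-byte blocks (8 chunks, so VM_PAGE_BITS_ALL is 0xff here). It is an illustration, not the kernel's code.

```c
#include <assert.h>

#define PAGE_SIZE        4096    /* assumes 4 KB pages */
#define DEV_BSHIFT       9       /* 512-byte device blocks */
#define VM_PAGE_BITS_ALL 0xffu   /* PAGE_SIZE / DEV_BSIZE = 8 chunks */

/* Bitmask of the DEV_BSIZE chunks touched by [base, base + size).
 * The range must lie inside a single page. */
static unsigned int
vm_page_bits(int base, int size)
{
    int first_bit, last_bit;

    assert(base >= 0 && size >= 0 && base + size <= PAGE_SIZE);
    if (size == 0)
        return 0;
    first_bit = base >> DEV_BSHIFT;
    last_bit = (base + size - 1) >> DEV_BSHIFT;
    /* Set bits first_bit..last_bit inclusive. */
    return (2u << last_bit) - (1u << first_bit);
}
```
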
eivind
b790d856f9 Lock vnode correctly for VOP_OPEN.
Discussed with:	alc, dillon
1999-04-10 17:54:43 +00:00
peter
100f4abd46 Don't forcibly kill processes that are locked in-core via PHOLD - it was
just checking P_NOSWAP before.
1999-04-06 03:14:56 +00:00
peter
bc820937dc Only use p->p_lock (managed by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no
longer set.
1999-04-06 03:11:34 +00:00
julian
0ed09d2ad5 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
alc
ad1fbba2a9 Two changes to vm_map_delete:
1. Don't bother checking object->ref_count == 1 in order to set
OBJ_ONEMAPPING.  It's a waste of time.  If object->ref_count == 1,
vm_map_entry_delete will "run-down" the object and its pages.

2. If object->ref_count == 1, ignore OBJ_ONEMAPPING.  Wait for
vm_map_entry_delete to "run-down" the object and its pages.
Otherwise, we're calling two different procedures to delete
the object's pages.

Note: "vmstat -s" will once again show a non-zero value
for "pages freed by exiting processes".
1999-04-04 07:11:02 +00:00
alc
a976359db5 Mainly, eliminate the comments about share maps. (We don't have share maps
any more.)  Also, eliminate an incorrect comment that says that we don't
coalesce vm_map_entry's.  (We do.)
1999-03-27 23:46:04 +00:00
eivind
fdc0436c85 Correct a comment. 1999-03-27 02:39:01 +00:00
alc
9b15de3986 Two changes:
Remove more (redundant) map timestamp increments from properly
synchronized routines.  (Changed: vm_map_entry_link, vm_map_entry_unlink,
and vm_map_pageable.)

Micro-optimize vm_map_entry_link and vm_map_entry_unlink, eliminating
unnecessary dereferences.  At the same time, converted them from macros
to inline functions.
1999-03-21 23:37:00 +00:00
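
Macro-to-inline conversions like this one can be sketched as follows. The entry and map structures are hypothetical and minimal: a circular doubly linked list with a sentinel header. Inline functions type-check their arguments and evaluate each one once, and the `next`/`prev` locals avoid repeated dereferences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal map entry on a circular doubly linked list. */
struct vm_map_entry {
    struct vm_map_entry *prev, *next;
    unsigned long start;
};

struct vm_map {
    struct vm_map_entry header;   /* list sentinel */
    int nentries;
};

/* Insert entry immediately after after_where. */
static inline void
vm_map_entry_link(struct vm_map *map, struct vm_map_entry *after_where,
    struct vm_map_entry *entry)
{
    struct vm_map_entry *next = after_where->next;   /* load once */

    map->nentries++;
    entry->prev = after_where;
    entry->next = next;
    next->prev = entry;
    after_where->next = entry;
}

/* Remove entry from the list. */
static inline void
vm_map_entry_unlink(struct vm_map *map, struct vm_map_entry *entry)
{
    struct vm_map_entry *prev = entry->prev, *next = entry->next;

    map->nentries--;
    next->prev = prev;
    prev->next = next;
}
```
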
alc
4bdf1d66da Construct the free queue(s) in descending order (by physical
address) so that the first 16MB of physical memory is allocated
last rather than first.  On large-memory machines, this avoids
the exhaustion of low physical memory before isa_dmainit has run.
1999-03-19 05:21:03 +00:00
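
One way to get a free list in descending physical-address order is sketched below. Pushing pages onto the head in ascending order means the highest address is handed out first and the low pages (below 16MB, needed for ISA DMA) go last. The `struct page` and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: a physical address and a link. */
struct page {
    unsigned long phys_addr;
    struct page *next;
};

/* Build a free list from pages sorted by ascending physical address.
 * Pushing each page onto the head leaves the list in descending order. */
static struct page *
build_free_list(struct page *pages, size_t n)
{
    struct page *head = NULL;

    for (size_t i = 0; i < n; i++) {
        pages[i].next = head;
        head = &pages[i];
    }
    return head;
}

/* Allocate from the head: highest remaining address first. */
static struct page *
alloc_page(struct page **head)
{
    struct page *m = *head;

    if (m != NULL)
        *head = m->next;
    return m;
}
```
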
alc
57d921a394 Correct a problem in kmem_malloc: A kmem_malloc allowing "wait" may
block (VM_WAIT) holding the map lock.  This is bad.  For example, a subsequent
kmem_malloc by an interrupt handler on the same map may find the lock held
and panic in the lockmgr.
1999-03-16 07:39:07 +00:00
alc
8baf85480b Two changes:
In general, vm_map_simplify_entry should be performed INSIDE
the loop that traverses the map, not outside.  (Changed:
vm_map_inherit, vm_map_pageable.)

vm_fault_unwire doesn't acquire the map lock (or block holding
it).  Thus, vm_map_set/clear_recursive shouldn't be called.
(Changed: vm_map_user_pageable, vm_map_pageable.)
1999-03-15 06:24:52 +00:00
julian
ec27b516c8 Fix breakage in last commit
Submitted by: Brian Feldman <green@unixhelp.org>
1999-03-15 05:09:48 +00:00
julian
8ad9ed65a4 A bit of a hack, but allows the vn device to be a module again.
Submitted by: Matt Dillon <dillon@freebsd.org>
1999-03-14 20:40:15 +00:00
julian
0c3f3973d2 Submitted by: Matt Dillon <dillon@freebsd.org>
The old VN device broke in -4.x when the definition of B_PAGING
changed. This patch fixes this plus implements additional capabilities.
The new VN device can be backed by a file ( as per normal ), or it can
be directly backed by swap.

Due to dependencies in VM include files  (on opt_xxx options) the new
vn device cannot be a module yet. This will be fixed in a later commit.
This commit is delimited by tags {PRE,POST}_MATT_VNDEV
1999-03-14 09:20:01 +00:00
alc
aa8bb4e29a Correct two optimization errors in vm_object_page_remove:
1. The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.

2. The "size > 4" test sometimes results in the traversal of a ~1000 page
memq in order to locate ~10 pages.
1999-03-14 06:36:00 +00:00
alc
2d75a5cc4c Remove vm_page_frees from kmem_malloc that are performed
by vm_map_delete/vm_object_page_remove anyway.
1999-03-12 08:05:49 +00:00
julian
4726cfcda9 Stop the mfs from trying to swap out crucial bits of the mfs
as this can lead to deadlock.
Submitted by: Matt Dillon <dillon@freebsd.org>
1999-03-12 00:44:03 +00:00
alc
143686d0c8 Remove (redundant) map timestamp increments from some properly
synchronized routines.
1999-03-09 08:00:17 +00:00
alc
118d31f1dd Remove an unused variable from vmspace_fork. 1999-03-08 03:53:07 +00:00
alc
65b8ae0944 Change vm_map_growstack to acquire and hold a read lock (instead of a write
lock) until it actually needs to modify the vm_map.

Note: it is legal to modify vm_map::hint without holding a write lock.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com> with minor changes
		by myself.
1999-03-07 21:25:42 +00:00
alc
4e7ebf3dd0 Upgrading a map's lock to exclusive status should increment
the map's timestamp.  In general, whenever an exclusive lock is
acquired the timestamp should be incremented.
1999-03-06 07:11:33 +00:00
alc
9e3d479b9d To avoid a conflict for the vm_map's lock with vm_fault, release
the read lock around the subyte operations in mincore.  After the lock is
reacquired, use the map's timestamp to determine if we need to restart
the scan.
1999-03-02 22:55:02 +00:00
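
The timestamp protocol from this commit and the one after it (1999-03-06) can be sketched as follows: every exclusive lock acquisition bumps the map's timestamp. A reader that drops its lock mid-scan records the timestamp first and restarts if the value changed. Locking is elided here (the lock functions are hypothetical no-ops apart from the timestamp bump), as are the demo callbacks standing in for what happens while the lock is dropped.

```c
#include <assert.h>

/* Hypothetical map reduced to its version counter. */
struct vm_map {
    unsigned int timestamp;
};

static void map_lock_read(struct vm_map *m)    { (void)m; }
static void map_unlock_read(struct vm_map *m)  { (void)m; }
static void map_lock_write(struct vm_map *m)   { m->timestamp++; }
static void map_unlock_write(struct vm_map *m) { (void)m; }

/* Scan that drops the read lock mid-way (e.g. around subyte, which may
 * fault). A changed timestamp after relocking means a writer ran, so
 * the scan restarts. Returns the number of restarts. */
static int
scan_with_restart(struct vm_map *map, void (*work)(struct vm_map *))
{
    int restarts = 0;
    unsigned int saved;

again:
    map_lock_read(map);
    /* ... examine map entries ... */
    saved = map->timestamp;
    map_unlock_read(map);
    work(map);                      /* lock not held here */
    map_lock_read(map);
    if (map->timestamp != saved) {
        map_unlock_read(map);
        restarts++;
        goto again;
    }
    map_unlock_read(map);
    return restarts;
}

/* Demo: nobody modifies the map while the lock is dropped. */
static void
quiet(struct vm_map *m)
{
    (void)m;
}

/* Demo: a writer sneaks in exactly once. */
static int pending_writes = 1;

static void
one_writer(struct vm_map *m)
{
    if (pending_writes > 0) {
        pending_writes--;
        map_lock_write(m);
        map_unlock_write(m);
    }
}
```
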
alc
5149bf6666 Remove the last of the share map code: struct vm_map::is_main_map.
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 05:43:18 +00:00
alc
4d728cf4b3 mincore doesn't modify the vm_map. Therefore, it doesn't require
an exclusive lock.  A read lock will suffice.
1999-03-01 20:42:16 +00:00