Commit Graph

358 Commits

Author SHA1 Message Date
kris
d5babdb93c Correct a couple of spelling errors in comments. 1999-07-12 15:02:51 +00:00
mckusick
52ea4270f3 These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations.  I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.

    * buffer cache hash table now dynamically allocated.  This will
      have no effect on memory consumption for smaller systems and
      will help scale the buffer cache for larger systems.

    * minor enhancement to pmap_clearbit().  I noticed that
      all the calls to it used constant arguments.  Making
      it an inline allows the constants to propogate to
      deeper inlines and should produce better code.

    * removal of inherent vfs_ioopt support through the emplacement
      of appropriate #ifdef's, with John's permission.  If we do not
      find a use for it by the end of the year we will remove it entirely.

    * removal of getnewbufloops* counters & sysctl's - no longer
      necessary for debugging, getnewbuf() is now optimal.

    * buffer hash table functions removed from sys/buf.h and localized
      to vfs_bio.c

    * VFS_BIO_NEED_DIRTYFLUSH flag and support code added
      ( bwillwrite() ), allowing processes to block when too many dirty
      buffers are present in the system.

    * removal of a softdep test in bdwrite() that is no longer necessary
      now that bdwrite() no longer attempts to flush dirty buffers.

    * slight optimization added to bqrelse() - there is no reason
      to test for available buffer space on B_DELWRI buffers.

    * addition of reverse-scanning code to vfs_bio_awrite().
      vfs_bio_awrite() will attempt to locate clusterable areas
      in both the forward and reverse direction relative to the
      offset of the buffer passed to it.  This will probably not
      make much of a difference now, but I believe we will start
      to rely on it heavily in the future if we decide to shift
      some of the burden of the clustering closer to the actual
      I/O initiation.

    * Removal of the newbufcnt and lastnewbuf counters that Kirk
      added.  They do not fix any race conditions that haven't already
      been fixed by the gbincore() test done after the only call
      to getnewbuf().  getnewbuf() is a static, so there is no chance
      of it being misused by other modules.  ( Unless Kirk can think
      of a specific thing that this code fixes.  I went through it
      very carefully and didn't see anything ).

    * removal of VOP_ISLOCKED() check in flushbufqueues().  I do not
      think this check is necessary, the buffer should flush properly
      whether the vnode is locked or not. ( yes? ).

    * removal of extra arguments passed to getnewbuf() that are not
      necessary.

    * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
      vfs_cluster.c

    * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
      which should greatly aid flushing operations in heavy load
      situations - both the pageout and update daemons will be able
      to operate more efficiently.

    * removal of b_usecount.  We may add it back in later but for now
      it is useless.  Prior implementations of the buffer cache never
      had enough buffers for it to be useful, and current implementations
      which make more buffers available might not benefit relative to
      the amount of sophistication required to implement a b_usecount.
      Straight LRU should work just as well, especially when most things
      are VMIO backed.  I expect that (even though John will not like
      this assumption) directories will become VMIO backed some point soon.

Submitted by:	Matthew Dillon <dillon@backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-08 06:06:00 +00:00
mckusick
9d4f0d78fa The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA.  With this patch clean
and dirty buffers have been separated.  Empty buffers with KVM
assignments have been separated from truely empty buffers.  getnewbuf()
has been rewritten and now operates in a 100% optimal fashion.  That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized.  Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur.  This resulted in processes blocking on conditions
unrelated to what they were doing.  This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits.  We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat.  The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed.  This
algorithm has been changed.  reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list.  The new algorithm is deterministic but
not perfect.  The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit.  This bit allows processes working with filesystem
buffers to use available emergency reserves.  Normal processes do not set
this bit and are not allowed to dig into emergency reserves.  The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-04 00:25:38 +00:00
phk
b6d067fe70 Make sure that stat(2) and friends always return a valid st_dev field.
Pseudo-FS need not fill in the va_fsid anymore, the syscall code
will use the first half of the fsid, which now looks like a udev_t
with major 255.
1999-07-02 16:29:47 +00:00
peter
6cb5fe6c6b Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface.  kproc_start is still available
as a SYSINIT() hook.  This allowed simplification of chunks of the
sysinit code in the process.  This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process.  It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit.  This means that nfsiod
doesn't need to be in /sbin and is always "available".  This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1999-07-01 13:21:46 +00:00
mckusick
5b58f2f951 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
mckusick
88e39a63db Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.
1999-06-16 23:27:55 +00:00
mckusick
02e5fe8035 Get rid of the global variable rushjob and replace it with a function in
kern/vfs_subr.c named speedup_syncer() which handles the speedup request.
Change the various clients of rushjob to use the new function.
1999-06-15 23:37:29 +00:00
phk
6a5dc97620 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
jb
5dc5dd6c49 Remove the test for bdevsw(dev) == NULL from bdevvp() because it fails
if there is no character device associated with the block device. In this
case that doesn't matter because bdevvp() doesn't use the character
device structure.

I can use the pointy bit of the axe too.
1999-05-24 00:34:10 +00:00
luoqi
6f6fbfa99e Legally acquire a major number for mfs. 1999-05-14 20:40:23 +00:00
mckusick
48fc2a0eac Previously directories were sync'ed every 10 seconds while bitmaps &
inodes were synced every 15 seconds. This is now reversed as during
directory create, we cannot commit the directory entry until its
inode has been written. With this switch, the inodes will be more
likely to be written by the time that the directory is written thus
reducing the number of directory rollbacks that are needed.
1999-05-14 01:29:21 +00:00
peter
64dd9c44eb Fix (?) SPECHASH dev_t/major/minor/etc args 1999-05-12 19:06:40 +00:00
phk
23c70ba4d7 Don't peek into dev_t 1999-05-12 11:06:07 +00:00
phk
7e26ca1d1a Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
phk
bac74fbd54 Fix some of the places where too much inside knowledge about major/minor
layout and dev_t structure is being (ab)used.
1999-05-08 07:02:41 +00:00
phk
500e41bd71 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
phk
693dd58bb3 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
billf
dd35516544 Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
julian
f27c95753f Reviewed by: Many at differnt times in differnt parts,
including alan, john, me, luoqi, and kirk
Submitted by:	Matt Dillon <dillon@frebsd.org>

This change implements a relatively sophisticated fix to getnewbuf().
There were two problems with getnewbuf(). First, the writerecursion
can lead to a system stack overflow when you have NFS and/or VN
devices in the system. Second, the free/dirty buffer accounting was
completely broken. Not only did the nfs routines blow it trying to
manually account for the buffer state, but the accounting that was
done did not work well with the purpose of their existance: figuring
out when getnewbuf() needs to sleep.

The meat of the change is to kern/vfs_bio.c. The remaining diffs are
all minor except for NFS, which includes both the fixes for bp
interaction AND fixes for a 'biodone(): buffer already done' lockup.
Sys/buf.h also contains a chaining structure which is not used by
this patchset but is used by other patches that are coming soon.
This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF.
(sorry for the missing T matt)
1999-03-12 02:24:58 +00:00
dillon
9b8aaaa737 Reviewed by: Julian Elischer <julian@whistle.com>
Add d_parms() to {c,b}devsw[].  If non-NULL this function points to
    a device routine that will properly fill in the specinfo structure.
    vfs_subr.c's checkalias() supplies appropriate defaults.  This change
    should be fully backwards compatible with existing devices.
1999-02-25 05:22:30 +00:00
dillon
17b6679e7e Protect vn worklist and vn->v_{clean,dirty}blkhd at splbio().
Get rid of extra LIST_REMOVE()

Reviewed by:	hsu@FreeBSD.ORG (Jeffrey Hsu), mckusick@McKusick.COM
Submitted by:	hsu@FreeBSD.ORG (Jeffrey Hsu), dillon@backplane.com ( Matthew Dillon )
1999-02-19 17:36:58 +00:00
dillon
ecfbb44ba1 vp->v_object must be valid after normal flow of vfs_object_create()
completes, change if() to KASSERT().  This is not a bug, we are
    simplify clarifying and optimizing the code.

    In if/else in vfs_object_create(), the failure of both conditionals
    will lead to a NULL object.  Exit gracefully if this case occurs.
    ( this case does not normally occur, but needed to be handled ).

Obtained from: Eivind Eklund <eivind@FreeBSD.org>
1999-02-04 18:25:39 +00:00
dillon
a784c6e33d More const fixes for -Wall, -Wcast-qual 1999-01-29 23:18:50 +00:00
dillon
975fba8a24 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
eivind
89e1199534 KNFize, by bde. 1999-01-10 01:58:29 +00:00
eivind
a8dc66f457 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
eivind
ffaaca5874 Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
eivind
2b3e3223c1 Finish staticization. 1999-01-05 18:12:29 +00:00
bde
734d13314e Ifdefed conditionally used simplock variables. 1999-01-02 11:34:57 +00:00
bde
85c558753b Restored rev.1.31 which was clobbered by rev.1.69 (the big Lite2
merge).  This fixes at least hanging in revoke(2) when a somewhat
active slave pty is revoked.  The hang made the window for the
null pointer bug in ufsspec_{read,write} much larger.

There are many other bugs in this area (revoke of an active fifo
at best leaks memory...).
1998-12-24 12:07:16 +00:00
eivind
a4213663c9 Check return value of tsleep(). I've checked of all call points -
there does not seem to be a problem with this.

PR:		kern/8732
Analysis by:	David G Andersen <danderse@cs.utah.edu>
Tested by:	Alfred Perlstein <bright@hotjobs.com>
1998-12-22 00:44:11 +00:00
eivind
a0317115f8 Staticize. 1998-12-21 23:38:33 +00:00
archie
982e80577d Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
peter
ca67637f92 Convert lists for bufs attached to vnodes from a LIST to a TAILQ.
- Use TAILQ_* macros extensively instead of internal names
- use b_xflags instead of the NOLIST magic number hack in the next pointer
- clean bufs are inserted at the tail rather than the head.
- redo dirty buffer insert so that metadata (negative lbn) goes to the
  tail directly rather than at the HEAD.  This makes a difference when
  inserting dirty data blocks in lbn sorted order since data block
  insertion will not have to bypass all the metadata cruft.  data is
  lbn sorted since it makes sense for clustering and writeback ordering,
  while metadata sorting doesn't help much since the lbn's are
  meaningless when walking the list for writebacks.

Small systems will not notice much (if any) benefit from this, but really
busy systems with large dirty block lists should get a lot more.

I've tested this with softdep, and it doesn't seem to mind the change of
queueing of metadata.

Reviewed (in princible) by: dg
Obtained from: partly from John Dyson's work-in-progress patches in June.
1998-10-31 14:20:39 +00:00
peter
c9ebff45d0 The last argument to vm_object_page_clean() are now bit flags, rather than
the old true/false.

While here, have vfs_msync() only call vm_object_page_clean() with
OBJPC_SYNC if called with MNT_WAIT flags.  vfs_msync() is called at unmount
time (with MNT_WAIT) and from the syncer process (formerly update).
This should make dirty mmap writebacks a little less nasty.

I have tested this a little with SOFTUPDATES enabled, but I don't normally
use it since I've been badly burned too many times.
1998-10-31 07:42:04 +00:00
bde
9a84781068 Oops, rev.1.167 made the device number checking in bdevvp() too strict
for mfs root mounts.  Don't require major 255 to be in bdevsw[].
1998-10-29 11:50:32 +00:00
peter
fe9b0651f9 Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and
ext2fs call vtruncbuf() directly.  This simplifies and cleans up
vinvalbuf() a little.
1998-10-29 09:51:28 +00:00
bde
285a46e7e4 Updated the major number check in vfs_object_create(). It's not
clear if the check is necessary, but vfs_object_create() is called
for all vnodes and it was silly to create objects for VBLK vnodes
that don't even have a driver.
1998-10-26 08:07:00 +00:00
phk
13c66194f4 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
bde
b17bde009a Fixed device number checking in bdevvp():
- dev != NODEV was checked for, but 0 was returned on failure.  This was
  fixed in Lite2 (except the return code was still slightly wrong (ENODEV
  instead of ENXIO)) but the changes were not merged.  This case probably
  doesn't actually occur under FreeBSD.
- major(dev) was not checked to have a valid non-NULL bdevsw entry.  This
  caused panics when the driver for the root device didn't exist.

Fixed minor misformattings in bdevvp().  Rev.1.14 consisted mainly of
gratuitous reformattings that seem to have caused many Lite2 merge
errors.

PR:			8417
1998-10-25 16:11:49 +00:00
dt
1f69c742ab Backed out rev. 1.164. It caused problems on SMP.
PR:		8309
1998-10-14 15:05:52 +00:00
dg
3defb6d13f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
dt
016509e247 UnVMIO vnodes of block devices when they are no longer in use. (Some
things, like msdosfs, do not work (panic) on devices with VMIO enabled.
FFS enable VMIO on mounted devices, and nothing previously disabled it, so,
after you mounted FFS floppy, you could not mount msdosfs floppy anymore...)

This is mostly a quick before-release fix.

Reviewed by:	bde
1998-10-12 20:14:09 +00:00
sos
8397655514 Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.
1998-09-14 19:56:42 +00:00
bde
a84a2dedfc Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
bde
d3f8ad1875 Oops, the previous revision unconfigured too much pre-Lite2 compatibilty
cruft.  At least lsvfs(1) was broken.
1998-08-29 13:13:10 +00:00
bde
92b68e1a4b Don't configure compatibility code for pre-Lite2 mount() calls by
default.  This code should go away soon.
1998-08-12 20:17:42 +00:00
dfr
bdb4d90548 Initialise all the fields separately in vattr_null since on the alpha
they are not all the same width.
1998-07-12 16:45:39 +00:00
bde
f0b863f4b5 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
bde
9e868cbb1a Removed unused includes. 1998-06-21 18:02:50 +00:00
julian
da325152ee Replace 'sleep()' with 'tsleep()'
Accidentally imported from Kirk's codebase.

Pointed out by: various.
1998-06-10 22:02:14 +00:00
julian
acd840b15c Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Fix for potential hang when trying to reboot the system or
to forcibly unmount a soft update enabled filesystem.
FreeBSD already handled the reboot case differently, this is however a better
fix.
1998-06-10 18:13:19 +00:00
dfr
1d5f38ac22 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
tegge
93d053207e Supply the correct process argument to dounmount when possible. 1998-05-17 19:38:55 +00:00
julian
0796a5c56e Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.
1998-04-19 23:32:49 +00:00
peter
bbf574dac9 In vfs_msync(), test to see if the vnode being examined is "interesting"
(ie: it has a vm_object attached and is marked as OBJ_MIGHTBEDIRTY) before
attempting to lock it.  This should reduce the cpu hit that is incurred
when doing a sync(2) and when the syncer process is doing the 30-second
writeback of dirty mmap() data to disk.  Skip this speedup if we are
doing an unmount() to be sure to get everything - we can afford to
occasionally miss a msync while the system is running, but not at unmount.

I'm not sure about the VXLOCK and MNT_WAIT case, it seems a bit odd to skip
doing a page_clean at unmount time just because a vnode is VXLOCKed, but
that's what was being done before...
1998-04-18 06:26:16 +00:00
peter
a1bb161dc4 When the softdep conversion took place, the periodic vfs_msync() from
update got lost.  This is responsible for ensuring that dirty mmap() pages
get periodically written to disk.  Without it, long time mmap's might not
have their dirty pages written out at all of the system crashes or isn't
cleanly shut down.  This could be nasty if you've got a long-running
writing via mmap(), dirty pages used to get written to disk within 30
seconds or so.
1998-04-16 03:31:26 +00:00
tegge
eb926b31ca Unlock mountlist_slock if the mount point was busy (unmount in progress)
during the attempt at lazy fsync.
1998-04-15 18:37:49 +00:00
phk
9b703b1455 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
bde
b5ae2c779b Removed unused #includes. 1998-03-28 13:25:01 +00:00
bde
a1015f7749 Don't depend on <sys/mount.h> including <sys/socket.h>. 1998-03-28 12:04:40 +00:00
dyson
aad85d5a04 In kern_physio.c fix tsleep priority messup.
In vfs_bio.c, remove b_generation count usage,
	remove redundant reassignbuf,
	remove redundant spl(s),
	manage page PG_ZERO flags more correctly,
	utilize in invalid value for b_offset until it
		is properly initialized.  Add asserts
		for #ifdef DIAGNOSTIC, when b_offset is
		improperly used.
	when a process is not performing I/O, and just waiting
		on a buffer generally, make the sleep priority
		low.
	only check page validity in getblk for B_VMIO buffers.

In vfs_cluster, add b_offset asserts, correct pointer calculation
	for clustered reads.  Improve readability of certain parts of
	the code.  Remove redundant spl(s).

In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin
	<gallatin@cs.duke.edu>).  More vtruncbuf problems fixed.
1998-03-19 22:48:16 +00:00
dyson
dc6d7f19f8 Fix an embarassing problem in vtruncbuf. 1998-03-19 18:46:58 +00:00
dyson
ca4225334c Correct a severely evil bug in the vtruncbuf code. It didn't cause
me any problems until after the previous commit.  This problem then
caused a severe case of creeping crud on my diskdrive, and hosed
my system so bad, that I needed to do a complete reinstall.  Sorry!!!

I assume that others have manifest this bug.
1998-03-17 06:30:52 +00:00
dyson
9bd499d7fa Allow vfs_ioopt to be enabled with a (temporary) config option. 1998-03-16 02:13:03 +00:00
dyson
6e92f5716b Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
dyson
7c9ff06841 Disable the vfs.ioopt option for now, so that we don't get gratuitious
bugreports.  I might not be able to fix the problems before 3.0, due
to other, more important things.
1998-03-14 19:50:36 +00:00
tegge
fca0f92630 Don't misuse vnode interlocks in routines that can be called from interrupts.
PR:		5893
1998-03-14 02:55:01 +00:00
julian
10c5ccc30a Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
dyson
8ceb6160f4 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
dyson
1928b68b1a Change vfs.ioopt default back to '0'. 1998-03-01 23:07:45 +00:00
dyson
69e5a1e9f5 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
dyson
32e0a3673a Clean-up the vget mechanism by permanently attaching VM objects to
vnodes, therefore vget doesn't need to do so anymore.  Other minor
improvements include the temp free vnode queue obeying the VAGE
flag and a printf that warns of to-be-removed code being executed.
1998-02-23 06:59:52 +00:00
kato
8b0f1ac87b Fixed vnode interlock handling.
Reviewed by:	Bruce Evans <bde@zeta.org.au>
            	Tor Egge <Tor.Egge@idi.ntnu.no>
1998-02-10 02:54:24 +00:00
eivind
d7a6ab2803 Staticize. 1998-02-09 06:11:36 +00:00
kato
95aa7532e2 When the vp is lcoked, vget() calls vfs_object_create() with
waslocked = TRUE.  This change may fix lockmgr panic in umapfs/nullfs.

PR:		5634
Reviewed by:	"John S. Dyson" <toor@dyson.iquest.net>
Suggested by:	Bruce Evans <bde@zeta.org.au>
1998-02-07 08:44:31 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
dyson
ebccbfc1ff 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
tegge
fbf474f2d8 Update freevnodes when adding a vnode to the head of the free list. 1998-01-31 01:17:58 +00:00
dyson
8726294764 Add better support for larger I/O clusters, including larger physical
I/O.  The support is not mature yet, and some of the underlying implementation
needs help.  However, support does exist for IDE devices now.
1998-01-24 02:01:46 +00:00
dyson
197bd655c4 VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
dyson
b130b30c96 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.
1998-01-17 09:17:02 +00:00
dyson
9a35ec7fec Fix another vnode leak. 1998-01-12 03:15:01 +00:00
dyson
d9d8bf6d30 Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00
dyson
66da5c4f34 Disable io optimizations again, minor bug found, and will be fixed in
a few days.
1998-01-07 09:26:29 +00:00
dyson
cb2800cd94 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
1998-01-06 05:26:17 +00:00
dyson
7bf56bd14a Add the vnode interlock back around vref. 1997-12-29 16:54:03 +00:00
dyson
8ab3ac77d2 Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.
1997-12-29 01:03:55 +00:00
dyson
cd67bb82fe Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
dyson
6bd1f74dcf Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)
1997-12-19 09:03:37 +00:00
wollman
1a06d88098 Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL
if one of the new poll types is requested; hopefully this will not break
any existing code.  (This is done so that programs have a dependable
way of determining whether a filesystem supports the extended poll types
or not.)

The new poll types added are:

	POLLWRITE - file contents may have been modified
	POLLNLINK - file was linked, unlinked, or renamed
	POLLATTRIB - file's attributes may have been changed
	POLLEXTEND - file was extended

Note that the internal operation of poll() means that it is impossible
for two processes to reliably poll for the same event (this could
be fixed but may not be worth it), so it is not possible to rewrite
`tail -f' to use poll at this time.
1997-12-15 03:09:59 +00:00
bde
975c3797b1 Staticized. 1997-11-22 08:35:46 +00:00
julian
ae22df605c Reviewed by: various.
Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left  for another day.
1997-11-12 05:42:33 +00:00
phk
4d26888936 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
phk
e3cdaf12b2 VFS interior redecoration.
Rename vn_default_error to vop_defaultop all over the place.
Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite.
Use vop_null instead of nullop.
Move vop_nopoll from vfs_subr.c to vfs_default.c
Move vop_sharedlock from vfs_subr.c to vfs_default.c
Move vop_nolock from vfs_subr.c to vfs_default.c
Move vop_nounlock from vfs_subr.c to vfs_default.c
Move vop_noislocked from vfs_subr.c to vfs_default.c
Use vop_ebadf instead of *_ebadf.
Add vop_defaultop for getpages on master vnode in MFS.
1997-10-26 20:55:39 +00:00
phk
36e7a51ea1 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
phk
645e7b2ab6 Distribute and statizice a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
phk
dde8490d33 Dike out a weird warning. 1997-10-11 07:34:27 +00:00
phk
3783a8767e I lost a bit of my change in the last commit, this is more like it.
Noticed by:	bde
1997-09-26 08:08:58 +00:00
phk
099d96fad2 Reduce the target number of vnodes on the freelist from desiredvnodes
(usually a couple of thousand) to 25.  The measured impact on cache-hits
doesn't justify spending memory this way:

Target number of free vnodes versus namecache hit rate in % during a
make world:
          10    98.5316
         200    98.5479
         500    98.5546
        1000    98.5709
        3000    98.6006
        4000    98.6126
1997-09-25 16:17:57 +00:00
phk
4d1dd5bc33 A couple of handles to tweak, more statistics. 1997-09-24 07:46:54 +00:00
bde
1062c10a86 Fixed gratuitous ANSIisms. 1997-09-16 11:44:05 +00:00
peter
1ffbda9a9e Provide a 'return true' poll vnode op rather than duplicating the
'do nothing' case all over the various filesystems.
1997-09-14 02:49:06 +00:00
peter
723553368e print correct function name in a panic (vop_nolock -> vop_sharedlock) 1997-09-13 15:02:28 +00:00
bde
bcade9a903 Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.
1997-09-07 16:21:11 +00:00
bde
fc775e3711 Removed vestiges of config-time "argument processing" configuration. 1997-09-07 13:49:56 +00:00
phk
c202b61548 Hmm, this is hopefully better. 1997-09-03 13:29:41 +00:00
phk
cbe64dd591 Revert the v_usecount handling in relation to VOP_INACTIVE. 1997-09-03 09:18:48 +00:00
bde
6ffb8bf9af Removed unused #includes. 1997-09-02 20:06:59 +00:00
phk
0b3a12b83e Change the 0xdeadb hack to a flag called VDOOMED.
Introduce VFREE which indicates that vnode is on freelist.
Rename vholdrele() to vdrop().
Create vfree() and vbusy() to add/delete vnode from freelist.
Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0)
  vnodes off the freelist.
Generalize vhold()/v_holdcnt to mean "do not recycle".
Fix reassignbuf()s lack of use of vhold().
Use vhold() instead of checking v_cache_src list.
Remove vtouch(), the vnodes are always vget'ed soon enough
  after for it to have any measuable effect.
Add sysctl debug.freevnodes to keep track of things.
Move cache_purge() up in getnewvnodes to avoid race.
Decrement v_usecount after VOP_INACTIVE(), put a vhold() on
  it during VOP_INACTIVE()
Unmacroize vhold()/vdrop()
Print out VDOOMED and VFREE flags (XXX: should use %b)

Reviewed by:		dyson
1997-08-31 07:32:39 +00:00
bde
c312d330af Restored rev.1.92 which was clobbered by the previous commit. 1997-08-26 11:59:20 +00:00
dyson
b90433b1a9 Back out some incorrect changes that was worse than the original bug. 1997-08-26 04:36:27 +00:00
dyson
042ae4067b This is a trial improvement for the vnode reference count while on the vnode
free list problem.  Also, the vnode age flag is no longer used by the
vnode pager.  (It is actually incorrect to use then.)  Constructive
feedback welcome -- just be kind.
1997-08-22 03:56:37 +00:00
bde
6be005551f #include <machine/limits.h> explicitly in the few places that it is required. 1997-08-21 20:33:42 +00:00
wollman
4542c1cf5d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
dyson
4811e46aa5 Fix a problem with the vfs vnode caching that it doesn't grow quickly
enough and can cause some strange performance problems.  Specifically, at
or near startup time is when the problem is worst.  To reproduce
the problem, run "lat_syscall stat" from the alpha lmbench code right
after bootup.  A positive side effect of this mod is that the name
cache can be set to grow again by sysctl.  A noticable positive
performance impact is realized due to a larger namecache being available
as needed (or tuned.)
1997-08-04 07:43:28 +00:00
dfr
df8d8e5713 Merge WebNFS support from NetBSD
Obtained from:	NetBSD
1997-07-17 07:17:33 +00:00
dyson
8786565a86 Remove a window during running down a file vnode. Also, the OBJ_DEAD
flag wasn't being respected during vref(), et. al.  Note that this
isn't the eventual fix for the locking problem.  Fine grained SMP
in the VM and VFS code will require (lots) more work.
1997-06-22 03:00:24 +00:00
dg
a1414ec53a Disabled the kern.vnode sysctl variable. It's causing system crashes on
large systems and needs to be re-thinked or removed wholesale.
1997-06-10 02:48:08 +00:00
phk
d8e3734a09 Fix a race condition that did, after all, exist.
Reviewed by:	phk
Submitted by:	dfr
1997-05-06 15:19:38 +00:00
phk
aa8738a5f3 1. Add a {pointer, v_id} pair to the vnode to store the reference to the
".." vnode.  This is cheaper storagewise than keeping it in the
    namecache, and it makes more sense since it's a 1:1 mapping.

2.  Also handle the case of "." more intelligently rather than stuff
    the namecache with pointless entries.

3.  Add two lists to the vnode and hang namecache entries which go from
    or to this vnode.  When cleaning a vnode, delete all namecache
    entries it invalidates.

4.  Never reuse namecache enties, malloc new ones when we need it, free
    old ones when they die.  No longer a hard limit on how many we can
    have.

5.  Remove the upper limit on namelength of namecache entries.

6.  Make a global list for negative namecache entries, limit their number
    to a sysctl'able (debug.ncnegfactor) fraction of the total namecache.
    Currently the default fraction is 1/16th.  (Suggestions for better
    default wanted!)

7.  Assign v_id correctly in the face of 32bit rollover.

8.  Remove the LRU list for namecache entries, not needed.  Remove the
    #ifdef NCH_STATISTICS stuff, it's not needed either.

9.  Use the vnode freelist as a true LRU list, also for namecache accesses.

10. Reuse vnodes more aggresively but also more selectively, if we can't
    reuse, malloc a new one.  There is no longer a hard limit on their
    number, they grow to the point where we don't reuse potentially
    usable vnodes.  A vnode will not get recycled if still has pages in
    core or if it is the source of namecache entries (Yes, this does
    indeed work :-)  "." and ".." are not namecache entries any longer...)

11. Do not overload the v_id field in namecache entries with whiteout
    information, use a char sized flags field instead, so we can get
    rid of the vpid and v_id fields from the namecache struct.  Since
    we're linked to the vnodes and purged when they're cleaned, we don't
    have to check the v_id any more.

12. NFS knew about the limitation on name length in the namecache, it
    shouldn't and doesn't now.

Bugs:
        The namecache statistics no longer includes the hits for ".."
        and "." hits.

Performance impact:
        Generally in the +/- 0.5% for "normal" workstations, but
        I hope this will allow the system to be selftuning over a
        bigger range of "special" applications.  The case where
        RAM is available but unused for cache because we don't have
        any vnodes should be gone.

Future work:
        Straighten out the namecache statistics.

        "desiredvnodes" is still used to (bogusly ?) size hash
        tables in the filesystems.

        I have still to find a way to safely free unused vnodes
        back so their number can shrink when not needed.

        There is a few uses of the v_id field left in the filesystems,
        scheduled for demolition at a later time.

        Maybe a one slot cache for unused namecache entries should
        be implemented to decrease the malloc/free frequency.
1997-05-04 09:17:38 +00:00
dyson
43ef323eb0 Staticize an unnecessarily global function: vputrele.
Submitted by:	 Michael Hancock <michaelh@cet.co.jp>
1997-04-30 03:09:15 +00:00
peter
74f136022f copyin the export network mask to the correct variable.
Submitted by: Mike Hibler <mike@marker.cs.utah.edu>, PR#3380
1997-04-25 06:47:12 +00:00
dfr
60008c7902 Add a function vop_sharedlock which a copy of vop_nolock without the
implementation #ifdef out.  This can be used for now by NFS.  As soon
as all the other filesystems' locking is fixed, this can go away.

Print the vnode address in vprint for easier debugging.
1997-04-04 17:46:21 +00:00
bde
da270d0a3c Use OID_AUTO instead of magic number for the Lite2 sysctl debug.busyprt.
Removed declaration of vfs_unmountroot() again.

Staticized vgonel().
1997-04-01 13:05:34 +00:00
dg
1a326a5d28 Fixed splbio problems in vinvalbuf. Closes PR#2875, although fixed
differently by me.
1997-03-05 04:54:54 +00:00
bde
61a92b4f52 Attach vfs_sysctl() one level lower so that only the levels below
VFS_GENERIC aren't done in the FreeBSD way.  The previous commit
broke the nfs sysctls.
1997-03-04 18:31:56 +00:00
bde
5fc94677bd Merged Lite2's vfs_sysctl(). It doesn't fit very well into FreeBSD's
(phk's) sysctl framework, and I needed special code to disambiguate
the VFS_GENERIC node from the VFS_VFSCONF leaf, so I only converted
the leaves to the FreeBSD framework.  The error handling isn't quite
right.  CSRGS's sysctls seem to return ENOTDIR too much and FreeBSD's
sysctls don't agree with the man page.
1997-03-03 12:58:20 +00:00
bde
a42104c77d Restored some pre-Lite2-merge source-level compatibility to the mount()
and getvfsbyname() interfaces.  The new interfaces are now hidden from
applications unless _NEW_VFSCONF is defined.  The new vfsconf interfaces
don't work yet.
1997-03-02 17:53:37 +00:00
bde
6e31892943 Moved vfs sysctls to where Lite2 put them. No code changes yet. 1997-03-02 11:06:22 +00:00
bde
b19a6b276f Fixed Lite2 merge of spechash simplelocking. It was misplaced in
checkalias() and missing in vfinddev() and vcount().
1997-02-27 16:08:43 +00:00
dyson
c33087f4e0 Fix the previous simple_lock fix breakage in the combined
vput/vrele routine.  Fix a panic message.  Fix the vop_nounlock
routine so that "special" filesystems that use it work correctly.
1997-02-27 05:28:58 +00:00
dyson
b3686e776a Fix the simple_lock problem with the physical I/O buffer code, and
also fix the missing simple_unlock in vrele, and improve vrele/vput
by merging them into one routine.  BDE pointed these problems out.
1997-02-27 02:57:03 +00:00
bde
c19ef73557 Fixed unmounting of the root fs. vfs_unmountroot() wasn't fully updated
to do Lite2 locking and vfs_unmountall() wasn't as simple as the Lite2
version.
1997-02-26 15:35:42 +00:00
bde
4f04146c67 Merged some missing locking from Lite2:
- getnewvnode() and vref() were missing one simple_unlock() each.
- the Lite2 locking changes weren't merged at all in
  printlockedvnodes() or sysctl_vnode().  Merging these undid
  some KNF style regressions.
1997-02-25 19:33:23 +00:00
peter
94b6d72794 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
dyson
10f666af84 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
bde
89fb3ac98e Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl
variable `kern.maxvnodes' which gives much better control over vnode
allocation than EXTRAVNODES (except in -current between 1995/10/28 and
1996/11/12, kern.maxvnodes was read-only and thus useless).
1997-01-16 13:16:10 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
dyson
04a0ee9a2a This commit is the embodiment of some VFS read clustering improvements.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis.  This will allow multiple processes reading the
same file to take advantage of read-ahead clustering.  Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm.  Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation.   The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).

NOTE:  THAT LKMS MUST BE REBUILT!!!
1996-12-29 02:45:28 +00:00
bde
300de72f7a Restored writability of kern.maxvnodes. It was broken a year ago in
rev.1.29 of kern_sysctl.c.

Should be in 2.2.
1996-11-12 09:24:31 +00:00
phk
ef2dec2170 init_main.c: pass -d to init if DEVFS_ROOT
kern_conf.c:	gd driver is a disk.
vfs_subr.c:	include opt_devfs.h
1996-10-28 11:34:57 +00:00
jkh
1647f577a7 I'm not sure why, but Netcon's TFS filesystem code doesn't want to
add free vnodes back to the freelist.  They must do their own vnode
management.  Anyway, this change is *only* activated with their filesystem
and doesn't affect anyone else.  Whoops, forgot the submitted-by lines
in my previous commits too.. :-(
Submitted-By: Tony Ardolino <tony@netcon.com>
1996-10-17 17:56:07 +00:00
dyson
576dd5e9f6 Clean up the rundown of the object backing a vnode. This should fix
NFS problems associated with forcible dismounts.
1996-10-17 02:49:35 +00:00
dyson
3d9a637078 Correct vget by removing a window where a vnode can potentially go away. 1996-09-28 03:36:07 +00:00
nate
45c85d421d In sys/time.h, struct timespec is defined as:
/*
         * Structure defined by POSIX.4 to be like a timeval.
         */
        struct timespec {
                time_t  ts_sec;         /* seconds */
                long    ts_nsec;        /* and nanoseconds */
        };

        The correct names of the fields are tv_sec and tv_nsec.

Reminded by:	James Drobina <jdrobina@infinet.com>
1996-09-19 18:21:32 +00:00
dyson
966cbc5d29 Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust.  Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls.  The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg
1996-08-21 21:56:23 +00:00
dyson
be503a23b5 Certain vnode buffer list operations were not being spl protected,
and they needed to be.  Brelse for example can be called at interrupt
level, and the buffer list operations were not being protected from it.
1996-08-15 06:45:01 +00:00
bde
63ae415aca Only use the special bdevvp() for DEVFS if DEVFS_ROOT is defined. This
makes option DEVFS safe to use again (although mounting devfs is unsafe).
1996-07-30 18:00:32 +00:00
phk
3a73783c23 DEVFS needs a special bdevvp(). 1996-07-24 21:21:43 +00:00
bde
d449c2d3d4 Staticized a few variables.
Fixed warnings about unused variables.
1996-07-12 07:41:34 +00:00
peter
a6023afadf Add an option "EXTRA_VNODES" to cause an extra number of vnode structures
to be allocated at boot time.  This is an expensive option, as they
consume physical ram and are not pageable etc.  In certain situations,
this kind of option is quite useful, especially for news servers that
access a large number of directories at random and torture the name cache.
Defining 5000 or 10000 extra vnodes should cut down the amount of vnode
recycling somewhat, which should allow better name and directory caching
etc.

This is a "your mileage may vary" option, with no real indication of
what works best for your machine except trial and error.  Too many will
cost you ram that you could otherwise use for disk buffers etc.

This is based on something John Dyson mentioned to me a while ago.
1996-05-31 00:20:34 +00:00
dyson
67123e935d Put the "free vnode isn't" check back in the right place. 1996-03-09 06:43:19 +00:00
dyson
8fc8a772af Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
wollman
39d3a9a3d3 Convert DDB to new-style option. 1996-01-04 21:13:23 +00:00
dg
d409ba2b4f Moved the #ifdef DIAGNOSTIC in vrele() so that the check for negative
v_usecount is always performed and only the call to vprint is conditional.
1996-01-02 18:13:20 +00:00
phk
02cbd66c6d Staticize.
Unstaticize a function in scsi/scsi_base that was used, with an undocumented
option.
My last count on the LINT kernel shows:
Total symbols:  3647
unref symbols:   463
undef symbols:     4
1 ref symbols:  1751
2 ref symbols:   485
Approaching the pain threshold now.
1995-12-17 21:23:44 +00:00
dyson
601ed1a4c0 Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
1995-12-11 04:58:34 +00:00
dg
c30f46c534 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
phk
ae642367e2 A couple of minor tweaks to the sysctl stuff. 1995-12-06 13:27:39 +00:00
bde
3089ef0816 Completed function declarations and/or added prototypes. 1995-12-02 18:58:56 +00:00
phk
e9b6a008cf A test was backwards.
Noticed by:	Cheng, Hsiao-Yang <sycheng@cis.ufl.edu>
1995-11-29 11:28:00 +00:00
phk
d0c66446cc Mega commit for sysctl.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I were there.
1995-11-20 12:42:39 +00:00
bde
a7a396155f Fixed support for DIAGNOSTIC option. SYSCTL_INT() depends on kernel.h. 1995-11-16 09:45:23 +00:00
phk
dd42080c0c Change some of the debug sysctl vars. The semantics of these will change. 1995-11-14 09:19:16 +00:00
bde
a1927f222e Fixed type of vfs_free_netcred().
Removed redundant declaration of insmntque().
1995-11-11 00:27:00 +00:00
bde
449a11eb88 Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter.  Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
1995-11-09 08:17:23 +00:00
dyson
1ffbdf1a44 This is a modification missed by me in the msync fixes a few days ago. 1995-11-07 05:09:43 +00:00
bde
8e1f5c1aad Call vfs_unbusy() before error returns from sysctl_vnode(). This fixes
PR 795.

Set the size before one error return from sysctl_vnode() the same as before
the other.  The caller might want to know about the amount successfully
read although the current caller doesn't.
1995-10-28 08:50:08 +00:00
bde
c8287897a6 Don't compile the diagnostic functions vhold() and holdrele() unless
DIAGNOSTIC is defined.
1995-08-25 20:49:44 +00:00
dg
5b4b270015 Converted mountlist to a CIRCLEQ.
Partially obtained from: 4.4BSD-Lite2
1995-08-11 11:31:18 +00:00
dg
c8b0a7332c NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
dg
a5338d3518 Improve negative usecount diagnostic a little. 1995-07-08 04:10:32 +00:00
dg
3c7c1dd62f 1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
   after vnode_pager_alloc() calls - the object is already guaranteed to be
   persistent.
3) Removed some gratuitous casts.
1995-06-28 12:01:13 +00:00
bde
f29b237263 Pass the correct nonblocking flag to VOP_CLOSE() in vclean().
VOP_CLOSE() takes `F' (file) flags, not `IO' flags.  At least that's
what close() passes.  I previously fixed ttylclose() to check
FNONBLOCK instead of IO_NDELAY.  This broke the call from vclean()
and cleaning of ptys sometimes deadlocked.
1995-06-27 21:29:08 +00:00
dg
2045200a00 Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some
   cases, this lead to lost data. Most likely would be noticed on NFS.
   The fix is to make the VM page sync/object_clean general rather than
   in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
   chunks of files to end up as zeroes rather than the intended contents.
   The fix was to fix several race conditions and to kludge up the
   "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
   to page modifications that occurred via the mmapping.

Reviewed by:	David Greenman
Submitted by:	John Dyson
1995-05-21 21:39:31 +00:00
dg
d1d32f6e26 Increased ratio of allowed vnodes on freelist to 1/4th of the total. This
is more representative of worst case situations of 4 files/directory. (If
that last sentence doesn't make any sense, I'm not surprised. It's rather
compilcated how this all fits together....).
This should fix a problem that Ed Hudson has been complaining about where
directories with lots of symlinks could cause excessive disk I/O.
1995-05-12 04:24:53 +00:00
dg
7010752498 Changed #ifdef around printlockedvnodes() from DEBUG to DDB. 1995-04-16 11:33:33 +00:00
dg
78a655a046 Changes from John Dyson and myself:
Fixed remaining known bugs in the buffer IO and VM system.

vfs_bio.c:
Fixed some race conditions and locking bugs. Improved performance
by removing some (now) unnecessary code and fixing some broken
logic.
Fixed process accounting of # of FS outputs.
Properly handle NFS interrupts (B_EINTR).

(various)
Replaced calls to clrbuf() with calls to an optimized routine
call vfs_bio_clrbuf().

(various FS sync)
Sync out modified vnode_pager backed pages.

ffs_vnops.c:
Do two passes: Sync out file data first, then indirect blocks.

vm_fault.c:
Fixed deadly embrace caused by acquiring locks in the wrong order.

vnode_pager.c:
Changed to use buffer I/O system for writing out modified pages. This
should fix the problem with the modification date previous not getting
updated. Also dramatically simplifies the code. Note that this is
going to change in the future and be implemented via VOP_PUTPAGES().

vm_object.c:
Fixed a pile of bugs related to cleaning (vnode) objects. The performance
of vm_object_page_clean() is terrible when dealing with huge objects,
but this will change when we implement a binary tree to keep the object
pages sorted.

vm_pageout.c:
Fixed broken clustering of pageouts. Fixed race conditions and other
lockup style bugs in the scanning of pages. Improved performance.
1995-04-09 06:02:46 +00:00
dg
4c94a22884 Fixed vinvalbuf() to work like NFS wants it to. The previous code wouldn't
flush pages in the vm object if V_SAVE was true.
1995-03-21 01:13:16 +00:00
dg
b9e7140c7c Don't gain/lose a reference to the object when yanking its pages in
vinvalbuf()...it will cause vnode locking problems in vm_object_terminate,
and isn't necessary anyway.
1995-03-20 10:19:09 +00:00
dg
cfecdfd362 Don't attempt to sync pages in the V_SAVE case of vinvalbuf; doing so can
lead to a deadlock. Just let the VM system deal with it.
1995-03-20 02:08:24 +00:00
bde
289f11acb4 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'.  Fix all the bugs found.  There were no serious
ones.
1995-03-16 18:17:34 +00:00
dg
12b812885a Added a comment. 1995-03-11 22:29:07 +00:00
dg
4e060d1941 Reorganized an if() expression for efficiency. 1995-03-10 21:18:24 +00:00
phk
5ac4500e79 Clean up and improve the namecache.
1. We always keep one 16th of the vnodes on the freelist, so that the
namecache doesn't get trashed.  It used to be that it wasn't a problem, but
the only vnodes getting released these days are directories and things which
Clean up and improve the namecache.

1. We always keep one 16th of the vnodes on the freelist, so that the
namecache doesn't get trashed.  It used to be that it wasn't a problem, but
the only vnodes getting released these days are directories and things which
gets forced out of the VM/cache.  The latter is not numerous enough to keep
the pool of vnodes needed for the namecache sufficiently big.

2. Purge invalid entries in the namecache as soon as we notice them.  This
avoids a stale entry pushing out a valid entry on the LRU list.

3. Speed up the lookup in the namecache by avoid a special case branch.

4. Make the cache purge routines do the thing they're supposed to, and in
a decently efficient manner.

5. Make the size of the namecache follow the number of vnodes, so that we
can always point to all the vnodes we have in core.

6. Readability has gone way up.

7. Added a "options NCH_STATISTICS" feature that will gather more
detailed statistics on the performance of the namecache.

Reviewed by:    davidg

(cvs is dumping core on me :-(  )
1995-03-09 20:27:04 +00:00
dg
652f33dd01 Put VAGE vnodes at the head of the free list. 1995-03-07 18:59:45 +00:00
dg
fab8fbea37 Backed out previous change. I forgot (for about the fourth time) that
v_rdev is a #define which is dereferenced through v_specinfo->si_rdev,
and that isn't initialized until later in checkalias().
1995-02-27 10:15:38 +00:00
dg
2061c5c918 Initialize v_rdev in getnewvnode() - it appears that some filesystems
may not properly initialize this field in all cases, and this would
result in very anti-social behavior (overwriting on some other random
device/location).

Submitted by:	John Dyson
1995-02-27 06:50:08 +00:00
dg
a3e510cad3 vfs_cluster.c:
Various more tweaks from John Dyson to improve read ahead calculations.

vfs_subr.c:
Only wakeup if numoutput is 0 in vwakeup().

Submitted by:	John Dyson
1995-02-22 09:39:22 +00:00
dg
6491ec70c9 Fixed some formatting weirdness that I overlooked in the previous commit. 1995-01-10 07:32:52 +00:00
dg
1707d41102 These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme.  The scheme is almost fully compatible with the old filesystem
interface.  Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache.  Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code.  Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now.  Somehow in 2.0, some "enhancements"
broke the code.  This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code.  No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency.  Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache.  Add "bypass" support for sneaking in on
busy buffers.

Submitted by:	John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
dg
cf154a2134 Protect vnode buffer chain manipulation with splbio to prevent list
corruption..
1994-12-23 04:52:55 +00:00
dg
2add6128e2 Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.
1994-10-06 21:07:04 +00:00
dg
52b4dc9c30 Stuff object into v_vmdata rather than pager. Not important which at
the moment, but will be in the future. Other changes mostly cosmetic,
but are made for future VMIO considerations.

Submitted by:	John Dyson
1994-10-05 09:48:45 +00:00
phk
c3e4945541 All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.
1994-10-02 17:35:40 +00:00
phk
f73f358983 While in the real world, I had a bad case of being swapped out for a lot of
cycles.  While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent).  So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there.  Having a lap-top is
highly recommended.  My kernel still runs, yell at me if you kernel breaks.
1994-09-25 19:34:02 +00:00