Commit Graph

54 Commits

Author SHA1 Message Date
julian
ebb726ec45 Reviewed by: julian with quick glances by bruce and others
Submitted by:	terry (terry lambert)
This is  a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
1995-08-28 09:19:25 +00:00
bde
0bdfde4001 Don't call VOP_UPDATE() with volatile timestamps. 1995-08-25 19:40:32 +00:00
dg
b6d06a9f5d Honor -async mount option when doing the inode update.
Obtained from:	4.4BSD-Lite2
1995-08-16 13:16:58 +00:00
dg
5b4b270015 Converted mountlist to a CIRCLEQ.
Partially obtained from: 4.4BSD-Lite2
1995-08-11 11:31:18 +00:00
dg
20100f1812 Use bdwrite() rather than brelse(). The cylinder group bitmap modification
is not preserved otherwise.
Note that this is a no-op in FreeBSD, however, as we have doreallocblks
disabled.

Submitted by:	Kirk McKusick
1995-08-07 08:16:32 +00:00
dg
cf5a823616 Removed redundant call to vm_object_page_clean - this is already done
in vfs_msync().
1995-08-06 11:56:42 +00:00
dg
060942c07e Use the correct flags (IO_SYNC -> B_SYNC) when deciding to do a sync or
async write in the section that changes the filesize. The bug resulted
in the updates always being async.

Obtained from:	4.4BSD-Lite2
1995-08-04 05:49:17 +00:00
dg
5d7eb9210c Since ufs_ihashget can block, the lock must be checked for each time
the function returns. Also, moved lock into .bss and made minor cosmetic
changes.

Submitted by:	Bruce Evans
1995-07-21 16:20:20 +00:00
dg
ab7b1f1cbf Implement a lock in ffs_vget to prevent a race condition where two processes
try allocate the same inode/vnode, causing a duplicate.

Submitted by:	Matt Dillon, slightly reworked by me.
1995-07-21 03:52:40 +00:00
dg
c8b0a7332c NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
dg
3c7c1dd62f 1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
   after vnode_pager_alloc() calls - the object is already guaranteed to be
   persistent.
3) Removed some gratuitous casts.
1995-06-28 12:01:13 +00:00
rgrimes
c86f0c7a71 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
dg
9e52c97c63 Kill bogus vnode_pager_setsize(). It was being called at the wrong time
and resulted in the object size being too small. This caused bad things
to happen later when the file was mapped.

Reviewed by:	John Dyson
1995-05-28 04:32:23 +00:00
dg
2045200a00 Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some
   cases, this lead to lost data. Most likely would be noticed on NFS.
   The fix is to make the VM page sync/object_clean general rather than
   in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
   chunks of files to end up as zeroes rather than the intended contents.
   The fix was to fix several race conditions and to kludge up the
   "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
   to page modifications that occurred via the mmapping.

Reviewed by:	David Greenman
Submitted by:	John Dyson
1995-05-21 21:39:31 +00:00
dg
240701b33f NFS diskless operation was broken because swapdev_vp wasn't initialized.
These changes solve the problem in a general way by moving the
initialization out of the individual fs_mountroot's and into swaponvp().

Submitted by:	Poul-Henning Kamp
1995-05-19 03:27:08 +00:00
dg
138edd5273 Fixed incompleteness that would allow dirty filesystems to get mounted
when the single user shell was terminated. These changes disallow mounting
or R/W upgrading filesystems that are dirty unless "-f" (force) option
is used with mount. /etc/rc has been modified to abort the startup if
one or more non-nfs partitions fail to mount.

Reviewed by:	Poul-Henning Kamp, Rod Grimes
1995-05-15 08:39:37 +00:00
rgrimes
0e1db07cf9 Fix -Wformat warnings from LINT kernel. 1995-05-11 19:26:53 +00:00
dyson
7972592fe7 Limit filesize to the amount that the VM system can currently handle
(2GB).  If this limit is not imposed, then filesystem corruption will
ensue when files larger than 2GB are created.  This is temporary,
and the underlying limitation will be removed later.
1995-05-01 23:20:24 +00:00
dg
c1c54df7da Handle the "syncing VCHR vnode hang" problem a little differently; just
don't lock the vnode - it doesn't appear to ever be necessary for VCHR
vnode/inodes. This fixes a bug introduced in the previous commit that
caused tty timestamps to act strange (causing 'w' and 'finger' to show
the tty wasn't idle when it may have been for hours).
1995-04-11 04:23:47 +00:00
dg
b804a53282 Changes from John Dyson and myself:
Fixed remaining known bugs in the buffer IO and VM system.

vfs_bio.c:
Fixed some race conditions and locking bugs. Improved performance
by removing some (now) unnecessary code and fixing some broken
logic.
Fixed process accounting of # of FS outputs.
Properly handle NFS interrupts (B_EINTR).

(various)
Replaced calls to clrbuf() with calls to an optimized routine
called vfs_bio_clrbuf().

(various FS sync)
Sync out modified vnode_pager backed pages.

ffs_vnops.c:
Do two passes: Sync out file data first, then indirect blocks.

vm_fault.c:
Fixed deadly embrace caused by acquiring locks in the wrong order.

vnode_pager.c:
Changed to use buffer I/O system for writing out modified pages. This
should fix the problem with the modification date previous not getting
updated. Also dramatically simplifies the code. Note that this is
going to change in the future and be implemented via VOP_PUTPAGES().

vm_object.c:
Fixed a pile of bugs related to cleaning (vnode) objects. The performance
of vm_object_page_clean() is terrible when dealing with huge objects,
but this will change when we implement a binary tree to keep the object
pages sorted.

vm_pageout.c:
Fixed broken clustering of pageouts. Fixed race conditions and other
lockup style bugs in the scanning of pages. Improved performance.
1995-04-09 06:03:56 +00:00
bde
4f64fe43e7 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.
1995-03-28 07:58:53 +00:00
dg
1320869559 Removed third arg (vmio) to allocbuf() that was added with the original
merged cache changes, and figure it out based on the B_VMIO buffer flag.
Fixes a problem where delayed write VMIO buffers would sometimes get
recopied into kernel-alloced memory.

Submitted by:	John Dyson
1995-03-26 23:29:13 +00:00
dg
9cd78521d8 Removed redundant newlines that were in some panic strings. 1995-03-19 14:29:26 +00:00
dg
e38e0cc286 Don't sync the inode date changes of character special devices
during the FS sync. The system would appear to hang momentarily
if there was a large backlog of I/O. This is because the vnode
remains locked during the output - preventing normal character
I/O. The problem was exacerbated by the FFS contiguous block
allocation fixes and a semi-broken disksort(). The inode/date
will still be synced during a normal FS dismount and whenever
the inode is changed for other reasons.
1995-03-18 18:03:29 +00:00
bde
289f11acb4 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'.  Fix all the bugs found.  There were no serious
ones.
1995-03-16 18:17:34 +00:00
dg
c4a2f8db8d Increased default minfree to 8%. 1995-03-10 22:18:16 +00:00
dg
251a107ca6 The threshold for switching from time-space and space-time is too small
when minfree is 5%...so make it stay at space in this case.

Submitted by:	Kirk McKusick
1995-03-10 22:11:50 +00:00
dg
e4e3f30221 Removed obsolete vtrace() remnants. 1995-03-04 03:24:45 +00:00
dg
9dc7842c25 Fixes from John Dyson to work around vnode lock hang. Basically, remove
the VOP_BMAP calls, and add one to bdwrite.

Submitted by:	John Dyson
1995-03-03 22:13:16 +00:00
se
912b81cc5e Don't try to make use of useless rotational position optimisation,
if all free blocks are in the same bucket (i.e. NRPOS == 1).
Else a free block is choosen, possibly from a different cylinder,
even if the block succeeding bpref was free ...

Submitted by:	se
1995-02-27 17:43:57 +00:00
phk
49f19f3584 YF fix. 1995-02-14 06:14:28 +00:00
dg
1707d41102 These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme.  The scheme is almost fully compatible with the old filesystem
interface.  Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache.  Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code.  Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now.  Somehow in 2.0, some "enhancements"
broke the code.  This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code.  No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency.  Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache.  Add "bypass" support for sneaking in on
busy buffers.

Submitted by:	John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
bde
6eba657693 Use the same current time throughout ffs_update().
Update some macro names in comments.

Don't use MNT_WAIT for something not related to mounting.
1994-12-27 14:44:42 +00:00
bde
f2dfc1d8f0 Undo a previous change. <sys/disklabel.h> was broken, not these files. 1994-11-14 13:22:52 +00:00
jkh
076633a669 From: fredriks@mcs.com (Lars Fredriksen)
...
It turns out that these files do not include <sys/dkbad.h> before
<sys/disklabel.h>.
Submitted by:	fredriks
1994-10-28 12:42:05 +00:00
dg
dd9d06bd3f Restrict fs_maxfilesize to 2^40, and check against this in ffs_truncate().
This is part of a bug fix from Kirk McKusick to work around problems in FFS
related to the blkno of a 64bit offset not fitting into an int. Note the
proper solution would be to deal with 64bit block numbers, but doing this
would require sweeping changes; some other day perhaps.

Submitted by:	Marshall Kirk McKusick
1994-10-22 02:27:35 +00:00
phk
ffd776502f Cosmetics. make gcc less noisy. Still some way to go here. 1994-10-10 01:04:55 +00:00
phk
ee22b0c649 Cosmetics for gcc -Wall. A couple of unused "int i"'s removed and a couple of
prototypes added.  And the usual () work.
1994-10-08 06:20:06 +00:00
dg
2add6128e2 Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.
1994-10-06 21:07:04 +00:00
wollman
900d29807d More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled
  (blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
  config files.
1994-09-22 19:38:41 +00:00
wollman
d27339a8c6 Call ffs ``ufs'' for the benefit of poor, confused user-land programs. 1994-09-22 01:57:27 +00:00
wollman
c289ac89a1 Implemented loadable VFS modules, and made most existing filesystems
loadable.  (NFS is a notable exception.)
1994-09-21 03:47:43 +00:00
bde
55c056a942 Use `1' for a boolean value instead of something irrelevant (MNT_WAIT)
that happens to be nonzero.
1994-09-20 05:53:24 +00:00
dg
08512a3127 panic if length is < 0 in ffs_truncate(). 1994-09-02 10:24:55 +00:00
dg
f15c464e97 "bogus" fixes from 1.1.5 to work around some cache coherency problems. 1994-08-29 06:09:15 +00:00
paul
bcc18c44d8 Made idempotent
Reviewed by:
Submitted by:
1994-08-21 07:03:56 +00:00
dg
f817326b2e Implemented filesystem clean bit via:
machdep.c:
	Changed printf's a little and call vfs_unmountall() if the sync was
	successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
	Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
	Removed unused rootfs global.

vfs_subr.c:
	Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
	are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
	Toggle clean bit at the appropriate places. Print warning if an
	unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
	Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
	Disallow dismounting root FS via umount syscall.
1994-08-20 16:03:26 +00:00
wollman
f9fc827448 Fix up some sloppy coding practices:
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
  header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.
1994-08-18 22:36:09 +00:00
dg
8c8cfc5c11 Made lockf advisory locking code generic (rather than ufs specific), and
use it in NFS. This is required both for diskless support and for POSIX
compliance. Note: the support in NFS is only for the local node.

Submitted by:	based on work originally done by Yuval Yurom
1994-08-08 17:31:01 +00:00
dg
9de51a06f5 Changed occurrances of "itrunc" to "ffs_truncate" to make Bruce happy. 1994-08-03 08:19:35 +00:00