Commit Graph

374 Commits

Author SHA1 Message Date
Bruce Evans
9399d2c5ad Null change. Forgot to mention in previous log message that MNT_NOATIME
is now ignored for special files, so that mounting root with option
noatime doesn't break reporting of idle times in programs like `w'.
The problem of excessive disk updates just to stamp atimes will be
handled for special files by only writing atimes to disk when inodes
become active.  This works well because special files are relatively
uncommon and their atimes are even more disposable at panic time than
regular files' atimes.
1998-06-07 11:04:26 +00:00
Bruce Evans
12f66dd32f Fixed some longstanding timestamp bugs:
1. mark atimes and mtimes of special files and fifos for update upon
   successful completion of non-null i/o, not at the beginning of the
   syscall.
2. never update file times for readonly filesystems.  They were updated
   for stats and closes but not for syncs.  The updates were of course
   only in-core and were thrown away when the inode was uncached, so
   the times sometimes appeared to go backwards.

Improved comments in code related to (1) (mostly by removing them).

Unmacroized ITIMES().  The test in (2) bloated it even more.  Don't
call getmicrotime() in the function version of it when we only need
the time in seconds.
1998-06-07 10:49:18 +00:00
Doug Rabson
8435e0aef5 Use size_t instead of u_int for sizes. 1998-06-04 17:21:39 +00:00
Doug Rabson
6589ab80a9 If the filesystem blocksize is less than the VM page size, use the generic
getpages code.  This happens for filesystems with 4k blocks on the alpha since
the normal alpha pagesize is 8k.
1998-06-04 17:04:44 +00:00
Doug Rabson
58b395a905 Don't cast a pointer to an int in DQHASH. 1998-06-04 17:03:16 +00:00
Julian Elischer
00076e7cf9 Add a reference to the original softupdates paper 1998-06-02 01:30:51 +00:00
Julian Elischer
3942b533f8 Add a reference to the Ganger/Patt paper 1998-06-02 01:27:27 +00:00
Julian Elischer
b8cf4de4c8 A fix to a debug test from Kirk. 1998-05-27 03:32:23 +00:00
Julian Elischer
928c9ddf81 Ensure that there is enough information here, so that people can use
soft updates should they desire.
1998-05-19 23:18:37 +00:00
Julian Elischer
25db4e8a66 Bring up-to-date with Whistle's current version
Includes some debugging code.
1998-05-19 23:07:25 +00:00
Julian Elischer
46e752be05 Merge with Kirk's version as of Feb 20
His version 9.23 == our version 1.5 of ffs_softdep.c
His version 9.5 ==  our version 1.4 of softdep.c
1998-05-19 22:54:53 +00:00
Julian Elischer
62e12c760c Merge in Kirk's changes to stop softupdates from hogging all of memory. 1998-05-19 21:45:53 +00:00
Julian Elischer
b6dad36385 Change to stop a silly panic. This should be understood better.
Change a buffer swizzle trick to a bcopy. It would be nice if the efficient
trick could be used in the future.
1998-05-19 20:50:41 +00:00
Julian Elischer
987614a910 First published FreeBSD version of soft updates Feb 5. 1998-05-19 20:18:42 +00:00
Julian Elischer
a697eb98d4 This commit was generated by cvs2svn to compensate for changes in r36206,
which included commits to RCS files with non-trunk default branches.
1998-05-19 20:03:29 +00:00
Julian Elischer
8e95b94dec Import the next version received from kirk after some
FreeBSD feedback.
1998-05-19 20:03:29 +00:00
Julian Elischer
8d1c524575 This commit was generated by cvs2svn to compensate for changes in r36201,
which included commits to RCS files with non-trunk default branches.
1998-05-19 19:47:22 +00:00
Julian Elischer
467e1a6e7a Import the earliest version of the soft update code that I have. 1998-05-19 19:47:22 +00:00
Julian Elischer
c11d29814e Try to stop the user from using mount -u to set the async flag on
a filesystem currently using soft updates.
Also needs a new copy of ffs_softdep.c to complete the fix.
1998-05-18 06:38:18 +00:00
Poul-Henning Kamp
c21410e119 s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by:	bde
1998-05-17 11:53:46 +00:00
Julian Elischer
5d0957193a Add missing splx()
Submitted by: Luoqi Chen <luoqi@chen.ml.org>
1998-05-11 21:41:13 +00:00
Julian Elischer
336c78bb90 Submitted by: abial@nask.pl
Minor fix to support SLICE in MFS...
1998-05-11 19:27:18 +00:00
Mike Smith
7be2d30077 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
Mike Smith
79cc756d8b As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
John Dyson
e60606c0af Correct an error that I made where the vtruncbuf was changed back
to vinvalbuf, but I incorrectly added the "V_SAVE|V_SAVEMETA" flags.
Submitted by:	Luoqi Chen <luoqi@watermarkgroup.com>
1998-05-04 17:43:48 +00:00
John Dyson
83ad4e3dbc Fix an error that I made with an optimization. In the case
of softupdates, we need to do vtruncbuf the old way.  Luoqi
caught, found the bug and submitted this fix.
Submitted by:	Luoqi Chen <luoqi@chen.ml.org>
1998-04-30 05:28:53 +00:00
Julian Elischer
c0bab11dfe Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)
1998-04-20 03:57:41 +00:00
Julian Elischer
3e425b968d Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistence. (My head is still hurting from the last time we discussed this)
ATAPI floppies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitrary and dynamically assigned.
1998-04-19 23:32:49 +00:00
Dag-Erling Smørgrav
dc73342347 Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108. 1998-04-17 22:37:19 +00:00
Bruce Evans
37223939f0 Fixed bitrot in the non-softdep case of ufs_dirremove():
- restored async mount support.  The first entry in a block is still
  always written synchronously, although it probably shouldn't be in
  the async case.
- restored use of BWRITE() instead of bowrite() for the DOWHITEOUT
  case, although bowrite() is probably better.

Broken by:	merge of softdep changes (rev.1.22).
Found by:	lmbench2 delete-file benchmarks.
1998-04-15 12:27:31 +00:00
Peter Wemm
a66ae6f438 Back this out, allowing users to get a fd connected to a symlink is
just too dangerous.
1998-04-06 18:18:50 +00:00
Peter Wemm
b587fd008d Don't panic if a VOP_READ() gets through on a short link, Just Do It
(because we can :-).  This means you can open a link file (or pseudo-file
in the case of short links where the data is stored in the inode rather
than disk blocks) and read the contents.
However, trap any writes from the user as it's difficult to do the right
thing in all cases.  A link may be short and the user may be trying to
extend it beyond the limit and so on.  Although.. being able to re-target
a symlink without deleting it first might have been nice.
This stuff is a bit perverse since symlink() and readlink() calls can
end up actually being implemented as read/write vnode ops.

Reviewed by: phk
1998-04-06 17:44:40 +00:00
Poul-Henning Kamp
00af9731c9 Time changes mark 2:
* Figure out UTC relative to boottime.  Four new functions provide
  time relative to boottime.

* move "runtime" into struct proc.  This helps fix the calcru()
  problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h.  (XXX: These may change!)

* nanosleep, select & poll take long sleeps one day at a time

Reviewed by:    bde
Tested by:      ache and others
1998-04-04 13:26:20 +00:00
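The timespec{add|sub|cmp} macros mentioned in the entry above are not shown in this log. As a rough illustration only, the arithmetic such helpers have to perform (carrying or borrowing between the seconds and nanoseconds fields) might look like the sketch below. It is written in Python rather than C, models a timespec as a `(sec, nsec)` tuple, and uses hypothetical function names; none of this is taken from the committed code.

```python
# Illustrative model only: a timespec is a (seconds, nanoseconds) tuple
# with 0 <= nanoseconds < NSEC_PER_SEC.  Function names are hypothetical.
NSEC_PER_SEC = 1_000_000_000


def timespec_add(a, b):
    """Add two normalized timespecs, carrying nanosecond overflow."""
    sec = a[0] + b[0]
    nsec = a[1] + b[1]
    if nsec >= NSEC_PER_SEC:
        sec += 1
        nsec -= NSEC_PER_SEC
    return (sec, nsec)


def timespec_sub(a, b):
    """Subtract b from a, borrowing one second if nanoseconds go negative."""
    sec = a[0] - b[0]
    nsec = a[1] - b[1]
    if nsec < 0:
        sec -= 1
        nsec += NSEC_PER_SEC
    return (sec, nsec)


def timespec_cmp(a, b):
    """Return -1, 0 or 1; seconds are compared first, then nanoseconds."""
    return (a > b) - (a < b)
```

Because both inputs are assumed to be normalized, a single carry or borrow is enough; un-normalized inputs would need a loop or a division instead.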
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't an atomic variable, so splfoo() protection was needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now use the new variable time_second instead.

gettime() changed to getmicrotime().

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeared time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passed &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
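The entry above describes tvtohz() as converting a relative time directly to clock ticks, instead of adding the current time and having hzto() subtract it again. The log does not show the implementation, so the following Python sketch is only an assumed model of the idea: a relative `(sec, usec)` interval is rounded up to whole ticks. The name `tv_to_ticks` and the default `hz=100` are hypothetical, and the kernel function may differ in detail (for example in overflow handling or in how the current partial tick is accounted for).

```python
# Illustrative model only; tv_to_ticks and the default hz are assumptions,
# not the kernel's tvtohz().
def tv_to_ticks(sec, usec, hz=100):
    """Convert a relative (sec, usec) interval to clock ticks, rounding up."""
    tick_usec = 1_000_000 // hz           # length of one tick in microseconds
    total_usec = sec * 1_000_000 + usec   # the interval is already relative
    if total_usec <= 0:
        return 0
    # Ceiling division so a partial tick still counts as a whole tick.
    return -(-total_usec // tick_usec)
```

Working on a relative interval means no read of the wall clock is needed, which is why the conversion itself does not require the spl protection the old sequence relied on.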
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Peter Wemm
26cf9c3b75 Enable the use of soft updates on the root filesystem. Previously, the
softdep mode could only be activated on the initial mount of a filesystem
and then only if it was a read-write mount.  A 'mount -r' (as done in the
rootfs mount) followed by a 'mount -u' to convert to read-write didn't
start softdep mode.
1998-03-27 14:20:57 +00:00
Poul-Henning Kamp
a0502b19d4 Add two new functions, get{micro|nano}time.
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some punctuation and grammar patches from Bruce.

A couple of XXX comments.
1998-03-26 20:54:05 +00:00
Bruce Evans
3d2d6cc3d8 Forward declare even more structs to restore some self-sufficiency.
Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.
1998-03-23 14:12:37 +00:00
John Dyson
9eebcfcf8c Softdep_sync_metadata appears to expect that it is called at splbio,
so make it so...
1998-03-21 05:16:09 +00:00
John Dyson
34f72be5af Fix vfs_bio_awrite usage, and correct vtruncbuf usage. 1998-03-19 22:49:44 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of a lot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliability
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, make page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
John Dyson
31bba5f966 Correct a problem with the ffs_getpages routine that manifests itself
during the tail command.  The amount to read is incorrectly calculated.
Submitted by:	Tor Egge
1998-03-09 22:12:52 +00:00
Julian Elischer
b1897c197c Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  Whistle development tree
1998-03-08 09:59:44 +00:00
Julian Elischer
2cbcee772b Submitted by: Kirk McKusick
Stub file for soft updates.
1998-03-08 08:38:41 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed separately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitous buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups associated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistency problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify its status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and its descendants to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
Bruce Evans
16337c2efb Fixed missing simple_lock() in ffs_mountfs(). 1998-03-07 14:59:44 +00:00
Mike Smith
34bdbbd0de The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE.  This is
initially only for file systems that implement the following ops
that do a WILLRELE:

	vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
	vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet.  VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
	ffs mfs nfs msdosfs devfs ext2fs

Custom
	union umapfs

Just EOPNOTSUPP
	fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers.  vput will be replaced by unlock in these cases.  Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer.  This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by:	phk, dyson, terry et al.
Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-03-01 22:46:53 +00:00
Mike Smith
ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00
Bruce Evans
c9b9921363 Fixed missing permissions checking for mounting by non-root.
There is now less need for the vfs.usermount sysctl.  msdosfs already
has this change, modulo a missing LK_RETRY, via NetBSD.  At least
ext2fs is missing this and many other changes from Lite2.

Obtained from:	Lite2
1998-02-25 04:47:04 +00:00
Bruce Evans
d68fa50ccb Don't depend on "implicit int". 1998-02-20 13:37:40 +00:00