Commit Graph

377 Commits

Author SHA1 Message Date
julian
c16287759e Remove buggy debugging code. 1998-06-10 20:03:16 +00:00
julian
10bd7912f6 Back out John's changes 1.45 -> 1.46
Kirk confirms that the original semantic was what he wanted...
(well, a very slight difference)
May fix "dangling deps" panic with soft updates.
1998-06-10 19:27:56 +00:00
julian
d2a5a2e4b3 The version of the softdep changes in FreeBSD broke the
(doingdirectory && !newparent) case of ufs_rename().
rename("D1/X/", "D2/Y/") gives a wrong link count for D2.

Submitted by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Kirk McKusick <mckusick@McKusick.COM>
1998-06-08 23:55:33 +00:00
bde
60609ce6f9 Null change. Forgot to mention in previous log message that MNT_NOATIME
is now ignored for special files, so that mounting root with option
noatime doesn't break reporting of idle times in programs like `w'.
The problem of execessive disk updates just to stamp atimes will be
handled for special files by only writing atimes to disk when inodes
become active.  This works well because special files are relatively
uncommon and their atimes are even more disposable at panic time than
regular files' atimes.
1998-06-07 11:04:26 +00:00
bde
e7c876e047 Fixed some longstanding timestamp bugs:
1. mark atimes and mtimes of special files and fifos for update upon
   successful completion of non-null i/o, not at the beginning of the
   syscall.
2. never update file times for readonly filesystems.  They were updated
   for stats and closes but not for syncs.  The updates were of course
   only in-core and were thrown away when the inode was uncached, so
   the times sometimes appeared to go backwards.

Improved comments in code related to (1) (mostly by removing them).

Unmacroized ITIMES().  The test in (2) bloated it even more.  Don't
call getmicrotime() in the function version of it when we only need
the time in seconds.
1998-06-07 10:49:18 +00:00
dfr
2c20b9f8a3 Use size_t instead of u_int for sizes. 1998-06-04 17:21:39 +00:00
dfr
0d64b93547 If the filesystem blocksize is less than the VM page size, use the generic
getpages code.  This happens for filesystems with 4k pages on the alpha since
the normal alpha pagesize is 8k.
1998-06-04 17:04:44 +00:00
dfr
0f6db158a2 Don't cast a pointer to an int in DQHASH. 1998-06-04 17:03:16 +00:00
julian
1daff3f91f Add a reference to the original softupdates paper 1998-06-02 01:30:51 +00:00
julian
231adce91d Add a reference to the Ganger/Patt paper 1998-06-02 01:27:27 +00:00
julian
6cb609db70 A fix to a debug test from Kirk. 1998-05-27 03:32:23 +00:00
julian
7ee5b0760a Ensure that there is enough information here, so that people can use
soft updates should they desire.
1998-05-19 23:18:37 +00:00
julian
1d370e5321 Bring up-to-date with Whistle's current version
Includes some debugging code.
1998-05-19 23:07:25 +00:00
julian
1aca5b515a Merge with Kirk's version as of Feb 20
His version 9.23 == our version 1.5 of ffs_softdep.c
His version 9.5 ==  our version 1.4 of softdep.c
1998-05-19 22:54:53 +00:00
julian
775d9bc55b Merge in Kirk's changes to stop softupdates from hogging all of memory. 1998-05-19 21:45:53 +00:00
julian
03438335f7 Change to stop a silly panic. This should be understood better.
Change a buffer swizzle trick to a bcopy. It would be nice if the efficient
trick could be used in the future.
1998-05-19 20:50:41 +00:00
julian
294f018f94 First published FreeBSD version of soft updates Feb 5. 1998-05-19 20:18:42 +00:00
julian
59d05b8498 This commit was generated by cvs2svn to compensate for changes in r36206,
which included commits to RCS files with non-trunk default branches.
1998-05-19 20:03:29 +00:00
julian
6df1279ada Import the next version received from kirk after some
FreeBSD feedback.
1998-05-19 20:03:29 +00:00
julian
e12c18397d This commit was generated by cvs2svn to compensate for changes in r36201,
which included commits to RCS files with non-trunk default branches.
1998-05-19 19:47:22 +00:00
julian
f9429f2998 Import the earliest version of the soft update code that I have. 1998-05-19 19:47:22 +00:00
julian
d21c93a4ec try stop the user from using mount -u to set the async flag on
a filesystem currently using soft updates.
Also needs a new copy of ffs_softdep.c to complete the fix.
1998-05-18 06:38:18 +00:00
phk
2bea7e6614 s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by:	bde
1998-05-17 11:53:46 +00:00
julian
b27e38e1d8 Add missing splx()
Submitted by: Luoqi Chen <luoqi@chen.ml.org>
1998-05-11 21:41:13 +00:00
julian
dfabb949fd Submitted by: abial@nask.pl
Minor fix to support SLICE in MFS...
1998-05-11 19:27:18 +00:00
msmith
b4bbf78c74 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
msmith
54f95d7f2a As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
dyson
786ece306d Correct an error that I made where the vtruncbuf was changed back
to vinvalbuf, but I incorrectly added the "V_SAVE|V_SAVEMETA" flags.
Submitted by:	Luoqi Chen <luoqi@watermarkgroup.com>
1998-05-04 17:43:48 +00:00
dyson
02900c7628 Fix an error that I made with an optimization. In the case
of softupdates, we need to do vtruncbuf the old way.  Luoqi
caught, found the bug and submitted this fix.
Submitted by:	Luoqi Chen <luoqi@chen.ml.org>
1998-04-30 05:28:53 +00:00
julian
de3ede4966 Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)
1998-04-20 03:57:41 +00:00
julian
6e0039d406 Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.
1998-04-19 23:32:49 +00:00
des
31722e83fd Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108. 1998-04-17 22:37:19 +00:00
bde
ed4817c3a8 Fixed bitrot in the non-softdep case of ufs_dirremove():
- restored async mount support.  The first entry in a block is still
  always written synchronously, although it probably shouldn't be in
  the async case.
- restored use of BWRITE() instead of bowrite() for the DOWHITEOUT
  case, although bowrite() is probably better.

Broken by:	merge of softdep changes (rev.1.22).
Found by:	lmbench2 delete-file benchmarks.
1998-04-15 12:27:31 +00:00
peter
c75fe12873 Back this out, allowing users to get a fd connected to a symlink is
just too dangerous.
1998-04-06 18:18:50 +00:00
peter
b70aee89fb Don't panic if a VOP_READ() gets through on a short link, Just Do It
(because we can :-).  This means you can open a link file (or pseudo-file
in the case of short links where the data is stored in the inode rather
than disk blocks) and read the contents.
However, trap any writes from the user as it's difficult to do the right
thing in all cases.  A link may be short and the user may be trying to
extend it beyond the limit and so on.  Although.. being able to re-target
a symlink without deleting it first might have been nice.
This stuff is a bit perverse since symlink() and readlink() calls can
end up actually being implemented as read/write vnode ops.

Reviewed by: phk
1998-04-06 17:44:40 +00:00
phk
2e82976dec Time changes mark 2:
* Figure out UTC relative to boottime.  Four new functions provide
      time relative to boottime.

    * move "runtime" into struct proc.  This helps fix the calcru()
      problem in SMP.

    * kill mono_time.

    * add timespec{add|sub|cmp} macros to time.h.  (XXX: These may change!)

    * nanosleep, select & poll takes long sleeps one day at a time

Reviewed by:    bde
Tested by:      ache and others
1998-04-04 13:26:20 +00:00
phk
8396c1d18d Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
bde
878d31d269 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
peter
4db6350460 Enable the use of soft updates on the root filesystem. Previously, the
softdep mode could only be activated on the initial mount of a filesystem
and then only if it was a read-write mount.  A 'mount -r' (as done in the
rootfs mount) followed by a 'mount -u' to convert to read-write didn't
start softdep mode.
1998-03-27 14:20:57 +00:00
phk
925612c9b4 Add two new functions, get{micro|nano}time.
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.
1998-03-26 20:54:05 +00:00
bde
f0511c277c Forward declare even more structs to restore some self-sufficiency.
Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.
1998-03-23 14:12:37 +00:00
dyson
bcac3d4a40 Softdep_sync_metadata appears to expect that it is called at splbio,
so make it so...
1998-03-21 05:16:09 +00:00
dyson
089cf3e906 Fix vfs_bio_awrite usage, and correct vtruncbuf usage. 1998-03-19 22:49:44 +00:00
dyson
5abb86e605 Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
dyson
337001778a Correct a problem with the ffs_getpages routine that manifest's itself
during the tail command.  The amount to read is incorrectly calculated.
Submitted by:	Tor Egge
1998-03-09 22:12:52 +00:00
julian
3da153eb72 Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
julian
02838128a5 Submitted by: kirk McKusick
Stub file for soft updates.
1998-03-08 08:38:41 +00:00
dyson
067e84884d This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
bde
caeeae82b1 Fixed missing simple_lock() in ffs_mountfs(). 1998-03-07 14:59:44 +00:00
msmith
0656734d76 The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE.  This is
initially only for file systems that implement the following ops
that do a WILLRELE:

	vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
	vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet.  VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
	ffs mfs nfs msdosfs devfs ext2fs

Custom
	union umapfs

Just EOPNOTSUPP
	fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers.  vput will be replaced by unlock in these cases.  Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer.  This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by:	phk, dyson, terry et. al.
Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-03-01 22:46:53 +00:00