Commit Graph

727 Commits

Author SHA1 Message Date
bde
fee8e1a16f Fixed breakage of mknod() in rev.1.48 of ext2_vnops.c and rev.1.126 of
ufs_vnops.c:

1) i_ino was confused with i_number, so the inode number passed to
   VFS_VGET() was usually wrong (usually 0U).
2) ip was dereferenced after vgone() freed it, so the inode number
   passed to VFS_VGET() was sometimes not even wrong.

Bug (1) was usually fatal in ext2_mknod(), since ext2fs doesn't have
space for inode 0 on the disk; ino_to_fsba() subtracts 1 from the
inode number, so inode number 0U gives a way out of bounds array
index.  Bug(1) was usually harmless in ufs_mknod(); ino_to_fsba()
doesn't subtract 1, and VFS_VGET() reads suitable garbage (all 0's?)
from the disk for the invalid inode number 0U; ufs_mknod() returns
a wrong vnode, but most callers just vput() it; the correct vnode is
eventually obtained by an implicit VFS_VGET() just like it used to be.

Bug (2) usually doesn't happen.
2000-11-04 08:10:56 +00:00
eivind
1afa7eea27 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
phk
cbbd2b08c2 Add a missing <sys/systm.h> 2000-10-30 20:37:19 +00:00
phk
ff5cdfae2d Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
phk
f82e4ca62c Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>
2000-10-29 14:54:55 +00:00
phk
94a5006c9a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
rwatson
9c993b44d0 o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform
"administrative" authorization checks.  In most cases, the VADMIN test
  checks to make sure the credential effective uid is the same as the file
  owner.
o Modify vaccess() to set VADMIN as an available right if the uid is
  appropriate.
o Modify references to uid-based access control operations such that they
  now always invoke VOP_ACCESS() instead of using hard-coded policy checks.
o This allows alternative UFS policies to be implemented by replacing only
  ufs_access() (such as mandatory system policies).
o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the
  vnode: I believe that new invocations of VOP_ACCESS() are always called
  with the lock held.
o Some direct checks of the uid remain, largely associated with the QUOTA
  and SUIDDIR code.

Reviewed by:	eivind
Obtained from:	TrustedBSD Project
2000-10-19 07:53:59 +00:00
adrian
0458054c4e Initial commit of IFS - a inode-namespaced FFS. Here is a short
description:

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile' . When newfile
is called, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its returned
inode number retrieved from the stat call, ie:

fd = open("5", O_RDWR);

The creation permissions depend entirely if you have write access to the
root directory of the filesystem.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

--

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)

* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation

* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())

* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the perm/ownership of
  the root mount point, just like a normal directory

* Each inode has perm and ownership too

* IFS does *NOT* mean an FFS partition can be opened per inode. This is a
  completely seperate filesystem here

* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fsck's are FAST. (Try it :-)

* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC).
  Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against
  this particular inode, and unravelling THAT code isn't trivial. Therefore,
  useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!
2000-10-14 03:02:30 +00:00
rwatson
113a9e1b35 o Sanity check was inverted, resulting in a possible spurious panic
during unmount if extended attributes were in use.  Correct by removing
  an unneeded (and undesirable) '!'.
2000-10-09 20:04:39 +00:00
eivind
4a39f454a0 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
rwatson
8d202cd4d0 o Move initialization of ump from mp to the top of the function so that
it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic
  due to a NULL pointer dereference.

Submitted by:	Wesley Morgan <morganw@chemicals.tacorp.com>
2000-10-06 15:31:28 +00:00
rwatson
469c6c2bf3 o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean
up lock on extattrs.
o Get for free a comment indicating where auto-starting of extended
  attributes will eventually occur, as it was in my commit tree also.
  No implementation change here, only a comment.
2000-10-04 04:44:51 +00:00
rwatson
4e14377293 o Correct use of lockdestroy() by adding a new ufs_extattr_uepm_destroy()
call, which should be the last thing down to a per-mount extattr
  management structure, after ufs_extattr_stop() on the file system.
  This currently has the effect only of destroying the per-mount lock
  on extended attributes, and clearing appropriate flags.
o Remove inappropriate invocation in ufs_extattr_vnode_inactive().
2000-10-04 04:41:33 +00:00
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
bp
6110b03d24 Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
rwatson
40aced8438 o Permit UFS Extended Attributes to be associated with special devices
and FIFOs.

Obtained from:	TrustedBSD Project
2000-09-21 19:06:02 +00:00
rwatson
07ac219faf o Disallow privileged processes in jail() from directly accessing
system namespace extended attributes.
o Document privilege/jail() interaction relating to extended
  attributes.

Obtained from:	TrustedBSD Project
2000-09-18 18:10:13 +00:00
rwatson
3546d27e15 o Allow privileged processes in jail() to override sticky bit behavior
on directories.
o Allow privileged processes in jail() to create inodes with the
  setgid bit set even if they are not a member of the group denoted
  by the file creation gid.  This occurs due to inherited gid's from
  parent directories on file creation, allowing a user to create a
  file with a gid that is not in the creating process's credentials.

Obtained from:	TrustedBSD Project
2000-09-18 18:03:49 +00:00
rwatson
b324dcbd3d o Add a comment clarifying interaction between jail(), privileged processes,
and UFS file flags.  Here's what the comment says, for reference:

	Privileged processes in jail() are permitted to modify
	arbitrary user flags on files, but are not permitted
	to modify system flags.

  In other words, privilege does allow a process in jail to modify user
  flags for objects that the process does not own, but privilege will
  not permit the setting of system flags on the file.

Obtained from:	TrustedBSD Project
2000-09-18 17:58:15 +00:00
rwatson
f193def48e o Add missing PRISON_ROOT allowing a privileged process in a jail() to not
remove the setuid/setgid bits by virtue of a change to a file with those
  bits set, even if the process doesn't own the file, or isn't a group
  member of the file's gid.

Obtained from:	TrustedBSD Project
2000-09-18 17:53:22 +00:00
rwatson
4ba86892be o Substitute suser() calls for direct credential checks, which is now
safe as suser() no longer sets ASU.
o Note that in some cases, the PRISON_ROOT flag is used even though no
  process structure is passed, to indicate that if a process structure
  (and hence jail) was available, it would be ok.  In the long run,
  the jail identifier should probably be moved to ucred, as the uidinfo
  information was.
o Some uid 0 checks remain relating to the quota code, which I'll leave
  for another day.

Reviewed by:	phk, eivind
Obtained from:	TrustedBSD Project
2000-09-18 16:13:02 +00:00
des
86bd96948b Silence a warning. 2000-09-17 19:41:26 +00:00
bp
02544af7d4 Add new flag PDIRUNLOCK to the component.cn_flags which should be set by
filesystem lookup() routine if it unlocks parent directory. This flag should
be carefully tracked by filesystems if they want to work properly with nullfs
and other stacked filesystems.

VFS takes advantage of this flag to perform symantically correct usage
of vrele() instead of vput() if parent directory already unlocked.

If filesystem fails to track this flag then previous codepath in VFS left
unchanged.

Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystmes will
be changed after some period of testing.

Reviewed in general by:	mckusick, dillon, adrian
Obtained from:	NetBSD
2000-09-17 07:26:42 +00:00
phk
f2b4e59044 Remove a pointless casting of a gid_t to a gid_t. 2000-09-16 18:20:27 +00:00
bp
8437d5b6f4 Add VOP_*VOBJECT vops, because MFS requires explicit vop specification.
Noted by:	knu
2000-09-12 16:21:16 +00:00
rwatson
d12caa21f3 o Variety of extended attribute fixes
- In ufs_extattr_enable(), return EEXIST instead of EOPNOTSUPP
	  if the caller tries to configure an attribute name that is
	  already configured
	- Throughout, add IO_NODELOCKED to VOP_{READ,WRITE} calls to
	  indicate lock status of passed vnode.  Apparently not a
	  problem, but worth fixing.
	- For all writes, make use of IO_SYNC consistent.  Really,
	  IO_UNIT and combining of VOP_WRITE's should happen, but I
	  don't have that tested.  At least with this, it's
	  consistent usage.  (pointed out by: bde)
	- In ufs_extattr_get(), fixed nested locking of backing
	  vnode (fine due to recursive lock support, but make it
	  more consistent with other code)
	- In ufs_extattr_get(), clean up return code to set uio_resid
	  more consistently with other pieces of code (worked fine,
	  this is just a cleanup)
	- Fix ufs_extattr_rm(), which was broken--effectively a nop.
	- Minor comment and whitespace fixes.

Obtained from:	TrustedBSD Project
2000-09-12 05:35:47 +00:00
jhb
e467813373 Fix a 64-bitism. Use size_t instead of int for 4th argument to copyinstr.
Approved by:	rwatson
2000-09-11 05:43:02 +00:00
mckusick
7438b4ca6f Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-09-07 23:02:55 +00:00
jasone
769e0f974d Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
rwatson
e6a536221c Modify extended attribute protection model to authorize based on
attribute namespace and DAC protection on file:
	- Attribute names beginning with '$' are in the system namespace
	- The attribute name "$" is reserved
	- System namespace attributes may only be read/set by suser()
	  or by kernel (cred == NULL)
	- Other attribute names are in the application namespace
	- The attribute name "" is reserved
	- Application namespace attributes are protected in the manner
	  of the target file permission

o Kernel changes
	- Add ufs_extattr_valid_attrname() to check whether the requested
	  attribute "set" or "enable" is appropriate (i.e., non-reserved)
	- Modify ufs_extattr_credcheck() to accept target file vnode, not
	  to take inode uid
	- Modify ufs_extattr_credcheck() to check namespace, then enforce
	  either kernel/suser for system namespace, or vaccess() for
	  application namespace
o EA backing file format changes
	- Remove permission fields from extended attribute backing file
	  header
	- Bump extended attribute backing file header version to 3
o Update extattrctl.c and extattrctl.8
	- Remove now deprecated -r and -w arguments to initattr, as
	  permissions are now implicit
	- (unrelated) fix error reporting and unlinking during failed
	  initattr to remove duplicate/inaccurate error messages, and to
	  only unlink if the failure wasn't in the backing file open()

Obtained from:	TrustedBSD Project
2000-09-02 20:31:26 +00:00
rwatson
e54ea574fa o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege.  Make vaccess() accept an
  additional optional argument, privused, to determine whether
  privilege was required for vaccess() to return 0.  Add commented
  out capability checks for reference.  Rename some variables to make
  it more clear which modes/uids/etc are associated with the object,
  and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
  privused argument.  Once additional patches are applied, suser()
  will no longer set ASU, so privused will permit passing of
  privilege information up the stack to the caller.

Reviewed by:	bde, green, phk, -security, others
Obtained from:	TrustedBSD Project
2000-08-29 14:45:49 +00:00
rwatson
251e663e8a o Correct spelling of ufs_exttatr_find_attr -> ufs_extattr_find_attr
o Add "const" qualifier to attrname argument of various calls to remove
  warnings

Obtained from:	TrustedBSD Project
2000-08-26 22:00:58 +00:00
phk
b648921acc Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)
Remove old DEVFS support fields from dev_t.

  Make uid, gid & mode members of dev_t and set them in make_dev().

  Use correct uid, gid & mode in make_dev in disk minilayer.

  Add support for registering alias names for a dev_t using the
  new function make_dev_alias().  These will show up as symlinks
  in DEVFS.

  Use makedev() rather than make_dev() for MFSs magic devices to prevent
  DEVFS from noticing this abuse.

  Add a field for DEVFS inode number in dev_t.

  Add new DEVFS in fs/devfs.

  Add devfs cloning to:
        disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
        md(4), tun(4), bpf(4), fd(4)

  If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

  Add commented out DEVFS to GENERIC
2000-08-20 21:34:39 +00:00
phk
3d2aecdc81 Centralize the canonical vop_access user/group/other check in vaccess().
Discussed with: bde
2000-08-20 08:36:26 +00:00
tegge
6dac8645b8 Initialize *countp to 0 in stub for softdep_flushworklist().
This allows ffs_fsync() to break out of a loop that might otherwise
be infinite on kernels compiled without the SOFTUPDATES option.
The observed symptom was a system hang at the first unmount attempt.
2000-08-09 00:41:54 +00:00
roberto
3d4cf3c369 Fix the lockmgr panic everyone is seeing at shutdown time.
vput assumes curproc is the lock holder, but it's not true in this case.

Thanks a lot Luoqi !

Submitted by:	luoqi
Tested by:	phk
2000-08-01 14:15:07 +00:00
peter
3f9fc32ece Minor tweak - removed unused variable 'struct mount *mp'; 2000-07-28 22:28:05 +00:00
peter
cfc0cd38b7 Minor change: fix warning - move a 'struct vnode *vp' declaration inside a
#ifdef DIAGNOSTIC to match its corresponding usage.
2000-07-28 22:27:00 +00:00
mckusick
b86877bef0 Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
phk
9aed458325 Fix the "mfs_badop[vop_getwritemount] = 45" messages. 2000-07-26 17:53:04 +00:00
mckusick
4223e4856e Add stub for softdep_flushworklist() so that kernels compiled
without the SOFTUPDATES option will load correctly.

Obtained from:	John Baldwin <jhb@bsdi.com>
2000-07-25 05:28:59 +00:00
mckusick
7d9ff6c133 Eliminate periodic 'mfs_badop[vop_getwritemount] = 45' messages.
Submitted by:	Sheldon Hearn <sheldonh@uunet.co.za>
2000-07-25 05:11:57 +00:00
mckusick
acc66855bf This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
rwatson
1293199940 o Marius pointed out an unusually inconvenient upper bound on extended
attribute data size.
o Fortunately it turned out to be an unused constant left over from an
  earlier implementation, and is therefore being removed so as not to
  confuse casual observers.

Submitted by:	mbendiks@eunet.no
2000-07-14 03:30:52 +00:00
bp
c956469da1 Prevent possible dereference of NULL pointer.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-07-13 02:17:14 +00:00
mckusick
6e81eafe20 Brain fault, forgot to update ffs_snapshot.c with the new calling convention
for vn_start_write.
2000-07-12 00:27:27 +00:00
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
mckusick
b2ed023a03 Clean up warning about undeclared function by declaring softdep_fsync
in mount.h instead of ffs_extern.h. The correct solution is to use
an indirect function pointer so that the kernel does not have to be
built with options FFS, but that will be left for another day.
2000-07-11 19:28:26 +00:00
phk
9241ff9fc6 Finish repo-copy:
Move ufs/ufs/ufs_disksubr.c to kern/subr_disklabel.c.

These functions are not UFS specific and are in fact used all over the place.
2000-07-10 13:48:06 +00:00
mckusick
92dfcced5b Delete README as it is now obsolete. Relevant information is in
README.softupdates.
2000-07-08 02:32:49 +00:00
mckusick
0ab089c771 Update to reflect current status. 2000-07-08 02:31:21 +00:00
mckusick
5e6b00a0a7 Get userland visible flags added for snapshots to give a few days
advance preparation for them to get migrated into place so that
subsequent changes in utilities will not fail to compile for lack
of up-to-date header files in /usr/include.
2000-07-04 04:58:34 +00:00
mckusick
040e64cd97 Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from:	 BSD/OS
2000-07-04 03:34:11 +00:00
phk
2a91a9dd04 Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES,
that is way cleaner than using the softupdates_stub stunt, which
should be killed when convenient.

Discussed with:	mckusick
2000-07-03 13:26:54 +00:00
phk
0535bee2fb Move prtactive to vfs from ufs. It is used all over the place. 2000-06-27 07:46:22 +00:00
ache
8b610ecf81 Remove obsoleted info about linking from contrib 2000-06-24 13:29:25 +00:00
mckusick
aa0e1b74b0 Update to new copyright. 2000-06-22 00:29:53 +00:00
mckusick
2be2bf630e When running with quotas enabled on a filesystem using soft updates,
the system would panic when a user's inode quota was exceeded (see
PR 18959 for details). This fixes that problem.

PR:		18959
Submitted by:	Jason Godsey <jason@unixguy.fidalgo.net>
2000-06-18 22:14:28 +00:00
mckusick
cad9618566 Some additional performance improvements. When freeing an inode
check to see if it has been committed to disk. If it has never
been written, it can be freed immediately. For short lived files
this change allows the same inode to be reused repeatedly.
Similarly, when upgrading a fragment to a larger size, if it
has never been claimed by an inode on disk, it too can be freed
immediately making it available for reuse often in the next slowly
growing block of the same file.
2000-06-18 22:05:57 +00:00
phk
cb90cb2b60 Revert part of my bioops change which implemented panic(8). 2000-06-16 14:32:13 +00:00
phk
74e1ff15ad ARGH! I have too many source trees :-(
Fix prototype errors in last commit.
2000-06-16 13:00:33 +00:00
phk
4ec91666fa Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
2000-06-16 08:48:51 +00:00
phk
34fec64322 Remove a comment which should never have made it in. 2000-06-14 21:48:19 +00:00
rwatson
051a92f4cd o Remove unneeded off_t variable to clean up compile warning
Obtained from:	TrustedBSD Project
2000-06-05 14:22:51 +00:00
rwatson
584644aae9 o If FFS_EXTATTR is defined, don't print out an error message on unmount
if an FFS partition returns EOPNOTSUPP, as it just means extended
  attributes weren't enabled on that partition.  Prevents spurious
  warning per-partition at shutdown.
2000-06-04 04:50:36 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
rwatson
5dc4cdc7ab s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf.
Reported by:	knu
2000-05-07 17:21:08 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
rwatson
51a3d7f35d Don't allow VOP_GETEXTATTR to set uio->uio_offset != 0, as we don't
provide locking over extended attribute operations, requiring that
individual operations be atomic.  Allowing non-zero starting offsets
permits applications/etc to put themselves at risk for inconsistent
behavior.  As VOP_SETEXTATTR already prohibited non-zero write offsets,
this makes sense.

Suggested by:	Andreas Gruenbacher <a.gruenbacher@bestbits.at>
2000-05-03 05:50:46 +00:00
phk
10914aa708 Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
phk
1931990da0 s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
phk
ce2aa22c93 Remove unneeded #include <sys/kernel.h> 2000-04-29 15:36:14 +00:00
mckusick
9ae79363c4 When files are given to users by root, the quota system failed to
reset their grace timer as their ownership crossed the soft limit
threshhold. Thus if they had been over their limit in the past,
they were suddenly penalized as if they had been over their limit
ever since. The fix is to check when root gives away files, that
when the receiving user crosses their soft limit, their grace timer
is reset. See the PR report for a detailed method of reproducing
the bug.

PR:		kern/17128
Submitted by:	Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
2000-04-28 06:12:56 +00:00
phk
7473110e20 Convert the magic MFS device to a VCHR.
Detected by:	obrien
2000-04-22 05:45:38 +00:00
rwatson
5a7c0dbf09 o Introduce an extended attribute backing file header magic number
o Introduce an extended attribute backing file header version number
2000-04-19 20:12:41 +00:00
phk
6be1308ad1 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
rwatson
5b36470f20 o Cause attribute data writes to use IO_SYNC since this improves the
chances of consistency with other file/directory meta-data in a
  write.  In the current set of extended attribute applications,
  this does not hurt much.  This should be discussed again later when
  it comes time to optimize performance of attributes.

o Include an inode generation number in the per-attribute header
  information.  This allows consistency verification to catch when
  a crash occurs, or an inode is recycled while attributes are not
  properly configured.  For now, an irritating error message is
  displayed when an inconsistency occurs.  At some point, may introduce
  an ``extattrctl check ...'' which catches these before attributes
  are enabled.  Not today.  If you get this message, it means you
  somehow managed to get your attribute backing file out of synch
  with the file system.  When this occurs, attribute not found is
  returned (== undefined).  Writes will overwrite the value there
  correcting the problem.  Might want to think about introducing
  a new errno or two to handle this kind of situation.

Discussed with:	kris
2000-04-19 07:38:20 +00:00
phk
99e3753c3a Retire bufqdisksort(), all drivers use bioqdisksort now. 2000-04-18 13:25:19 +00:00
jlemon
eb30412fdc Remove unneeded cast. 2000-04-17 03:37:13 +00:00
jlemon
42f19b9069 Replace the POLLEXTEND extensions with the kqueue() mechanism. 2000-04-16 18:55:20 +00:00
rwatson
95acaf111c Fix two bugs in extended attribute support for UFS/FFS:
o Put back in {} removed during over-zealous cleanup of gratuitous
  debugging output during preparation for the commit.  Due to the
  missing {}, writes on extended attributes always silently failed.
  Doh.

o Don't unlock the target vnode if it's the backing vnode, as we
  don't lock the target vnode if it's the backing vnode.
2000-04-16 01:35:30 +00:00
phk
aaaef0b54e Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
rwatson
a0dd5ab0fd Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes.  This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS.  Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default).  Userland utilities and man pages will be
committed in the next batch.  VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS.  This will be fixed.

Reviewed by:	adrian, bp, freebsd-fs, other unthanked souls
Obtained from:	TrustedBSD Project
2000-04-15 03:34:27 +00:00
phk
f37bdf3ad7 Clone bio versions of certain bits of infrastructure:
devstat_end_transaction_bio()
        bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.

Move b_offset, b_data and b_bcount to struct bio.

Add BIO_FORMAT as a hack for fd.c etc.

We are now largely ready to start converting drivers to use struct
bio instead of struct buf.
2000-04-02 19:08:05 +00:00
phk
8ee11d587f Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
dillon
057e33d02c Change the write-behind code to take more care when starting
async I/O's.  The sequential read heuristic has been extended to
    cover writes as well.  We continue to call cluster_write() normally,
    thus blocks in the file will still be reallocated for large (but still
    random) I/O's, but I/O will only be initiated for truely sequential
    writes.

    This solves a number of annoying situations, especially with DBM (hash
    method) writes, and also has the side effect of fixing a number of
    (stupid) benchmarks.

Reviewed-by: mckusick
2000-04-02 00:55:28 +00:00
phk
9f5fd263aa diff, patch and cvs didn't like these three last time around, try again. 2000-03-20 12:34:21 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
phk
a246e10f55 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
mckusick
a02c1c5b8a Use 64-bit math to calculate if we have hit our freespace limit.
Necessary for coherent results on filesystems bigger than 0.5Tb.
2000-03-17 03:44:47 +00:00
mckusick
5ce14e7844 Bug fixes for currently harmless bugs that could rise to bite
the unwary if the code were called in slightly different ways.

1) In ufs_bmaparray() the code for calculating 'runb' will stop one block
short of the first entry in an indirect block. i.e. if an indirect block
contains N block numbers b[0]..b[N-1] then the code will never check if
b[0] and b[1] are sequential. For reference, compare with the equivalent
code that deals with direct blocks.

2) In ufs_lookup() there is an off-by-one error in the test that checks
if dp->i_diroff is outside the range of the the current directory size.
This is completely harmless, since the following while-loop condition
'dp->i_offset < endsearch' is never met, so the code immediately
does a second pass starting at dp->i_offset = 0.

3) Again in ufs_lookup(), the condition in a sanity check is wrong
for directories that are longer than one block. This bug means that
the sanity check is only effective for small directories.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2000-03-15 07:18:15 +00:00
mckusick
acdd0d6f53 Use 64-bit math to decide if optimization needs to be changed.
Necessary for coherent results on filesystems bigger than 0.5Tb.

Submitted by:	Paul Saab <ps@yahoo-inc.com>
2000-03-15 07:08:36 +00:00
dillon
464af2ea27 In the 'found' case for ufs_lookup() the underlying bp's data was
being accessed after the bp had been releaed.  A simple move of the
    brelse() solves the problem.

Approved by: jkh
Submitted by:  Ian Dowse <iedowse@maths.tcd.ie>
2000-03-09 18:54:59 +00:00
dillon
414d15acb8 Fix a 'freeing free block' panic in UFS. The problem occurs when the
filesystem fills up.  If the first indirect block exists and FFS is able
    to allocate deeper indirect blocks, but is not able to allocate the
    data block, FFS improperly unwinds the indirect blocks and leaves a
    block pointer hanging to a freed block.  This will cause a panic later
    when the file is removed.  The solution is to properly account for the
    first block-pointer-to-an-indirect-block we had to create in a balloc
    operation and then unwind it if a failure occurs.

Detective work by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie>
Approved by: jkh
2000-02-24 20:43:20 +00:00
rwatson
baa4395a04 After much consulting with bde, concluded that this fix was the best fix
to the current jail/chflags interactions.  This fix conditionalizes ``root
behavior'' in the chflags() case on not being in jail, so attempts to
perform a chflags in a jail are limited to what a normal user could do.
For example, this does allow setting of user flags as appropriate, but
prohibits changing of system flags.

Reviewed by:	bde
2000-02-22 03:56:58 +00:00
rwatson
fd37898b9f Disable chflags() from within jail() so that root within jail can't make
a mess in securelevel environments.  Results in one warning during
/etc/rc as it attempts to remove file flags, but this is harmless.

Approved by:	High Lord Hubbard
2000-02-20 01:10:36 +00:00
mckusick
541e13d43c When writing out bitmap buffers, need to skip over ones that already
have a write in progress. Otherwise one can get in an infinite loop
trying to get them all flushed.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
2000-01-30 20:32:59 +00:00
mckusick
e8eebed1f3 During fastpath processing for removal of a short-lived inode, the
set of restrictions for cancelling an inode dependency (inodedep)
is somewhat stronger than originally coded. Since this check appears
in two places, we codify it into the function check_inode_unwritten
which we then call from the two sites, one freeing blocks and the
other freeing directory entries.

Submitted by:	Steinar Haug via Matthew Dillon
2000-01-18 01:33:05 +00:00
mckusick
37dbb3e53f Need to reorganize the flushing of directory entry (pagedep) dependencies
so that they never try to lock an inode corresponding to ".." as this
can lead to deadlock. We observe that any inode with an updated link count
is always pushed into its buffer at the time of the link count change, so
we do not need to do a VOP_UPDATE, but merely find its buffer and write it.
The only time we need to get the inode itself is from the result of a
mkdir whose name will never be ".." and hence locking such an inode will
never request a lock above us in the filesystem tree. Thanks to Brian
Fundakowski Feldman for providing the test program that tickled soft updates
into hanging in "inode" sleep.

Submitted by:	Brian Fundakowski Feldman <green@FreeBSD.org>
2000-01-18 01:30:03 +00:00