Commit Graph

822 Commits

Author SHA1 Message Date
jhb
4806d88677 Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
jhb
4c62ba7c58 Add missing includes of sys/lock.h. 2001-10-11 17:52:20 +00:00
dillon
dbebfe18a1 Remove panics for rename() race conditions. The panics are inappropriate
because the IN_RENAME flag only fixes a few of the huge number of race
conditions that can result in the source path becoming invalid even
prior to the VOP_RENAME() call.  The panics created a serious security
issue whereby an attacker could fairly easily cause the panic to
occur, crashing the machine.

The correct solution requires a great deal of work in the namei
path cache code.

MFC after:	0 days
2001-10-08 00:37:54 +00:00
rwatson
63b7da3bb4 o Replace two direct uid!=0 comparisons with suser_xxx() calls.
Obtained from:	TrustedBSD Project
2001-10-02 14:41:43 +00:00
rwatson
bf46cc0b03 o Replace two direct uid!=0 comparisons with suser_td() calls.
Obtained from:	TrustedBSD Project
2001-10-02 14:34:22 +00:00
dillon
c998ec6a97 Backout the last commit. The problem is actually much worse then I
first thought and may require serious work to the VOP_RENAME() api itself.
Basically, by the time the VOP_RENAME() function is called, it's already
too late.
2001-10-02 04:26:58 +00:00
dillon
5eaf310a14 IN_RENAME should only be cleared by the routine that set it. This fixes
a rename/rmdir race that has been shown to cause a panic.

Bug reported by: Yevgeniy Aleynikov <eugenea@infospace.com>
MFC after:	3 days
2001-10-02 02:58:48 +00:00
jhb
fe3fc4701b - Fix some minor whitespace nits.
- Move the SPECIAL_FLAG #define up next to the NOHOLDER #define and fix a
  little nit that caused it to be defined as -(sizeof (struct thread) + 1)
  instead of -2.
2001-09-27 21:04:13 +00:00
rwatson
9eed33b643 o Re-enable support of system file flags in jail() by adding back the
PRISON_ROOT to the suser_xxx() check.  Since securelevels may now
  be raised in specific jails, use of system flags can still be
  restricted in jail(), but in a more configurable way.
o Users of jail() expecting system flags (such as schg) to restrict
  jail()'s should be sure to set the securelevel appropriately in
  jail()'s.
o This fixes activities involving automated system flag removal in
  jail(), including installkernel and friends.

Obtained from:	TrustedBSD Project
2001-09-26 20:44:41 +00:00
rwatson
4e4d85b5d1 o Modify ufs_setattr() so that it uses securelevel_gt() instead of
direct variable access.

Obtained from:	TrustedBSD Project
2001-09-26 20:31:37 +00:00
rwatson
56db2a389f o Further clarify comment: ad Udo's request, re-insert the 'if'
refering to securelevels; also, update the unprivileged process text
  to better indicate the scope of actions permittable when any system
  flags are already set (limited).

Submitted by:	Udo Schweigert <udo.schweigert@siemens.com>
2001-09-25 12:02:44 +00:00
rwatson
de37df3afa o Parallelize the comment on the relationship between privileged un-jailed
processes and the actual securelevel check: make the comment use '> 0'
  instead of inverted '<= 0'.
2001-09-25 02:26:10 +00:00
iedowse
af5c7958aa The addition of i_dirhash to struct inode pushed RELENG_4's
sizeof(struct inode) into a new malloc bucket on the i386. This
didn't happen in -current due to the removal of i_lock, but it does
no harm to apply the workaround to -current first.

Reduce the size of the i_spare[] array in struct inode from 4 to
3 entries, and change ext2fs to use i_din.di_spare[1] so that it
does not need i_spare[3].

Reviewed by:	bde
MFC after:	3 days
2001-09-24 18:29:20 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
iedowse
ddaced1703 The "dirpref" directory layout preference improvements make use of
an array "fs_contigdirs[]" to avoid too many directories getting
created in each cylinder group. The memory required for this and
two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with
a single malloc() call, and divided up afterwards.  However, the
'space' pointer is not advanced correctly, so fs_contigdirs and
fs_maxcluster end up pointing to the same address.

Add the missing code to advance the 'space' pointer, and remove
an unnecessary update of the pointer that follows.

This is likely to fix the "ffs_clusteralloc: map mismatch" panics
that have been reported recently.

Submitted by:		Luke Mewburn <lukem@wasabisystems.com>
2001-09-09 23:48:28 +00:00
jedgar
5b9926665b Use ACL_PERM_NONE instead of hardcoding 0 when initializing
ACL entry permissions.

Reviewed by:	rwatson
2001-09-01 23:18:15 +00:00
rwatson
00ec5e8482 o At some point, unmounting a non-EA file system with EA's compiled
in got a bit broken, when ufs_extattr_stop() was called and failed,
  ufs_extattr_destroy() would panic.  This makes the call to destroy()
  conditional on the success of stop().

Submitted by:		Christian Carstensen <cc@devcon.net>
Obtained from:	TrustedBSD Project
2001-09-01 20:11:05 +00:00
peter
4b437abe78 If a file has been completely unlinked, stop automatically syncing the
file.  ffs will discard any pending dirty pages when it is closed,
so we may as well not waste time trying to clean them.  This doesn't
stop other things from writing it out, eg: pageout, fsync(2) etc.
2001-08-27 06:09:56 +00:00
iedowse
bff1ce4c20 Stop using dirhash when a directory is removed, and ensure that we
never attempt to hash directories once they are deleted. This fixes
a problem where operations on a deleted directory could trigger
dirhash sanity panics.
2001-08-26 20:47:19 +00:00
iedowse
c8ef91ce6c When compacting directories, ufs_direnter() always trusted DIRSIZ()
to supply the number of bytes to be bcopy()'d to move an entry. If
d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible
length, so ufs_direnter could end up corrupting a directory during
compaction. In practice I believe this can only happen after fsck_ffs
has fixed a previously-corrupted directory.

We now deal with any mid-block unused entries specially to avoid
using DIRSIZ() or bcopy() on such entries. We also ensure that the
variables 'dsize' and 'spacefree' contain meaningful values at all
times. Add a few comments to describe better this intricate piece
of code.

The special handling of mid-block unused entries makes the dirhash-
specific bugfix in the previous revision (1.53) now uncecessary,
so this change removes it.

Reviewed by:	mckusick
2001-08-26 01:25:12 +00:00
iedowse
c37a1a0228 When compressing directory blocks, the dirhash code didn't check
that the directory entry was in use before attempting to find it
in the hash structures to change its offset. Normally, unused
entries do not need to be moved, but fsck can leave behind some
unused entries that do. A dirhash sanity panic resulted when the
entry to be moved was not found. Add a check that stops entries
with d_ino == 0 from being passed to ufsdirhash_move().
2001-08-22 01:35:17 +00:00
peter
886fd9ad64 Sigh. ufs_lookup() calls ffs_snapgone(), meaning that 'options EXT2FS'
without 'options FFS' would fail to link.
2001-08-18 03:08:48 +00:00
iedowse
c768482a0f Two recent commits in sys/ufs/ufs interacted badly with ext2fs
because it shares ufs code. In ufs_fhtovp(), the test on i_effnlink
is invalid because ext2fs does not maintain this field. In ufs_close(),
i_effnlink is also tested, to determines whether or not to call
vn_start_write(). The ufs_fhtovp issue breaks NFS exporting of
ext2fs filesystems; I believe the other is harmless.

Fix both cases by checking um_i_effnlink_valid in the ufsmount
struct, and use i_nlink if necessary.

Noticed by:	bde
Reviewed by:	mckusick, bde
2001-07-29 22:26:01 +00:00
iedowse
5857132545 Disable the dirhash sanity check that panics if an unused directory
entry (d_ino == 0) is found in a position that is not the start of
a DIRBLKSIZ block.

While such entries cannot occur normally (ufs always extends the
previous entry to cover the free space instead), they do not cause
problems and fsck does not fix them, so panicking is bad.
2001-07-27 18:45:41 +00:00
peter
57a6887663 Use a fixed type for times in on-disk structures for ufs rather than
something that could potentially change like time_t.
2001-07-16 00:55:27 +00:00
iedowse
991ee42593 Return a locked struct buf from ufsdirhash_lookup() to avoid one
extra getblk/brelse sequence for each lookup. We already had this
buf in ufsdirhash_lookup(), so there was no point in brelse'ing it
only to have the caller immediately reaquire the same buffer.

This should make the case of sequential lookups marginally faster;
in my tests, sequential lookups with dirhash enabled are now only
around 1% slower than without dirhash.
2001-07-13 20:50:38 +00:00
iedowse
ecbac42d61 Bring in dirhash, a simple hash-based lookup optimisation for large
directories. When enabled via "options UFS_DIRHASH", in-core hash
arrays are maintained for large directories. These allow all
directory operations to take place quickly instead of requiring
long linear searches. For now anyway, dirhash is not enabled by
default.

The in-core hash arrays have a memory requirement that is approximately
half the size of the size of the on-disk directory file. A number
of new sysctl variables allow control over which directories get
hashed and over the maximum amount of memory that dirhash will use:

  vfs.ufs.dirhash_minsize
    The minimum on-disk directory size for which hashing should be
    used. The default is 2560 (2.5k).

  vfs.ufs.dirhash_maxmem
    The system-wide maximum total memory to be used by dirhash data
    structures. The default is 2097152 (2MB).

The current amount of memory being used by dirhash is visible
through the read-only sysctl variable vfs.ufs.dirhash_maxmem.
Finally, some extra sanity checks that are enabled by default, but
which may have an impact on performance, can be disabled by setting
vfs.ufs.dirhash_docheck to 0.

Discussed on: -fs, -hackers
2001-07-10 21:21:29 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
jhb
1bc2ddffa0 Fix more mntvnode and vnode interlock order reversals. 2001-06-28 22:21:33 +00:00
jhb
822af43e20 - Fix a mntvnode and vnode interlock reversal.
- Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:12:56 +00:00
peter
efc05ef868 Fix warning:
1973: warning: int format, long int arg (arg 5)
2001-06-15 07:44:39 +00:00
mckusick
2f9709c0f1 Build on the change in revision 1.98 by Tor.Egge@fast.no.
The symptom being treated in 1.98 was to avoid freeing a
pagedep dependency if there was still a newdirblk dependency
referencing it. That change is correct and no longer prints
a warning message when it occurs. The other part of revision
1.98 was to panic when a newdirblk dependency was encountered
during a file truncation. This fix removes that panic and
replaces it with code to find and delete the newdirblk
dependency so that the truncation can succeed.
2001-06-13 23:13:13 +00:00
tmm
126158251a Call vn_close on the backing file vnode if ufs_extattr_enable failed to
avoid leaking it.

Reviewed by:	rwatson
2001-06-07 00:11:32 +00:00
jlemon
bf3a0c1bb1 Add a wrapper for the fifo kqfilter which falls through to the ufs routine.
This permits the fifo to inherit the ufs VNODE kqfilter.
2001-06-06 17:40:57 +00:00
jlemon
404f6dd2da Add a kqueue filter for writing to ufs filesystems which always returns
true.  This permits better interoperability with programs which register
filters on their stdin/stdout handles.

Submitted by: Niels Provos <provos@citi.umich.edu>
2001-06-05 13:52:37 +00:00
obrien
c3d076fe28 There seems to be a problem that the order of disk write operation being
incorrect due to a missing check for some dependency.  This change
avoids the freelist corruption (but not the temporarily inconsistent
state of the file system).

A message is printed as a reminder of the under lying problem when a
pagedep structure is not freed due to the NEWBLOCK flag being set.

Submitted by:	Tor.Egge@fast.no
2001-06-05 01:49:37 +00:00
jhb
ff6bb62be3 Revert the previous commit in favor of the fix in rev 1.42 of
ufs/ffs/ffs_extern.h instead.

Requested by:	bde
2001-05-30 23:09:19 +00:00
jhb
b033de0fdc Forward declare struct cg to quiet a warning.
Submitted by:	bde
2001-05-30 23:08:40 +00:00
jhb
3d4bb0e38d Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a
warning.
2001-05-29 23:53:16 +00:00
phk
d442761285 Remove last vestiges of MFS. 2001-05-29 21:21:53 +00:00
phk
785e1e1e17 Remove MFS from the kernel. 2001-05-29 18:50:30 +00:00
tmm
76ca05f26f Add a check to determine whether extended attributes have been
initialized on the file system before trying to grab the lock of the
per-mount extattr structure, as this lock is unitialized in that case.
This is needed because ufs_extattr_vnode_inactive is called from
ufs_inactive, which is also used by EA-unaware file systems such as
ext2fs.

Reviewed by:	rwatson
2001-05-25 18:24:52 +00:00
rwatson
f504530d9f o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
dillon
a179ee09ab This patch implements O_DIRECT about 80% of the way. It takes a patchset
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.

Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%.  For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.

I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.

Submitted by:	tegge, dillon
2001-05-24 07:22:27 +00:00
alfred
edd79c680a ufs_bmaparray() may block on IO, drop vm mutex and aquire Giant when
calling it from the pager routine
2001-05-23 10:30:25 +00:00
ru
35437d86aa - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
mckusick
c9a89dbc03 Update softdep_setup_directory_add prototype to reflect changes in
actual function.

Obtained from:	Jim Bloom <bloom@jbloom.jbloom.org>
2001-05-20 15:59:55 +00:00
mckusick
aac9daff0f Must ensure that all the entries on the pd_pendinghd list have been
committed to disk before clearing them. More specifically, when
free_newdirblk is called, we know that the inode claims the new
directory block. However, if the associated pagedep is still linked
onto the directory buffer dependency chain, then some of the entries
on the pd_pendinghd list may not be committed to disk yet. In this
case, we will simply note that the inode claims the block and let
the pd_pendinghd list be processed when the pagedep is next written.
If the pagedep is no longer on the buffer dependency chain, then
all the entries on the pd_pending list are committed to disk and
we can free them in free_newdirblk. This corrects a window of
vulnerability introduced in the code added in version 1.95.
2001-05-19 19:24:26 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
mckusick
c8947d0f03 Must be a bit less aggressive about freeing pagedep structures.
Obtained from:	Robert Watson <rwatson@FreeBSD.org> and
		Matthew Jacob <mjacob@feral.com>
2001-05-18 22:16:28 +00:00