Commit Graph

708 Commits

Author SHA1 Message Date
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Poul-Henning Kamp
93bcdfe270 Add refcounts to the "global" DEVFS inode slots, this allows us
to recycle inodes after a destroy_dev() but not until all mounts
have picked up the change.

Add support for an overflow table for DEVFS inodes.  The static
table defaults to 1024 inodes, if that fills, an overflow table
of 32k inodes is allocated.  Both numbers can be changed at
compile time, the size of the overflow table also with the
sysctl vfs.devfs.noverflow.

Use atomic instructions to barrier between make_dev()/destroy_dev()
and the mounts.

Add lockmgr() locking of directories for operations accessing or
modifying the directory TAILQs.

Various nitpicking here and there.
2000-09-06 11:26:43 +00:00
Boris Popov
8da8066061 Various cleanups towards make nullfs functional (it is still broken
at this point):

Replace all '#ifdef DEBUG' with '#ifdef NULLFS_DEBUG' and add NULLFSDEBUG
macro.

Protect nullfs hash table with lockmgr.

Use proper order of operations when freeing mnt_data.

Return correct fsid in the null_getattr().

Add null_open() function to catch MNT_NODEV (obtained from NetBSD).

Add null_rename() to catch cross-fs rename operations (submitted by
Ustimenko Semen <semen@iclub.nsu.ru>)

Remove duplicate $FreeBSD$ tags.
2000-09-05 09:02:07 +00:00
Boris Popov
7da1e3f0b0 Get rid from the __P() macros.
Encouraged by:	peter
2000-09-05 07:54:39 +00:00
Poul-Henning Kamp
7665e72021 Off by one error.
Submitted by:	des
2000-09-04 18:24:30 +00:00
Dag-Erling Smørgrav
18203af447 Remove a comment that has been not only obsolete but patently wrong for the
last 31 revisions (almost three years).
2000-09-04 18:18:17 +00:00
Poul-Henning Kamp
db90128160 Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf.  Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.
2000-09-02 19:17:34 +00:00
Robert Watson
84a5637620 o Simplify if/then clause equating ESRCH with ENOENT when hiding a process
Submitted by:	des
2000-09-01 18:41:32 +00:00
Robert Watson
ca94dd37a3 o Make procfs use vaccess() for procfs_access() DAC and super-user checks,
rather than implementing its own {uid,gid,other} checks against vnode
  mode.  Similar change to linprocfs currently under review.

Obtained from:	TrustedBSD Project
2000-09-01 13:41:41 +00:00
Robert Watson
387d2c036b o Centralize inter-process access control, introducing:
int p_can(p1, p2, operation, privused)

  which allows specification of subject process, object process,
  inter-process operation, and an optional call-by-reference privused
  flag, allowing the caller to determine if privilege was required
  for the call to succeed.  This allows jail, kern.ps_showallprocs and
  regular credential-based interaction checks to occur in one block of
  code.  Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
  and P_CAN_DEBUG.  p_can currently breaks out as a wrapper to a
  series of static function checks in kern_prot, which should not
  be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
  of manual checks, PRISON_CHECK(), P_TRESPASS(), and
  kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
  process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
  of ESRCH, further improving concealment of processes that should not
  be visible to other processes.  Also introduce new access checks to
  improve hiding of processes for procfs_lookup(), procfs_getattr(),
  procfs_readdir().  Correct a bug reported by bp concerning not
  handling the CREATE case in procfs_lookup().  Remove volatile flag in
  procfs that caused apparently spurious qualifier warnigns (approved by
  bde).

o Add comment noting that ktrace() has not been updated, as its access
  control checks are different from ptrace(), whereas they should
  probably be the same.  Further discussion should happen on this topic.

Reviewed by:	bde, green, phk, freebsd-security, others
Approved by:	bde
Obtained from:	TrustedBSD Project
2000-08-30 04:49:09 +00:00
Robert Watson
012c643d3e o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege.  Make vaccess() accept an
  additional optional argument, privused, to determine whether
  privilege was required for vaccess() to return 0.  Add commented
  out capability checks for reference.  Rename some variables to make
  it more clear which modes/uids/etc are associated with the object,
  and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
  privused argument.  Once additional patches are applied, suser()
  will no longer set ASU, so privused will permit passing of
  privilege information up the stack to the caller.

Reviewed by:	bde, green, phk, -security, others
Obtained from:	TrustedBSD Project
2000-08-29 14:45:49 +00:00
Poul-Henning Kamp
c32d0a1dcd Reorder vop's alphabetically.
Smarter use of devfs_allocv() (from bp@)
 Introduce devfs_find()
 ".." fixes to devfs_lookup (from bp@)
2000-08-27 14:46:36 +00:00
Poul-Henning Kamp
749e1537ec Minor cleanups tp devfs_readdir();
Add devfs_read() for directories.  (inspired by bp@)
2000-08-26 16:20:57 +00:00
Bruce Evans
ff4ad0c4d8 Quick fix for msdsofs_write() on alphas and other machines with either
longs larger than 32 bits or strict alignment requirements.

pm_fatmask had type u_long, but it must have a type that has precisely
32 bits and this type must be no smaller than int, so that ~pmp->pm_fatmask
has no bits above the 31st set.  Otherwise, comparisons between (cn
| ~pmp->pm_fatmask) and magic 32-bit "cluster" numbers always fail.
The correct fix is to use the C99 type uint_least32_t and mask with
0xffffffff.  The quick fix is to use u_int32_t and assume that ints
have

msdosfs metadata is riddled with unaligned fields, and on alphas,
unaligned_fixup() apparently has problems fixing up the unaligned
accesses caused by this.  The quick fix is to not comment out the
NetBSD code that sort of handles this, and define UNALIGNED_ACCESS on
i386's so that the code doesn't change on i386's.  The correct fix
would define UNALIGNED_ACCESS in a central machine-dependent header
and maybe add some extra cases to unaligned_fixup().  UNALIGNED_ACCESS
is also tested in isofs.

Submitted by:	parts by Mark Abene <phiber@radicalmedia.com>
PR:		19086
2000-08-25 09:03:58 +00:00
Poul-Henning Kamp
a481b90b82 Fix panic when removing open device (found by bp@)
Implement subdirs.
 Build the full "devicename" for cloning functions.
 Fix panic when deleted device goes away.
 Collaps devfs_dir and devfs_dirent structures.
 Add proper cloning to the /dev/fd* "device-"driver.
 Fix a bug in make_dev_alias() handling which made aliases appear
  multiple times.
 Use devfs_clone to implement getdiskbyname()
 Make specfs maintain the stat(2) timestamps per dev_t
2000-08-24 15:36:55 +00:00
Poul-Henning Kamp
fcc9b84ca5 Fix devfs_access() bug on directories.
Remove unused #includes.

Bug spotted by:	markm
2000-08-21 14:45:19 +00:00
Poul-Henning Kamp
3f54a085a6 Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)
Remove old DEVFS support fields from dev_t.

  Make uid, gid & mode members of dev_t and set them in make_dev().

  Use correct uid, gid & mode in make_dev in disk minilayer.

  Add support for registering alias names for a dev_t using the
  new function make_dev_alias().  These will show up as symlinks
  in DEVFS.

  Use makedev() rather than make_dev() for MFSs magic devices to prevent
  DEVFS from noticing this abuse.

  Add a field for DEVFS inode number in dev_t.

  Add new DEVFS in fs/devfs.

  Add devfs cloning to:
        disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
        md(4), tun(4), bpf(4), fd(4)

  If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

  Add commented out DEVFS to GENERIC
2000-08-20 21:34:39 +00:00
Poul-Henning Kamp
e39c53eda5 Centralize the canonical vop_access user/group/other check in vaccess().
Discussed with: bde
2000-08-20 08:36:26 +00:00
Poul-Henning Kamp
39f70682ae Introduce vop_stdinactive() and make it the default if no vop_inactive
is declared.

Sort and prune a few vop_op[].
2000-08-18 10:01:02 +00:00
Sheldon Hearn
1b2fbe6ff9 Rename the loadable nullfs kernel module: null -> nullfs 2000-07-28 11:54:09 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
David Malone
4ebb509c1f Certain error contitions cause msdosfs_rename() to decrement the
vnode reference count on 'fdvp' more times than it should.

PR:		17347
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Approved by:	bde
2000-07-14 11:52:56 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
b7ffb34243 Pull the rug under block mode devices. they return ENXIO on open(2) now. 2000-07-03 13:48:37 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Boris Popov
6843189ab9 Fix memory leakage on module unload.
Spotted by:	fixed INVARIANTS code
2000-06-29 01:19:12 +00:00
Boris Popov
432a84000f Fix memory leakage on module unload.
Spotted by:	fixed INVARIANTS code
2000-06-29 01:12:47 +00:00
Chris Costello
726bd7bebf fdesc_getattr:
Don't fake any file types, just set vap->va_type to IFTOVT(stb.st_mode).
  If something does not report its mode, vap->va_type is set to VNON
  accordingly.
2000-06-28 19:18:25 +00:00
Alfred Perlstein
974c7068d7 by changing the logic here we can support dynamic additions of new
filetypes.

Reviewed by: green
2000-06-27 22:46:35 +00:00
Alfred Perlstein
70cb8de9ba if there are leading zeros fail the lookup
Pointed out by: Alexander Viro <viro@math.psu.edu>
2000-06-27 21:37:17 +00:00
Boris Popov
b1bd38b351 Remove obsolete comment.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-06-25 02:29:45 +00:00
Chris Costello
04e58856a6 Rename the VRXEC' macro used to clear read and exec bits to FDRX' so
as not to impede upon VFS namespace.
2000-06-20 20:34:11 +00:00
Poul-Henning Kamp
a2e7a027a7 Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
2000-06-16 08:48:51 +00:00
Chris Costello
0b1574bd33 Remove unused include <sys/socketvar.h>. 2000-06-15 20:13:51 +00:00
Chris Costello
3b5ec07347 Replace vattr_null() with VATTR_NULL() and do not explicity set vattr
fields to VNOVAL afterwards.
2000-06-15 17:19:22 +00:00
Jonathan M. Bresler
caea99a368 before this commit, specfs reported disk partitions
using decimal major and minor numbers.  "ls -l" reports
	disk partitions using decimal major numbers and hex
	minor numbers.

	make specfs use decimal major numbers and hex minor numbers,
	just like "ls -l"
2000-06-12 10:20:18 +00:00
Chris Costello
f574046378 Instead of completely disallowing VOP_SETATTR, just do it where there is
an underlying vnode.

Suggested by:	bde
2000-06-06 00:35:39 +00:00
Chris Costello
c1e15e06bb Update the comment for fdesc_setattr to reflect that we no longer
actually setattr() on underlying vnodes.
2000-06-02 07:08:18 +00:00
Chris Costello
908a81d424 - Do not allow VOP_SETATTR to modify underlying vnodes at all. This caused
problems when fetch(1) was passed `-o -'.  The rationale of this change
  is that applications attempting to change underlying vnodes for /dev/fd
  nodes are improperly written and the use of this interface should not
  ever have been encouraged.  Proper alternatives are fchmod, fchown and
  others.

  PR:		18952

- Remove stale, unused fdescnode->fd_link structure member.
2000-06-02 07:02:45 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Chris Costello
38edb6a32f Adapt fdesc to be mounted on /dev/fd and remove fd, stdin, stdout and
stderr nodes.  More specific items of this patch:
  o Removed support for symbolic links, and the need for
    fdesc_readlink().
  o Put all the code from fdesc_attr() into fdesc_getattr() and removed
    fdesc_attr().  This also made it easier to properly give all nodes
    unique inode numbers.
  o The removal of all non-fd nodes allowed the removal of the fdesc_read(),
    fdesc_write(), and fdesc_ioctl() nodes, since we no longer have nodes
    that get special handling.
  o Correct the component name validity-checking in fdesc_lookup().  It
    previously detected the end of the string by checking for a terminating
    NUL, now it uses cnp->cn_namelen.
  o Handle kqueue files as FIFOs.  This is probably the closest file type
    to represent this type of file there is, and it is unfortunately not
    very representative of a kqueue.  Creation time is not supported by
    kqueue, so ctime, mtime and atime are all set to the current time when
    getattr() was called.
  o Also set st_[mca]time to the current time since there's no data in
    socket structures that can be used to fill this in (FIFOs).
  o Simplify fdesc_readdir() since it only has to report the numbered
    fd nodes.  Add `.' and `..' directory links as well.
  o Remove read bits from directories as they tend to confuse programs
    like tar(1).

Reviewed by:	phk
Discussed with:	bde (earlier on, not quite review)
2000-05-11 22:10:51 +00:00
Poul-Henning Kamp
192c06ea1b Change the "bdev-whiner" to whine when open is attempted and extend
the deadline a month.
2000-05-09 18:53:57 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
e9340d96fa Remove 42 unneeded #include <sys/ioccom.h>.
ioccom.h defines only implementation detail, and should therefore
only be included from the #include which defines the ioctl tags,
in other words: never include it from *.c
2000-05-03 07:31:38 +00:00
Peter Wemm
365c5db0a7 Add $FreeBSD$ 2000-05-01 20:32:07 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
eb95c536ad Remove unneeded #include <sys/kernel.h> 2000-04-29 15:36:14 +00:00
Peter Wemm
6ff4e7af7e nwfs depends on ncp 2000-04-29 13:34:28 +00:00