Commit Graph

357 Commits

Author SHA1 Message Date
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Pawel Jakub Dawidek
57fd3d5572 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
Konstantin Belousov
de10ffa527 Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bump device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the events clone to make sure no lingering devices are left after
dev_clone event handler deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by:	tegge (early versions), njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:42:37 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Robert Watson
305759909e Rename mac*devfsdirent*() to mac*devfs*() to synchronize with SEDarwin,
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA, Inc.
2007-04-23 13:36:54 +00:00
Tom Rhodes
164554dec4 In some cases, like whenever devfs file times are zero, the fix(aa) will not
be applied to dev entries.  This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check.  It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps.  It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.

Discussed with:	bde
"Ok with me:	phk
2007-04-20 01:47:05 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Kris Kennaway
6455de0029 Annotate that this giant acqusition is dependent on tty locking. 2007-03-26 21:56:46 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Konstantin Belousov
16f50bcd80 Update the access and modification times for dev while still holding
thread reference on it.

Reviewed by:	tegge
Approved by:	pjd (mentor)
2006-10-20 08:03:42 +00:00
Konstantin Belousov
1663075c64 Fix the race between devfs_fp_check and devfs_reclaim. Derefence the
vnode' v_rdev and increment the dev threadcount , as well as clear it
(in devfs_reclaim) under the dev_lock().

Reviewed by:	tegge
Approved by:	pjd (mentor)
2006-10-20 07:59:50 +00:00
Konstantin Belousov
828d6d12da Properly lock the vnode around vgone() calls.
Unlock the vnode in devfs_close() while calling into the driver d_close()
routine.

devfs_revoke() changes by:	ups
Reviewed and bugfixes by:	tegge
Tested by:	mbr, Peter Holm
Approved by:	pjd (mentor)
MFC after:	1 week
2006-10-18 11:17:14 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Konstantin Belousov
af72db7175 Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2
and drop_dm_lock is true, no unlocking shall be attempted. The lock is
already dropped and memory is freed.

Found with:	Coverity Prevent(tm)
CID:	1536
Approved by:	pjd (mentor)
2006-09-19 14:03:02 +00:00
Konstantin Belousov
e7f9b74438 Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
vnode lock in devfs_allocv. Do this by temporary dropping dm_lock around
vnode locking.

For safe operation, add hold counters for both devfs_mount and devfs_dirent,
and DE_DOOMED flag for devfs_dirent. The facilities allow to continue after
dropping of the dm_lock, by making sure that referenced memory does not
disappear.

Reviewed by:	tegge
Tested by:	kris
Approved by:	kan (mentor)
PR:		kern/102335
2006-09-18 13:23:08 +00:00
Poul-Henning Kamp
9c499ad92f Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exists in
DEVFS.

Remove the opt_devfs.h file now that it is empty.
2006-07-17 09:07:02 +00:00
Stephan Uphoff
56eeb277cb Add vnode interlocking to devfs.
This prevents race conditions that can cause pagefaults or devfs
to use arbitrary vnodes.

MFC after:	1 week
2006-07-12 20:25:35 +00:00
Robert Watson
2551d4f66e Remove now unneeded opt_mac.h and mac.h includes.
MFC after:	3 days
2006-07-06 13:24:22 +00:00
Robert Watson
83ff52a7f3 Use #include "", not #include <> for opt_foo.h.
MFC after:	3 days
2006-07-06 13:22:08 +00:00
Pawel Jakub Dawidek
00a480ac5c Remove unused prototypes. 2006-04-12 12:17:29 +00:00
Jeff Roberson
23b77994f2 - Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this
the vnode is never recycled.  It is bogus because the reference really
   should be associated with the devfs dirent.
2006-03-31 23:37:29 +00:00
Jeff Roberson
f50b03bfd6 - We must hold a reference to a vnode before calling vgone() otherwise
it may not be removed from the freelist.

MFC After:	1 week
Found by:	kris
2006-02-22 09:05:40 +00:00
Jeff Roberson
3b77d80cdd - Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:24:14 +00:00
Robert Watson
8f0d99d790 When returning EIO from DEVFSIO_RADD ioctl, drop the exclusive rule
lock.  Otherwise the system comes to a rather sudden and grinding
halt.

MFC after:	1 week
2006-01-03 09:49:10 +00:00
Doug White
16e35dcc39 This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR:		88249
Submitted by:	"Devon H. O'Dell" <dodell@ixsystems.com>
MFC after:	3 days
2005-11-09 22:03:50 +00:00
Poul-Henning Kamp
3b72f38b5e Use correct cirteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by:	kris
2005-10-18 20:21:25 +00:00
Poul-Henning Kamp
e515ee7832 Make rule zero really magical, that way we don't have to do anything
when we mount and get zero cost if no rules are used in a mountpoint.

Add code to deref rules on unmount.

Switch from SLIST to TAILQ.

Drop SYSINIT, use SX_SYSINIT and static initializer of TAILQ instead.

Drop goto, a break will do.

Reduce double pointers to single pointers.

Combine reaping and destroying rulesets.

Avoid memory leaks in a some error cases.
2005-09-24 07:03:09 +00:00
Poul-Henning Kamp
e606a3c63e Rewamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv.  It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex.  Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables,  the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liason
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr.  Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them.  A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now know about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there.  We still spelunk into other mountpoints
and fondle their data without 100% good locking.  It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints.  It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by:		Kris
Much confusion caused by (races in):	md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.
2005-09-19 19:56:48 +00:00
Poul-Henning Kamp
59307b0dfe Don't attempt to recurse lockmgr, it doesn't like it. 2005-09-15 21:16:43 +00:00
Poul-Henning Kamp
214c8ff0e4 Various minor polishing. 2005-09-15 10:28:19 +00:00
Poul-Henning Kamp
6556102dcb Protect the devfs rule internal global lists with a sx lock, the per
mount locks are not enough.  Finer granularity (x)locking could be
implemented, but I prefer to keep it simple for now.
2005-09-15 08:50:16 +00:00
Poul-Henning Kamp
ab32e95296 Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.
2005-09-15 08:36:37 +00:00
Poul-Henning Kamp
5e080af41f Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied.  This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.

MFC:	after 3 days
2005-09-15 06:57:28 +00:00
Poul-Henning Kamp
21806f30bc Clean up prototypes. 2005-09-12 08:03:15 +00:00
Poul-Henning Kamp
80447bf701 Add a missing dev_relthread() call.
Remove unused variable.

Spotted by:	Hans Petter Selasky <hselasky@c2i.net>
2005-08-29 11:14:18 +00:00
Poul-Henning Kamp
516ad423b1 Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers:  Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant
2005-08-17 08:19:52 +00:00
Poul-Henning Kamp
31cc57cdbd Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
Poul-Henning Kamp
9c0af1310c Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.
2005-08-16 19:08:01 +00:00
Poul-Henning Kamp
d785dfefa4 Eliminate effectively unused dm_basedir field from devfs_mount. 2005-08-15 19:40:53 +00:00
Robert Watson
6a113b3de7 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
Kris Kennaway
e29c976a58 devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by:	phk
2005-07-29 23:00:56 +00:00
Simon L. B. Nielsen
02a4be3f74 Correct devfs ruleset bypass.
Submitted by:	csjp
Reviewed by:	phk
Security:	FreeBSD-SA-05:17.devfs
Approved by:	cperciva
2005-07-20 13:34:16 +00:00
Robert Watson
d26dd2d99e When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
Craig Rodrigues
fd225fe4a3 Do not declare a struct as extern, and then implement
it as static in the same file.  This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by:	phk
Approved by:	das (mentor)
2005-05-31 14:50:49 +00:00
Jeff Roberson
7b6b7657d2 - In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
   because vfs is still sometime entered with Giant held.
2005-05-01 00:56:34 +00:00
Jeff Roberson
cd360e947b - Mark devfs as MNTK_MPSAFE as I belive it does not require Giant.
Sponsored by:	Isilon Systems, Inc.
Agreed in principle by:		phk
2005-04-30 11:24:17 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Poul-Henning Kamp
f4f6abcb4e Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
Poul-Henning Kamp
9477d73e32 cdev (still) needs per instance uid/gid/mode
Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h
2005-03-31 10:29:57 +00:00
Poul-Henning Kamp
eb151cb989 Rename dev_ref() to dev_refl() 2005-03-31 06:51:54 +00:00
Jeff Roberson
ea124bf597 - LK_NOPAUSE is a nop now.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 04:27:49 +00:00
Jeff Roberson
eddcb03d02 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
Jeff Roberson
d9b2d9f7a2 - Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
   modified to do so.  Careful review must be done to ensure that this
   is safe for each individual filesystem.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:36:16 +00:00
Poul-Henning Kamp
800b42bde0 Prepare for the final onslaught on devices:
Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version
2005-03-17 12:07:00 +00:00
Jeff Roberson
c0f681c21d - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:14:56 +00:00
Poul-Henning Kamp
2647407860 One more bit of the major/minor patch to make ttyname happy as well. 2005-03-10 18:49:17 +00:00
Poul-Henning Kamp
b43ab0e378 Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.
2005-03-10 18:21:34 +00:00
Poul-Henning Kamp
f5af7353c0 Remove kernelside support for devfs rules filtering on major numbers. 2005-03-08 19:51:27 +00:00
Poul-Henning Kamp
0454a53d65 We may not have an actual cdev at this point. 2005-02-22 18:17:31 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
1a1457d427 Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.
2005-02-22 14:41:04 +00:00
Poul-Henning Kamp
4d8ac58b05 Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
Poul-Henning Kamp
5ece08f57a Make a SYSCTL_NODE static 2005-02-10 12:23:29 +00:00
Poul-Henning Kamp
df32e67c73 Statize devfs_ops_f 2005-02-10 12:04:26 +00:00
Poul-Henning Kamp
a369f34d76 Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().
2005-01-28 14:42:17 +00:00
Poul-Henning Kamp
83c6439714 Whitespace in vop_vector{} initializations. 2005-01-13 18:59:48 +00:00
Poul-Henning Kamp
7164e8f291 Silently ignore forced argument to unmount. 2005-01-11 12:02:26 +00:00
Warner Losh
d167cf6f3a /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 18:10:42 +00:00
Poul-Henning Kamp
59f69ba49f Unsupport forceful unmounts of DEVFS.
After disscussing things I have decided to take the easy and
consistent 90% solution instead of aiming for the very involved 99%
solution.

If we allow forceful unmounts of DEVFS we need to decide how to handle
the devices which are in use through this filesystem at the time.

We cannot just readopt the open devices in the main /dev instance since
that would open us to security issues.

For the majority of the devices, this is relatively straightforward
as we can just pretend they got revoke(2)'ed.

Some devices get tricky:  /dev/console and /dev/tty for instance
does a sort of recursive open of the real console device.   Other devices
may be mmap'ed (kill the processes ?).

And then there are disk devices which are mounted.

The correct thing here would be to recursively unmount the filesystems
mounte from devices from our DEVFS instance (forcefully) and if
this succeeds, complete the forcefully unmount of DEVFS.  But if
one of the forceful unmounts fail we cannot complete the forceful
unmount of DEVFS, but we are likely to already have severed a lot
of stuff in the process of trying.

Event attempting this would be a lot of code for a very far out
corner-case which most people would never see or get in touch with.

It's just not worth it.
2005-01-04 07:52:26 +00:00
Poul-Henning Kamp
50a36c111f Be consistent about flag values passed to device drivers read/write
methods:

Read can see O_NONBLOCK and O_DIRECT.

Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.

In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.
2004-12-22 17:05:44 +00:00
Poul-Henning Kamp
10eee285f7 Shuffle numeric values of the IO_* flags to match the O_* flags from
fcntl.h.

This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.

Today open, close and ioctl uses fcntl.h flags, while read and write
uses vnode.h flags.
2004-12-22 16:25:50 +00:00
Poul-Henning Kamp
e87047b437 We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
Poul-Henning Kamp
2c0220129d Add a couple of KASSERTS to try to diagnose a problem reported. 2004-12-20 21:12:11 +00:00
Poul-Henning Kamp
2a9e0c3216 Be a bit more assertive about vnode bypass. 2004-12-14 09:32:18 +00:00
Poul-Henning Kamp
1dc4727ea3 Another FNONBLOCK -> O_NONBLOCK.
Don't unconditionally set IO_UNIT to device drivers in write:  nobody
checks it, and since it was always set it did not carry information anyway.
2004-12-13 07:41:19 +00:00
Poul-Henning Kamp
ab9caf9d67 Use O_NONBLOCK instead of FNONBLOCK alias. 2004-12-13 07:37:29 +00:00
Poul-Henning Kamp
f0d5cba935 Explicit panic in vop_read/vop_write for devices 2004-12-13 07:13:21 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
a1f5fe1538 Use vfs_mountedfrom() and rely on vfs_mount.c to call VFS_STATFS() 2004-12-06 19:54:31 +00:00
Poul-Henning Kamp
743312367a VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
6fde64c778 Mechanically change prototypes for vnode operations to use the new typedefs. 2004-12-01 12:24:41 +00:00
Poul-Henning Kamp
ce59d2149d Ignore MNT_NODEV, it is implicit in choice of filesystem these days. 2004-11-26 07:37:42 +00:00
Poul-Henning Kamp
ea566ae2a5 Make vnode bypass for devices mandatory. 2004-11-17 07:18:49 +00:00
Poul-Henning Kamp
8352b1925d Make vnode bypass the default for devices.
Can be disabled in case of problems with
	vfs.devfs.fops=0
in loader.conf
2004-11-15 22:11:09 +00:00
Poul-Henning Kamp
49b7607eba Integrate most of vop_revoke() into devfs_revoke() where it belongs. 2004-11-13 23:37:29 +00:00
Poul-Henning Kamp
aac5167c38 Add the devfs_fp_check() function which helps us get from a struct file
to a cdev and a devsw, doing all the relevant checks along the way.

Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev.  If it does the device was revoked and
we return ENXIO.
2004-11-13 23:21:54 +00:00
Poul-Henning Kamp
b0aed5267e Refuse attemps to mount root filesystem 2004-11-09 22:14:57 +00:00
Poul-Henning Kamp
56dd3a6182 Add optional device vnode bypass to DEVFS.
The tunable vfs.devfs.fops controls this feature and defaults to off.

When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a filedescriptor gets a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.

Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.

Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)

On a 700MHz K7 machine this doubles the speed of
	dd if=/dev/zero of=/dev/null bs=1 count=1000000

This roughly translates to shaving 2usec of each read/write syscall.

The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork

Please test this and report any problems, LORs etc.
2004-11-08 10:46:47 +00:00
Poul-Henning Kamp
5349c79d75 Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.
2004-11-06 11:41:22 +00:00
Poul-Henning Kamp
ecc14aae12 Add back securelevel check for disks.
XXX: This should live in geom_dev.c but we don't have access to the
cred there.
XXX: XXX:  This may not matter anymore since filesystems use geom_vfs.
2004-11-04 09:17:55 +00:00
Poul-Henning Kamp
4cea3289da Don't give disks special treatment, they don't come this way anymore. 2004-10-29 11:10:55 +00:00
Poul-Henning Kamp
c108bb741c Remove VOP_SPECSTRATEGY() from the system. 2004-10-29 10:59:28 +00:00
Poul-Henning Kamp
6afb3b1c37 Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text.  There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
2004-10-29 07:16:37 +00:00
Poul-Henning Kamp
45628dd373 What can I say: don't allow people to mount DEVFS with option "nodev". 2004-10-28 06:03:25 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Poul-Henning Kamp
ff7c5a4880 Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it.  Here were those hacks that I have curs'd I know not how
oft.  Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
2004-10-22 09:59:37 +00:00
Poul-Henning Kamp
891822a853 XXX mark two places where we do not hold a threadcount on the dev when
frobbing the cdevsw.

In both cases we examine only the cdevsw and it is a good question if we
weren't better off copying those properties into the cdev in the first
place.  This question will be revisited.
2004-09-24 08:32:36 +00:00
Poul-Henning Kamp
5e8c582ac2 Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Robert Watson
de592112e1 In devfs_allocv(), rather than assigning 'td = curthread', assert that
the caller passes in a td that is curthread, and consistently pass 'td'
into vget().  Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.

In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred).  Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.
2004-07-22 17:03:14 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Poul-Henning Kamp
9d96090725 Reduce a fair bit of the atomics because we are now called with a
lock from kern_conf.c and cdev's act a lot more like real objects
these days.
2004-06-18 08:08:47 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Poul-Henning Kamp
bc55355956 Report the correct length for symlink entries. 2004-02-19 19:09:52 +00:00
Poul-Henning Kamp
f82dfde7e5 White-space align a struct definition.
Move a SYSINIT to the file where it belongs.
2004-02-15 21:43:08 +00:00
Colin Percival
9d0be84912 Fix style(9) of my previous commit.
Noticed by: nate
Approved by: nate, rwatson (mentor)
2004-01-21 18:03:54 +00:00
Colin Percival
9f8ef8b8d1 Allow devfs path rules to work on directories. Without this fix,
devfs rule add path fd unhide
is a no-op, while it should unhide the fd subdirectory.

Approved by: phk, rwatson (mentor)
PR: kern/60897
2004-01-21 16:43:29 +00:00
Poul-Henning Kamp
49e9fc0a0d Improve on POLA by populating DEVFS before doing devfs(8) rule ioctls.
PR:	60687
Spotted by:	Colin Percival <cperciva@daemonology.net>
2004-01-02 19:02:28 +00:00
Robert Watson
eca8a663d4 Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
Poul-Henning Kamp
8b285b9088 Remember to check the DE_WHITEOUT flag in the case where a cloned
device is hidden by a devfs(8) rule.

Spotted by:	 Adam Nowacki <ptnowak@bsk.vectranet.pl>
2003-10-20 15:08:10 +00:00
Poul-Henning Kamp
7e8766a940 When a driver successfully created a device on demand, we can directly
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.

This also solves an issue with on-demand devices in subdirectories.

Submitted by:	cognet
2003-10-20 07:04:09 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
Poul-Henning Kamp
2f613363c7 Remove unused variable.
Found by:       FlexeLint
2003-05-31 19:34:52 +00:00
Alexander Kabaev
c162e9c2eb Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by:	jeff
2003-03-11 22:15:10 +00:00
Nate Lawson
99648386d3 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav
8994a245e0 Clean up whitespace, s/register //, refrain from strong urge to ANSIfy. 2003-03-02 15:56:49 +00:00
Dag-Erling Smørgrav
c952458814 uiomove-related caddr_t -> void * (just the low-hanging fruit) 2003-03-02 15:50:23 +00:00
Poul-Henning Kamp
9285a87efd NODEVFS cleanup:
Replace devfs_{create,destroy} hooks with direct function calls.
2003-03-02 13:35:30 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Poul-Henning Kamp
6718b083be NODEVFS cleanup: remove #ifdefs. 2003-01-29 22:36:45 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Poul-Henning Kamp
7e760e148a Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.
2003-01-19 11:03:07 +00:00
Poul-Henning Kamp
9c71cb4bef Even if the permissions deny it, a process should be allowed to
access its controlling terminal.

In essense, history dictates that any process is allowed to open
/dev/tty for RW, irrespective of credential, because by definition
it is it's own controlling terminal.

Before DEVFS we relied on a hacky half-device thing (kern/tty_tty.c)
which did the magic deep down at device level, which at best was
disgusting from an architectural point of view.

My first shot at this was to use the cloning mechanism to simply
give people the right tty when they ask for /dev/tty, that's why
you get this, slightly counter intuitive result:

        syv# ls -l /dev/tty `tty`
        crw--w----  1 u1  tty    5,   0 Jan 13 22:14 /dev/tty
        crw--w----  1 u1  tty    5,   0 Jan 13 22:14 /dev/ttyp0

Trouble is, when user u1 su(1)'s to user u2, he cannot open
/dev/ttyp0 anymore because he doesn't have permission to do so.

The above fix allows him to do that.

The interesting side effect is that one was previously only able
to access the controlling tty by indirection:
        date > /dev/tty
but not by name:
        date > `tty`

This is now possible, and that feels a lot more like DTRT.

PR:             46635
MFC candidate:  could be.
2003-01-13 22:20:36 +00:00
Dima Dorfman
797159bde3 Add symlink support to devfs_rule_matchpath(). This allows the user
to unhide symlinks as well as hide them.
2003-01-11 02:36:20 +00:00
Poul-Henning Kamp
c6e3ae999b Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by:       src/tools/tools/vop_table
2003-01-04 08:47:19 +00:00
Robert Watson
adfad1eb35 Trim left-over and unused vop_refreshlabel() bits from devfs.
Reported by:	bde
2002-12-28 05:39:25 +00:00
Robert Watson
990b4b2dc5 Remove dm_root entry from struct devfs_mount. It's never set, and is
unused.  Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with.  Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer.  This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.

Approved by:	re (murray)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-12-09 03:44:28 +00:00
Robert Watson
763bbd2f4f Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception.  For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system.  With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance.  This
also corrects sematics for shared vnode locks, which were not
previously present in the system.  This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form.  With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception.  We'll introduce a work around for this shortly.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-26 14:38:24 +00:00
Robert Watson
e6a5564ee2 Missed a case of _POSIX_MAC_PRESENT -> _PC_MAC_PRESENT rename.
Pointed out by:	phk
2002-10-20 22:50:43 +00:00
Poul-Henning Kamp
bc9d8a9a37 Fix comments and one resulting code confusion about the type of the
"command" argument to VOP_IOCTL.

Spotted by:	FlexeLint.
2002-10-16 08:04:11 +00:00
Poul-Henning Kamp
4cfe209335 A better solution to avoiding variable sized structs in DEVFS. 2002-10-16 07:51:18 +00:00
Poul-Henning Kamp
c122d758ca #include "opt_devfs.h" to protect against variable sized structures.
Spotted by:	FlexeLint
2002-10-16 07:16:47 +00:00
Mike Barcroft
2b7f24d210 Change iov_base's type from char *' to the standard void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
2002-10-11 14:58:34 +00:00
Dima Dorfman
e5d09546b8 Treat the pathptrn field as a real pattern with the aid of fnmatch(). 2002-10-08 04:21:54 +00:00
Robert Watson
74e62b1b75 Integrate a devfs/MAC fix from the MAC tree: avoid a race condition during
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-05 18:40:10 +00:00
Poul-Henning Kamp
2c5e1d1e6f Move the vop-vector declaration into devfs_vnops.c where it belongs. 2002-10-01 10:08:08 +00:00
Poul-Henning Kamp
19ebba326b s/struct dev_t */dev_t */ 2002-09-28 21:21:01 +00:00
Poul-Henning Kamp
562c866822 Fix mis-indent. 2002-09-28 17:37:55 +00:00
Nate Lawson
86ed6d45ac Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde
2002-09-18 20:42:04 +00:00
Nate Lawson
06be2aaa83 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
Poul-Henning Kamp
9bf1a75697 Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems.  This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by:	DARPA and NAI Labs.
2002-08-13 10:05:50 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Robert Watson
67d722ed73 Introduce support for Mandatory Access Control and extensible
kernel access control.

Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries,
allowing it to indicate to user processes that individual vnode labels
are available.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-02 03:12:40 +00:00
Robert Watson
bdc2cd1318 Hook up devfs_pathconf() for specfs devfs nodes, not just regular
devfs nodes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 22:27:57 +00:00
Robert Watson
6742f32809 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument devfs to support per-dirent MAC labels.  In particular,
invoke MAC framework when devfs directory entries are instantiated
due to make_dev() and related calls, and invoke the MAC framework
when vnodes are instantiated from these directory entries.  Implement
vop_setlabel() for devfs, which pushes the label update into the
devfs directory entry for semi-persistant store.  This permits the MAC
framework to assign labels to devices and directories as they are
instantiated, and export access control information via devfs vnodes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 15:45:16 +00:00
Robert Watson
04f3985d88 Introduce support for Mandatory Access Control and extensible
kernel access control.

Label devfs directory entries, permitting labels to be maintained
on device nodes in devfs instances persistently despite vnode
recycling.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 23:12:37 +00:00
Dima Dorfman
ec7e38fb22 Correct misindentation of DRA_UID. 2002-07-28 06:57:57 +00:00
Dima Dorfman
af13e3abb9 Unimplement panic(8) by making sure that we don't recurse into a
ruleset.  If we do, that means there's a ruleset loop (10 includes 20
include 30 includes 10), which will quickly cause a double fault due
to stack overflow (since "include" is implemented by recursion).
(Previously, we only checked that X didn't include X.)
2002-07-28 03:52:44 +00:00
Dima Dorfman
a1dc209638 Introduce the DEVFS "rule" subsystem. DEVFS rules permit the
administrator to define certain properties of new devfs nodes before
they become visible to the userland.  Both static (e.g., /dev/speaker)
and dynamic (e.g., /dev/bpf*, some removable devices) nodes are
supported.  Each DEVFS mount may have a different ruleset assigned to
it, permitting different policies to be implemented for things like
jails.

Approved by:	phk
2002-07-17 01:46:48 +00:00
Semen Ustimenko
c83fca1f1f Make devfs to give honour to PDIRUNLOCK flag.
Reviewed by:	jeff
MFC after:	1 week
2002-06-01 09:17:43 +00:00
Maxime Henrion
8eee3a3d58 Fix several bugs in devfs_lookupx(). When we check the nameiop to
make sure it's a correct operation for devfs, do it only in the
ISLASTCN case.  If we don't, we are assuming that the final file will
be in devfs, which is not true if another partition is mounted on top
of devfs or with special filenames (like /dev/net/../../foo).

Reviewed by:	phk
2002-05-10 15:41:14 +00:00
Maxime Henrion
6dbde1fe23 Convert devfs to nmount.
Reviewed by:	phk
2002-05-02 20:27:42 +00:00
Robert Watson
a12cfddc0f Use vnode locking with devfs; permit VFS locking assertions to make
sense for devfs vnodes, and reduce/remove potential races in the devfs
code.

Submitted by:	iadowse
Approved by:	phk
2002-04-29 20:00:39 +00:00
Bruce Evans
4cc6241557 Don't attempt to decvlare M_DEVFS whern MALLOC_DECLARE is not defined.
This fixes warnings that should be errors in fstat.

Reminded by:	alpha tinderbox

Fixed some style bugs (ones near BOF and EOF; there are many more).
2002-04-21 15:47:03 +00:00
Bruce Evans
11257f4d1a Fixed assorted bugs in setting of timestamps in devfs_setattr().
Setting of timestamps on devices had no effect visible to userland
because timestamps for devices were set in places that are never used.
This broke:
- update of file change time after a change of an attribute
- setting of file access and modification times.

The VA_UTIMES_NULL case did not work.  Revs 1.31-1.32 were supposed to
fix this by copying correct bits from ufs, but had little or no effect
because the old checks were not removed.
2002-04-05 15:16:08 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Maxim Konovalov
e9fc9230a6 Be consistent with UFS in a way how devfs_setattr() checks credentials
for chmod(2), chown(2) and utimes(2) with respect to jail(2).

Reviewed by:		rwatson, ru
Not objected by:	phk
Approved by:		ru
2002-03-14 11:18:42 +00:00
Mike Smith
a7489fe56f Add a new sysinit SI_SUB_DEVFS. Devfs hooks into the kernel at SI_ORDER_FIRST,
and devices can be created anytime after that.

Print a warning if an atttempt is made to create a device too early.
2002-01-09 04:58:49 +00:00
Mike Smith
92fef27d97 Use a sysinit to initialise the devfs hooks in kern_conf.c rather than common
variables.

Reviewed by:	phk (in principle)
2002-01-09 01:00:20 +00:00
Dima Dorfman
2ab80ed8cf Address two minor issues: implement the _PC_NAME_MAX and _PC_PATH_MAX
pathconf() variables for directories, and set st_size and st_blocks
(of struct stat) for directories as appropriate.  Note that st_size is
always set to DEV_BSIZE, since the size of the directories is not
currently kept.

Reviewed by:	phk, bde
2001-11-25 21:00:38 +00:00
Poul-Henning Kamp
d018a84cbc Fix "echo > /dev/null" for non-root users which broke in previous commit. 2001-11-04 19:12:59 +00:00
Poul-Henning Kamp
9607027339 Use vfs_timestamp() instead of getnanotime().
Add magic stuff copied from ufs_setattr().

Instructed by:	bde
2001-11-03 17:00:02 +00:00
Poul-Henning Kamp
93432a92a4 Use vfs_timestamp() instead of getnanotime() directly.
Fix some modes on directories and symlinks.

Instructed by:	bde
2001-11-03 16:53:24 +00:00
Bruce Evans
c95b982aed Backed out vestiges of the quick fixes for the transient breakage of
<sys/mount.h> in rev.1.106 of the latter (don't include <sys/socket.h>
just to work around bugs in <sys/mount.h>).
2001-10-13 06:41:41 +00:00
Poul-Henning Kamp
40739c02ae The behaviour of whiteout'ing symlinks were too confusing, instead
remove them when asked to.
2001-09-30 08:43:33 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Poul-Henning Kamp
12d1aec26f linux ls fails on DEVFS /dev because linux_getdents fails because
linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of
     VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for
     linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...

PR:	29467
Submitted by:	Michael Reifenberger <root@nihil.plaut.de>
Reviewed by:	phk
2001-08-14 06:42:32 +00:00
Brian Somers
51716196a4 Support /dev/tun cloning. Ansify if_tun.c while I'm there.
Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.

It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).

The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.

Reviewed by:	phk
2001-06-01 15:51:10 +00:00
Poul-Henning Kamp
e33457d7eb Don't copy the trailing zero in readlink, it confuses namei().
PR:		27656
2001-05-26 20:07:57 +00:00
Poul-Henning Kamp
3344c5a17e Create a general facility for making dev_t's depend on another
dev_t.  The dev_depends(dev_t, dev_t) function is for tying them
to each other.

When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).

Rewrite the make_dev_alias() to use this dependency facility.

kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.

Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.

kern/subr_disklabel.c:
Remove some now unneeded variables.

kern/subr_diskmbr.c:
Remove some ancient, commented out code.

kern/subr_diskslice.c:
Minor cleanup.  Use name from dev_t instead of dsname()
2001-05-26 08:27:58 +00:00
Poul-Henning Kamp
5a9300c451 Change the way deletes are managed in DEVFS.
This fixes a number of warnings relating to removed cloned devices.

It also makes it possible to recreate deleted devices with
mknod(2).  The major/minor arguments are ignored.
2001-05-23 17:48:20 +00:00
Ian Dowse
0864ef1e8a Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
Poul-Henning Kamp
f73cbde4cf After a successfull poll of the cloning functions, match on the
returned dev_t rather than the original name.

This allows cloning from one name to another which is useful for
/dev/tty and later for the pty's.
2001-05-14 08:20:46 +00:00
Poul-Henning Kamp
ab9f3b292e Convert DEVFS from an "opt-in" to an "opt-out" option.
If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on july 1st so that we can start reaping the full
benefits of having it.
2001-05-13 20:52:40 +00:00
Poul-Henning Kamp
6bd2ea83ef Remove unneeded devfs_badop()
Noticed by:	rwatson
2001-05-06 17:40:34 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Poul-Henning Kamp
b7ebffbc08 Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
2001-04-29 11:48:41 +00:00
Matt Jacob
3be6e0c249 add this ridiculous include foo so it will compile again 2001-04-23 18:14:41 +00:00
Adrian Chadd
f3a90da995 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
Poul-Henning Kamp
3e8bea9634 Remove a debug printf. 2001-02-18 09:16:49 +00:00
Poul-Henning Kamp
4b1c62b3f2 At the point in time where most devices are created, we don't know what
time it is because boottime is not yet initialized.  Finagle the relevant
fields when we get the chance.
2001-02-02 22:54:41 +00:00
Poul-Henning Kamp
ecde9a6dae Only superuser can create symlinks.
Give symlinks mode 755 by default to avoid triggering alert eyes.
(the mode isn't use on symlinks)
2001-02-02 18:35:29 +00:00
Poul-Henning Kamp
aadf265525 Fix two minor nits.
Existences revealed, but no details offered by: bp
2001-01-30 08:39:52 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
2a0436783d staticize. 2000-12-08 15:07:24 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Poul-Henning Kamp
8abea41d80 Don't hold an extra reference to vnodes. Devfs vnodes are sufficiently
cheap to setup that it doesn't really matter that we recycle device
vnodes at kleenex speed.

Implement first cut try at killing cloned devices when they are
not needed anymore.  For now only the bpf driver is involved in
this experiment.  Cloned devices can set the SI_CHEAPCLONE flag
which allows us to destroy_dev() it when the vcount() drops to zero
and the vnode is reclaimed.  For now it's a requirement that the
driver doesn't keep persistent state from close to (re)open.

Some whitespace changes.
2000-10-09 14:18:07 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Poul-Henning Kamp
eb7ba7f95c Ignore attempts to set flags to zero. This quenches a syslog warning
from login(1).
2000-09-18 09:40:01 +00:00
Poul-Henning Kamp
c80d29139c Add canonical checks to devfs_setattr(). 2000-09-16 12:06:58 +00:00
John Baldwin
b570da11fe Use size_t instead of u_int for 4th argument to copyinstr(). 2000-09-12 22:39:34 +00:00
Poul-Henning Kamp
93bcdfe270 Add refcounts to the "global" DEVFS inode slots, this allows us
to recycle inodes after a destroy_dev() but not until all mounts
have picked up the change.

Add support for an overflow table for DEVFS inodes.  The static
table defaults to 1024 inodes, if that fills, an overflow table
of 32k inodes is allocated.  Both numbers can be changed at
compile time, the size of the overflow table also with the
sysctl vfs.devfs.noverflow.

Use atomic instructions to barrier between make_dev()/destroy_dev()
and the mounts.

Add lockmgr() locking of directories for operations accessing or
modifying the directory TAILQs.

Various nitpicking here and there.
2000-09-06 11:26:43 +00:00
Poul-Henning Kamp
7665e72021 Off by one error.
Submitted by:	des
2000-09-04 18:24:30 +00:00