Commit Graph

177 Commits

Author SHA1 Message Date
Poul-Henning Kamp
e606a3c63e Rewamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv.  It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex.  Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables,  the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liason
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr.  Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them.  A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now know about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there.  We still spelunk into other mountpoints
and fondle their data without 100% good locking.  It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints.  It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by:		Kris
Much confusion caused by (races in):	md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.
2005-09-19 19:56:48 +00:00
Poul-Henning Kamp
59307b0dfe Don't attempt to recurse lockmgr, it doesn't like it. 2005-09-15 21:16:43 +00:00
Poul-Henning Kamp
214c8ff0e4 Various minor polishing. 2005-09-15 10:28:19 +00:00
Poul-Henning Kamp
6556102dcb Protect the devfs rule internal global lists with a sx lock, the per
mount locks are not enough.  Finer granularity (x)locking could be
implemented, but I prefer to keep it simple for now.
2005-09-15 08:50:16 +00:00
Poul-Henning Kamp
ab32e95296 Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.
2005-09-15 08:36:37 +00:00
Poul-Henning Kamp
5e080af41f Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied.  This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.

MFC:	after 3 days
2005-09-15 06:57:28 +00:00
Poul-Henning Kamp
21806f30bc Clean up prototypes. 2005-09-12 08:03:15 +00:00
Poul-Henning Kamp
80447bf701 Add a missing dev_relthread() call.
Remove unused variable.

Spotted by:	Hans Petter Selasky <hselasky@c2i.net>
2005-08-29 11:14:18 +00:00
Poul-Henning Kamp
516ad423b1 Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers:  Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant
2005-08-17 08:19:52 +00:00
Poul-Henning Kamp
31cc57cdbd Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
Poul-Henning Kamp
9c0af1310c Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.
2005-08-16 19:08:01 +00:00
Poul-Henning Kamp
d785dfefa4 Eliminate effectively unused dm_basedir field from devfs_mount. 2005-08-15 19:40:53 +00:00
Robert Watson
6a113b3de7 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
Kris Kennaway
e29c976a58 devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by:	phk
2005-07-29 23:00:56 +00:00
Simon L. B. Nielsen
02a4be3f74 Correct devfs ruleset bypass.
Submitted by:	csjp
Reviewed by:	phk
Security:	FreeBSD-SA-05:17.devfs
Approved by:	cperciva
2005-07-20 13:34:16 +00:00
Robert Watson
d26dd2d99e When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
Craig Rodrigues
fd225fe4a3 Do not declare a struct as extern, and then implement
it as static in the same file.  This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by:	phk
Approved by:	das (mentor)
2005-05-31 14:50:49 +00:00
Jeff Roberson
7b6b7657d2 - In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
   because vfs is still sometime entered with Giant held.
2005-05-01 00:56:34 +00:00
Jeff Roberson
cd360e947b - Mark devfs as MNTK_MPSAFE as I belive it does not require Giant.
Sponsored by:	Isilon Systems, Inc.
Agreed in principle by:		phk
2005-04-30 11:24:17 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Poul-Henning Kamp
f4f6abcb4e Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
Poul-Henning Kamp
9477d73e32 cdev (still) needs per instance uid/gid/mode
Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h
2005-03-31 10:29:57 +00:00
Poul-Henning Kamp
eb151cb989 Rename dev_ref() to dev_refl() 2005-03-31 06:51:54 +00:00
Jeff Roberson
ea124bf597 - LK_NOPAUSE is a nop now.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 04:27:49 +00:00
Jeff Roberson
eddcb03d02 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
Jeff Roberson
d9b2d9f7a2 - Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
   modified to do so.  Careful review must be done to ensure that this
   is safe for each individual filesystem.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:36:16 +00:00
Poul-Henning Kamp
800b42bde0 Prepare for the final onslaught on devices:
Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version
2005-03-17 12:07:00 +00:00
Jeff Roberson
c0f681c21d - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:14:56 +00:00
Poul-Henning Kamp
2647407860 One more bit of the major/minor patch to make ttyname happy as well. 2005-03-10 18:49:17 +00:00
Poul-Henning Kamp
b43ab0e378 Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.
2005-03-10 18:21:34 +00:00
Poul-Henning Kamp
f5af7353c0 Remove kernelside support for devfs rules filtering on major numbers. 2005-03-08 19:51:27 +00:00
Poul-Henning Kamp
0454a53d65 We may not have an actual cdev at this point. 2005-02-22 18:17:31 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
1a1457d427 Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.
2005-02-22 14:41:04 +00:00
Poul-Henning Kamp
4d8ac58b05 Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
Poul-Henning Kamp
5ece08f57a Make a SYSCTL_NODE static 2005-02-10 12:23:29 +00:00
Poul-Henning Kamp
df32e67c73 Statize devfs_ops_f 2005-02-10 12:04:26 +00:00
Poul-Henning Kamp
a369f34d76 Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().
2005-01-28 14:42:17 +00:00
Poul-Henning Kamp
83c6439714 Whitespace in vop_vector{} initializations. 2005-01-13 18:59:48 +00:00
Poul-Henning Kamp
7164e8f291 Silently ignore forced argument to unmount. 2005-01-11 12:02:26 +00:00
Warner Losh
d167cf6f3a /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 18:10:42 +00:00
Poul-Henning Kamp
59f69ba49f Unsupport forceful unmounts of DEVFS.
After disscussing things I have decided to take the easy and
consistent 90% solution instead of aiming for the very involved 99%
solution.

If we allow forceful unmounts of DEVFS we need to decide how to handle
the devices which are in use through this filesystem at the time.

We cannot just readopt the open devices in the main /dev instance since
that would open us to security issues.

For the majority of the devices, this is relatively straightforward
as we can just pretend they got revoke(2)'ed.

Some devices get tricky:  /dev/console and /dev/tty for instance
does a sort of recursive open of the real console device.   Other devices
may be mmap'ed (kill the processes ?).

And then there are disk devices which are mounted.

The correct thing here would be to recursively unmount the filesystems
mounte from devices from our DEVFS instance (forcefully) and if
this succeeds, complete the forcefully unmount of DEVFS.  But if
one of the forceful unmounts fail we cannot complete the forceful
unmount of DEVFS, but we are likely to already have severed a lot
of stuff in the process of trying.

Event attempting this would be a lot of code for a very far out
corner-case which most people would never see or get in touch with.

It's just not worth it.
2005-01-04 07:52:26 +00:00
Poul-Henning Kamp
50a36c111f Be consistent about flag values passed to device drivers read/write
methods:

Read can see O_NONBLOCK and O_DIRECT.

Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.

In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.
2004-12-22 17:05:44 +00:00
Poul-Henning Kamp
10eee285f7 Shuffle numeric values of the IO_* flags to match the O_* flags from
fcntl.h.

This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.

Today open, close and ioctl uses fcntl.h flags, while read and write
uses vnode.h flags.
2004-12-22 16:25:50 +00:00
Poul-Henning Kamp
e87047b437 We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
Poul-Henning Kamp
2c0220129d Add a couple of KASSERTS to try to diagnose a problem reported. 2004-12-20 21:12:11 +00:00
Poul-Henning Kamp
2a9e0c3216 Be a bit more assertive about vnode bypass. 2004-12-14 09:32:18 +00:00
Poul-Henning Kamp
1dc4727ea3 Another FNONBLOCK -> O_NONBLOCK.
Don't unconditionally set IO_UNIT to device drivers in write:  nobody
checks it, and since it was always set it did not carry information anyway.
2004-12-13 07:41:19 +00:00
Poul-Henning Kamp
ab9caf9d67 Use O_NONBLOCK instead of FNONBLOCK alias. 2004-12-13 07:37:29 +00:00
Poul-Henning Kamp
f0d5cba935 Explicit panic in vop_read/vop_write for devices 2004-12-13 07:13:21 +00:00