Commit Graph

1749 Commits

Author SHA1 Message Date
Robert Watson
7028887eac Add fi_sx, an sx lock to serialize I/O operations on the socket pair
underlying the POSIX fifo implementation.  In 6.x/7.x, fifo access is
moved from the VFS layer, where it was serialized using the vnode
lock, to the file descriptor layer, where access is protected by a
reference count but not serialized.  This exposed socket buffer
locking to high levels of parallelism in specific fifo workloads, such
as make -j 32, which expose as yet unresolved socket buffer bugs.

fi_sx re-adds serialization around the read and write routines,
although not paths that simply test socket buffer mbuf queue state,
such as the poll and kqueue methods.  This restores the extra locking
cost previously present in some cases, but is an effective workaround
for the instability that has been experienced.  This workaround should
be removed once the bug in socket buffer handling has been fixed.

Reported by:	kris, jhb, Julien Gabel <jpeg at thilelli dot net>,
		Peter Holm <peter at holm dot cc>, others
MFC after:	3 days
2005-09-22 10:51:12 +00:00
Poul-Henning Kamp
e606a3c63e Revamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv.  It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex.  Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of the much-improved kernel memory allocator and the
generally better-defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them and all the trouble they caused.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liaison
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr.  Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.
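To illustrate the shared inode pool, here is a minimal user-space sketch of a unit-number allocator. It is not the unr(9) API: `struct unit_pool`, `unit_alloc()`, `unit_free()`, `POOL_UNITS` and the `base` field are hypothetical, and a fixed bitmap stands in for unr's more compact representation.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical pool size; unr(9) manages arbitrary ranges more compactly. */
#define POOL_UNITS 256

/* Hypothetical unit-number pool, NOT the unr(9) API. */
struct unit_pool {
	uint8_t	used[POOL_UNITS / CHAR_BIT];	/* one bit per number */
	int	base;				/* first number handed out */
};

/* Hand out the lowest free number, or -1 if the pool is exhausted. */
int
unit_alloc(struct unit_pool *p)
{
	for (int i = 0; i < POOL_UNITS; i++) {
		uint8_t bit = (uint8_t)(1u << (i % CHAR_BIT));

		if ((p->used[i / CHAR_BIT] & bit) == 0) {
			p->used[i / CHAR_BIT] |= bit;
			return (p->base + i);
		}
	}
	return (-1);
}

/* Return a number to the pool so it can be handed out again. */
void
unit_free(struct unit_pool *p, int unit)
{
	int i = unit - p->base;

	assert(i >= 0 && i < POOL_UNITS);
	p->used[i / CHAR_BIT] &= (uint8_t)~(1u << (i % CHAR_BIT));
}
```

Since device inodes and local dirents (directories, symlinks) would both draw from the same pool, two of them can never receive the same number. With `base = 3`, two allocations yield 3 and 4, and freeing 3 makes it the next number handed out.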

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them.  A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr;
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it as an index into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.
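The per-mountpoint index scheme can be sketched roughly as follows, assuming a simplified record: `struct cdev_slots`, its fields, `cdev_slots_init()` and `cdev_slots_grow()` are hypothetical stand-ins rather than the devfs code, and user-space calloc()/free() stand in for malloc(9). The array starts out pointing at one element embedded in the record and is replaced by a heap array only when a higher index is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dirent_stub;	/* opaque stand-in for struct devfs_dirent */

/* Hypothetical per-device record; field names are illustrative only. */
struct cdev_slots {
	struct dirent_stub	**slots;	/* indexed by mountpoint index */
	int			  maxslot;	/* highest valid index */
	struct dirent_stub	 *slot0;	/* inline storage for index 0 */
};

void
cdev_slots_init(struct cdev_slots *cs)
{
	cs->slot0 = NULL;
	cs->slots = &cs->slot0;		/* start with the single inline element */
	cs->maxslot = 0;
}

/* Ensure index 'idx' is addressable, growing the array on demand. */
int
cdev_slots_grow(struct cdev_slots *cs, int idx)
{
	struct dirent_stub **n;

	assert(idx >= 0);
	if (idx <= cs->maxslot)
		return (0);
	/* New slots start out zero-filled, i.e. NULL on common platforms. */
	n = calloc((size_t)idx + 1, sizeof(*n));
	if (n == NULL)
		return (-1);
	memcpy(n, cs->slots, ((size_t)cs->maxslot + 1) * sizeof(*n));
	if (cs->slots != &cs->slot0)
		free(cs->slots);	/* never free the inline element */
	cs->slots = n;
	cs->maxslot = idx;
	return (0);
}
```

The inline element keeps the common single-mount case free of extra allocations; the price is the check that avoids freeing it.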

Retire the cdev alias lists; the cdev_priv now knows about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there.  We still spelunk into other mountpoints
and fondle their data without 100% good locking.  It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints.  It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by:		Kris
Much confusion caused by (races in):	md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.
2005-09-19 19:56:48 +00:00
Robert Watson
526e258d3a Assert that (vp) is locked in fifo_close(), since we rely on the
exclusive vnode lock to synchronize the reference counts on struct
fifoinfo.

MFC after:	3 days
2005-09-18 10:44:50 +00:00
Poul-Henning Kamp
59307b0dfe Don't attempt to recurse lockmgr, it doesn't like it. 2005-09-15 21:16:43 +00:00
Alexander Kabaev
d11c07ba56 Handle a race condition where a NULLFS vnode can be cleaned while
threads are still asleep waiting for the lowervp lock.

Tested by:	kkenn
Discussed with: ssouhlal, jeffr
2005-09-15 19:21:26 +00:00
Robert Watson
ca17bccaa1 The socket pointers in fifoinfo are not permitted to be NULL, so
don't check if they are, it just confuses the fifo code more.

MFC after:	3 days
2005-09-15 15:45:34 +00:00
Poul-Henning Kamp
214c8ff0e4 Various minor polishing. 2005-09-15 10:28:19 +00:00
Poul-Henning Kamp
6556102dcb Protect the devfs rule internal global lists with an sx lock; the
per-mount locks are not enough.  Finer-grained (x)locking could be
implemented, but I prefer to keep it simple for now.
2005-09-15 08:50:16 +00:00
Poul-Henning Kamp
ab32e95296 Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.
2005-09-15 08:36:37 +00:00
Poul-Henning Kamp
5e080af41f Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied.  This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.
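A minimal sketch of depth-limited `include` traversal, assuming a much-simplified ruleset type: `struct ruleset`, `ruleset_apply()` and the global `rule_depth` (standing in for vfs.devfs.rule_depth) are hypothetical, and "applying" a rule is reduced to counting it. The commit does not say whether hitting the limit is reported; this sketch simply stops following includes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ruleset: some ordinary rules plus an optional "include". */
struct ruleset {
	const struct ruleset	*include;	/* ruleset named by "include", or NULL */
	int			 nrules;	/* ordinary rules in this set */
};

int rule_depth = 1;	/* stand-in for the vfs.devfs.rule_depth sysctl */

/*
 * Apply 'rs' (here: just count its rules) and follow "include" while the
 * recursion depth allows it.  No per-ruleset "running" flag is written, so
 * two callers applying the same ruleset at once cannot trip over each other,
 * and a ruleset that includes itself still terminates.
 */
int
ruleset_apply(const struct ruleset *rs, int depth)
{
	int applied = rs->nrules;

	assert(depth >= 0);
	if (rs->include != NULL && depth < rule_depth)
		applied += ruleset_apply(rs->include, depth + 1);
	return (applied);
}
```

Because the depth lives on the call stack rather than in the ruleset, two DEVFS instances can apply the same ruleset concurrently, and bounding the depth also bounds kernel stack use.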

MFC after:	3 days
2005-09-15 06:57:28 +00:00
Robert Watson
447bbaa2cf Trim down now (believed to be) unused fifo_ioctl() and
fifo_kqfilter() VOP implementations, since they in theory are used
only on open file descriptors, in which case the ioctls are via
fifo_ioctl_f() and kqueue requests are via fifo_kqfilter_f().
Generate warnings if they are entered for now.  These printf()
calls should become panic() calls.

Annotate and re-implement fifo_ioctl_f(): don't arbitrarily
forward ioctls to the socket layer, only forward the ones we
explicitly support for fifos.  In the case of FIONREAD, don't
forward the request to the write socket on a read-write fifo, or
the read result is overwritten.  Annotate a nasty case for the
undefined POSIX O_RDWR on fifos, in which failure of the second
ioctl will result in the socket pair being in an inconsistent
state.
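For reference, the userland behaviour FIONREAD is expected to preserve can be observed on an anonymous pipe, used here as a stand-in for a fifo (a hedged illustration, not the fifofs code): the read end reports the number of bytes queued. `bytes_readable()` is a hypothetical helper; FIONREAD, ioctl(2) and pipe(2) are the standard interfaces.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the number of bytes queued for reading on 'fd', or -1 on error. */
int
bytes_readable(int fd)
{
	int n;

	if (ioctl(fd, FIONREAD, &n) == -1)
		return (-1);
	return (n);
}
```

If the request were also forwarded to the write side and that result stored into the same `int`, the caller would see the write side's count instead.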

Assert copyright as I find myself rewriting non-trivial parts of
fifofs.

MFC after:	3 days
2005-09-13 17:46:48 +00:00
Robert Watson
8a22e151be As a result of kqueue locking work, socket buffer locks will always
be held when entering a kqueue filter for fifos via a socket buffer
event: as such, assert the lock unconditionally rather than acquiring
it conditionally.

MFC after:	3 days
2005-09-13 10:39:24 +00:00
Robert Watson
db7a6c2f43 Annotate two issues:
1) fifo_kqfilter() is not actually ever used, it likely should be GC'd.

2) fifo_kqfilter_f() doesn't implement EVFILT_VNODE, so detecting events
   on the underlying vnode for a fifo no longer works (it did in 4.x).
   Likely, fifo_kqfilter_f() should forward the request to the VFS using
   fp->f_vnode, which would work once fifo_kqfilter() was detached from
   the vnode operation vector (removing the fifo override).

Discussed with:	phk
2005-09-13 09:23:22 +00:00
Robert Watson
88f39e8e95 Introduce no-op nosup fifo kqueue filter and detach routine, which are
used when a read filter is requested on a write-only fifo descriptor, or
a write filter is requested on a read-only fifo descriptor.  This
permits the filters to be registered, but never raises the event, which
causes kqueue behavior for fifos to more closely match similar semantics
for poll and select, which permit testing for the condition even though
the condition will never be raised, and is consistent with POSIX's notion
that a fifo has identical semantics to a one-way IPC channel created
using pipe() on most operating systems.
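The filter selection described in 88f39e8e95, together with the EINVAL rejection from 48afebb83d, can be sketched as below. Everything here is hypothetical: the `FILT_*` identifiers and `FFLAG_*` bits stand in for EVFILT_* and FREAD/FWRITE, and a filter is reduced to a function of the amount of data or space available.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for kqueue filter identifiers and file flags. */
enum { FILT_READ = 1, FILT_WRITE = 2, FILT_VNODE = 3 };
#define FFLAG_READ	0x1
#define FFLAG_WRITE	0x2

typedef int (*filter_fn)(long avail);

/* Fires when data (or buffer space) is available. */
static int filt_ready(long avail) { return (avail > 0); }

/* "nosup" filter: registers fine but never reports the event. */
static int filt_nosup(long avail) { (void)avail; return (0); }

/*
 * Pick a filter for 'filter' on a descriptor opened with 'fflags'.
 * Returns 0 and sets *fnp, or EINVAL for filters fifos do not support.
 */
int
fifo_pick_filter(int filter, int fflags, filter_fn *fnp)
{
	switch (filter) {
	case FILT_READ:
		*fnp = (fflags & FFLAG_READ) ? filt_ready : filt_nosup;
		return (0);
	case FILT_WRITE:
		*fnp = (fflags & FFLAG_WRITE) ? filt_ready : filt_nosup;
		return (0);
	default:
		return (EINVAL);
	}
}
```

A read filter on a write-only descriptor registers successfully but never fires, which mirrors poll(2) and select(2) accepting a test for a condition that can never become true.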

The fifo regression test suite can now run to completion on HEAD without
errors.

MFC after:	3 days
2005-09-12 19:59:12 +00:00
Robert Watson
48afebb83d When a request is made to register a filter on a fifo that doesn't
apply to the fifo (i.e., not EVFILT_READ or EVFILT_WRITE), reject
it as EINVAL, not by returning 1 (EPERM).

MFC after:	3 days
2005-09-12 18:07:49 +00:00
Robert Watson
114538d85b Remove DFLAG_SEEKABLE from fifo file descriptors: fifos are not seekable
according to POSIX, not to mention the fact that it doesn't make sense
(and hence isn't really implemented).  This causes the fifo_misc
regression test to succeed.
2005-09-12 12:15:12 +00:00
Robert Watson
6dd84b0bdc Only poll the fifo for read events if the fifo is attached to a readable
file descriptor.  Otherwise, the read end of a fifo might return that it
is writable (which it isn't).

Only poll the fifo for write events if the fifo is attached to a writable
file descriptor.  Otherwise, the write end of a fifo might return that
it is readable (which it isn't).

In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
has undefined behavior), we poll for both.
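The event filtering described above (and the `levents` fix in 845e8e827b) amounts to computing, per socket, which of the requested events to ask about. A hedged sketch: `FFLAG_READ`/`FFLAG_WRITE` stand in for FREAD/FWRITE, `struct fifo_poll_plan` and `fifo_plan_poll()` are hypothetical, and only the basic POLLIN, POLLPRI and POLLOUT bits are handled.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical stand-ins for the kernel's FREAD/FWRITE file flags. */
#define FFLAG_READ	0x1
#define FFLAG_WRITE	0x2

/* Events to request from each socket; 0 means "do not poll it". */
struct fifo_poll_plan {
	short	rd_events;	/* asked of the read socket */
	short	wr_events;	/* asked of the write socket */
};

struct fifo_poll_plan
fifo_plan_poll(int fflags, short events)
{
	struct fifo_poll_plan p = { 0, 0 };

	/* Read-side events only matter on a readable descriptor. */
	if (fflags & FFLAG_READ)
		p.rd_events = (short)(events & (POLLIN | POLLPRI));
	/* Write-side events only matter on a writable descriptor. */
	if (fflags & FFLAG_WRITE)
		p.wr_events = (short)(events & POLLOUT);
	return (p);
}
```

For a read-write descriptor both sockets are consulted; a zero mask means that socket is skipped entirely.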

MFC after:	3 days
2005-09-12 10:16:18 +00:00
Robert Watson
845e8e827b After going to some trouble to identify only the write-related events
to poll the write socket for, the fifo polling code proceeded to poll
for the complete set of events.  Use 'levents' instead of 'events' as
the argument to poll, and only poll the write socket if there is
interest in write events.

MFC after:	3 days
2005-09-12 10:13:15 +00:00
Robert Watson
ab5182012a When a writer opens a fifo, wake up the read socket for read, not the
write socket.

MFC after:	3 days
2005-09-12 10:07:21 +00:00
Robert Watson
a1b9943657 Add an assertion that fifo_open() doesn't race against other threads
while sleeping to allocate fifo state: due to using the vnode lock to
serialize access to a fifo during open, it shouldn't happen (tm).

MFC after:	3 days
2005-09-12 10:06:38 +00:00
Robert Watson
ba9eeb43fe Rather than reaching into the internals of the UNIX domain socket code
by calling uipc_connect2() to connect two socket endpoints to create a
fifo, call soconnect2().
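In userland, the closest analogue to a fifo built from two connected UNIX domain sockets is socketpair(2) followed by shutdown(2) on the unused direction of each end. This is an illustration only: soconnect2() is a kernel routine and is not used here, and `make_oneway_pair()` is a hypothetical helper.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Build a one-way channel from a connected AF_UNIX pair: fds[0] is the
 * read end, fds[1] the write end.  Returns 0 on success, -1 on error.
 */
int
make_oneway_pair(int fds[2])
{
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		return (-1);
	/* The read end never sends; the write end never receives. */
	if (shutdown(fds[0], SHUT_WR) == -1 || shutdown(fds[1], SHUT_RD) == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	return (0);
}
```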

MFC after:	3 days
2005-09-12 10:05:08 +00:00
Poul-Henning Kamp
21806f30bc Clean up prototypes. 2005-09-12 08:03:15 +00:00
Craig Rodrigues
b575132598 Cast bf_sysid to const char * when passing it to strncmp(), because
strncmp does not take an unsigned char *.  Eliminates warning with GCC 4.0.
2005-09-11 16:02:14 +00:00
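A minimal example of the kind of cast described, assuming a hypothetical on-disk header whose signature field is declared as `unsigned char` (`struct boot_hdr` and `sysid_matches()` are invented names):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical on-disk header with an unsigned-byte signature field. */
struct boot_hdr {
	unsigned char	sysid[8];	/* not NUL-terminated */
};

/* Compare the signature; strncmp() takes const char *, hence the cast. */
int
sysid_matches(const struct boot_hdr *h, const char *id)
{
	return (strncmp((const char *)h->sysid, id, sizeof(h->sysid)) == 0);
}
```

Without the cast the compiler sees an `unsigned char *` passed where `const char *` is expected, which is what GCC 4.0 warned about.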
Craig Rodrigues
2a3e0acc5d Do not declare M_NTFSMNT with extern linkage here, since
it is defined with static linkage in ntfs_vfsops.c.
Fixes compilation with GCC 4.0.
2005-09-11 15:57:07 +00:00
David E. O'Brien
5ddf29857e Ensure the full value is written into inode variables.
PR:		85503
Submitted by:	Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
2005-09-07 10:32:58 +00:00
Suleiman Souhlal
68da388325 Unbreak hpfs/ntfs/udf/ext2fs/reiserfs mounting.
Another pointyhat to:	ssouhlal
2005-09-03 20:23:41 +00:00
Suleiman Souhlal
44bd2bc19a Unbreak the build.
Pointyhat to:	ssouhlal
2005-09-03 00:40:19 +00:00
Suleiman Souhlal
cdeb72045b Use vput() instead of vrele() in null_reclaim() since the lower vnode
is locked.

MFC after:	3 days
2005-09-02 15:49:55 +00:00
Suleiman Souhlal
75d7ba93af *_mountfs() (if the filesystem mounts from a device) needs devvp to be
locked, so lock it.

Glanced at by:	phk
MFC after:	3 days
2005-09-02 15:27:23 +00:00
Poul-Henning Kamp
80447bf701 Add a missing dev_relthread() call.
Remove unused variable.

Spotted by:	Hans Petter Selasky <hselasky@c2i.net>
2005-08-29 11:14:18 +00:00
Poul-Henning Kamp
516ad423b1 Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers: allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant.
2005-08-17 08:19:52 +00:00
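The shadow-cdevsw idea can be sketched in user space. All names are hypothetical: `struct devsw` is a one-method stand-in for struct cdevsw, `DEVSW_NEEDGIANT` stands in for D_NEEDGIANT, and a plain flag plays the role of the Giant mutex so the sketch needs no threads.

```c
#include <assert.h>

/* Hypothetical, cut-down device switch with a single entry point. */
struct devsw {
	int	(*d_read)(int unit);
	int	  d_flags;
};
#define DEVSW_NEEDGIANT	0x1	/* stand-in for D_NEEDGIANT */

int giant_held;					/* stand-in for the Giant mutex */
static const struct devsw *giant_wrapped;	/* table behind the shadow */

static void giant_lock(void)   { assert(!giant_held); giant_held = 1; }
static void giant_unlock(void) { assert(giant_held); giant_held = 0; }

/* Shadow-table entry: take "Giant", call the real method, drop "Giant". */
static int
giant_read(int unit)
{
	int error;

	giant_lock();
	error = giant_wrapped->d_read(unit);
	giant_unlock();
	return (error);
}

/*
 * Choose the table callers use: MPSAFE drivers get their own table at no
 * extra cost; drivers flagged as needing Giant get a shadow table of
 * wrappers.  For brevity this sketch tracks only one wrapped table.
 */
const struct devsw *
devsw_for_callers(const struct devsw *orig, struct devsw *shadow)
{
	if ((orig->d_flags & DEVSW_NEEDGIANT) == 0)
		return (orig);
	giant_wrapped = orig;
	shadow->d_read = giant_read;
	shadow->d_flags = orig->d_flags;
	return (shadow);
}

/* Example legacy method: returns -1 if it was not called under "Giant". */
int
legacy_read(int unit)
{
	return (giant_held ? unit : -1);
}
```

Drivers without the flag are called directly, so only the legacy drivers pay for the extra indirection and locking.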
Poul-Henning Kamp
31cc57cdbd Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
Poul-Henning Kamp
9c0af1310c Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.
2005-08-16 19:08:01 +00:00
Poul-Henning Kamp
d785dfefa4 Eliminate effectively unused dm_basedir field from devfs_mount. 2005-08-15 19:40:53 +00:00
Peter Grehan
14dcd40fde - restore the ability to mount cd9660 filesystems as root by inverting
some of the options test, specifically the joliet and rockridge tests.
  Since the root mount callchain doesn't go through cd9660_cmount, the
  default mount options aren't set. Rather than having the main codepath
  assume the options are there, test for the absence of the inverted
  optioin

  e.g. instead of vfs_flagopt(.. "joliet" ..), test for
  !vfs_flagopt(.. "nojoliet" ..)

  This works for root mount, non-root mount and future nmount cases.

- in cd9660_cmount, remove inadvertent setting of "gens" when "extatt"
  was set.

Reported by:	grehan, Dario Freni <saturnero at freesbie org>
Tested by:	Dario Freni
Not objected to by:	phk

MFC after:	3 days
2005-08-14 04:19:36 +00:00
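The inverted-option pattern from 14dcd40fde can be sketched like this. `has_opt()` and `joliet_enabled()` are hypothetical helpers over a NULL-terminated array of option names; they are not vfs_flagopt() and do not model its signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookup in a NULL-terminated list of option names. */
bool
has_opt(const char *const *opts, const char *name)
{
	for (; *opts != NULL; opts++)
		if (strcmp(*opts, name) == 0)
			return (true);
	return (false);
}

/*
 * The feature is on unless explicitly disabled, so an empty option list
 * (as on a mount path where defaults were never filled in) still enables it.
 */
bool
joliet_enabled(const char *const *opts)
{
	return (!has_opt(opts, "nojoliet"));
}
```

An empty option list, as seen on the root mount path, leaves Joliet enabled, while an explicit `nojoliet` turns it off.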
Dag-Erling Smørgrav
8ab2a64d2f Eliminate an unnecessary bcopy(). 2005-08-12 12:22:05 +00:00
David E. O'Brien
c11ba30c9a Remove public declarations of variables that were forgotten when they were
made static.
2005-08-10 07:10:02 +00:00
David E. O'Brien
cec9a4bf57 Remove the need to forward declare statics by moving them around. 2005-08-10 07:08:14 +00:00
Robert Watson
6a113b3de7 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containing cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
Kris Kennaway
e29c976a58 devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by:	phk
2005-07-29 23:00:56 +00:00
Simon L. B. Nielsen
02a4be3f74 Correct devfs ruleset bypass.
Submitted by:	csjp
Reviewed by:	phk
Security:	FreeBSD-SA-05:17.devfs
Approved by:	cperciva
2005-07-20 13:34:16 +00:00
R. Imura
697ab829fc [1] unix2doschr()
If a character cannot be converted to DOS code page,
 unix2doschr() returned `0'. As a result, unix2dosfn()
 was forced to return `0', so we saw a file which was
 composed of these characters as `Invalid argument'.
 To correct this, if a character can be converted to
 Unicode, unix2doschr() now returns `1' which is a magic
 number to make unix2dosfn() know that the character
 must be converted to `_'.

[2] unix2dosfn()
 The above-mentioned solution only works if a file
 has both of Unicode name and DOS code page name.
 Unicode name would not be recorded if file name
 can be settled within 11 bytes (DOS short name)
 and if no conversion from Unix charset to DOS code
 page has occurred. Thus, FreeBSD can create a file
 which has only short name, but there is no guarantee
 that the short name contains allways valid characters
 because we leave it to people by using mount_msdosfs(8)
 to select which conversion is used between DOS code
 page and unix charset.
 To avoid this, Unicode file name should be recorded
 unless a character is an ascii character. This is
 the way Windows XP do.

PR:		77074 [1]
MFC after:	1 week
2005-07-17 07:10:05 +00:00
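The two-part fix in 697ab829fc can be approximated by a simplified sketch. It is hypothetical throughout: `to_dos_char()` treats every byte at or above 0x80 as having no DOS code page form, extension and case handling are omitted, and the return codes of `make_short_base()` are invented for illustration. The magic value `1` becomes `_` in the short name, and any non-ASCII or overlong name also requests a Unicode long name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical per-byte conversion following the convention described:
 * 0 = unusable, 1 = magic "no DOS code page form, but Unicode is fine",
 * anything else = the DOS code page byte.  Case folding is omitted.
 */
static int
to_dos_char(unsigned char c)
{
	if (c < 0x20 || c == 0x7f)
		return (0);	/* control characters cannot appear in names */
	if (c >= 0x80)
		return (1);	/* assume no DOS mapping for non-ASCII bytes */
	return (c);		/* printable ASCII passes through */
}

/*
 * Fill the 8-byte short-name base for 'name' (extension handling omitted).
 * Returns -1 if a character is unusable, 1 if a Unicode long name must also
 * be recorded, 0 if the short name alone is enough.
 */
int
make_short_base(const char *name, char out[8])
{
	bool need_long = false;

	memset(out, ' ', 8);			/* short names are space padded */
	for (size_t i = 0; name[i] != '\0'; i++) {
		unsigned char c = (unsigned char)name[i];
		int d = to_dos_char(c);

		if (d == 0)
			return (-1);
		if (c >= 0x80 || i >= 8)
			need_long = true;	/* not pure ASCII, or too long */
		if (i < 8)
			out[i] = (d == 1) ? '_' : (char)d;
	}
	return (need_long ? 1 : 0);
}
```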
Robert Watson
d26dd2d99e When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
Seigo Tanimura
045f25a28d Regrab dvp only when ISDOTDOT.
Approved by:	re (scottl)
2005-07-09 13:52:49 +00:00
Jeff Roberson
8b3676f1a1 - Since we don't hold a usecount in pfs_exit we have to get a holdcnt
prior to calling vgone() to prevent any races.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (vfs blanket)
2005-07-07 07:33:10 +00:00
Peter Wemm
62919d788b Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb to pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
Peter Wemm
2de92a386e Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatibility mode, e.g. an IOC_IN ioctl with
a size of zero.  Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs.  Certain 3rd party drivers with binary userland components
rely on this too.

This is necessary to have 4.x and 5.x binaries use these ioctls.  We
found this at work when trying to run 4.x binaries.
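A sketch of the kind of compatibility check described, assuming the traditional BSD <sys/ioccom.h> layout (direction bits at the top, a 13-bit parameter length starting at bit 16). The `X_`-prefixed constants are local copies to avoid clashing with system headers, and the exact rules in sys_generic.c are not reproduced; the checks in the hypothetical `ioctl_cmd_ok()` are an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the traditional BSD ioctl encoding (assumed layout). */
#define X_IOCPARM_MASK	0x1fffUL
#define X_IOC_VOID	0x20000000UL
#define X_IOC_OUT	0x40000000UL
#define X_IOC_IN	0x80000000UL
#define X_IOC_DIRMASK	(X_IOC_VOID | X_IOC_OUT | X_IOC_IN)
#define X_IOC(dir, grp, num, len)					\
	((unsigned long)(dir) |						\
	 (((unsigned long)(len) & X_IOCPARM_MASK) << 16) |		\
	 ((unsigned long)(grp) << 8) | (unsigned long)(num))

/*
 * Sanity-check an ioctl command word: IOC_VOID must not encode a size and
 * data-moving commands must.  'compat' additionally tolerates the old idiom
 * of IOC_IN with a zero size, used before IOC_VOID existed.
 */
bool
ioctl_cmd_ok(unsigned long com, bool compat)
{
	unsigned long size = (com >> 16) & X_IOCPARM_MASK;
	unsigned long dir = com & X_IOC_DIRMASK;

	if (dir == X_IOC_VOID)
		return (size == 0);
	if (dir == 0)
		return (false);		/* no direction bits at all */
	if (size == 0)
		return (compat && dir == X_IOC_IN);
	return (true);
}
```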

Approved by:	re
2005-06-30 00:19:08 +00:00
R. Imura
181fc3c6ea Avoid casting from (int *) to (size_t *) in order to fix udf_iconv on amd64.
Reviewed by:	scottl
MFC after:	2 weeks
2005-06-05 02:09:48 +00:00
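The general pattern behind 181fc3c6ea, sketched with hypothetical functions (`consume()` is an iconv-style callee, `consume_int()` its caller; the actual udf_iconv change is not reproduced): instead of casting `int *` to `size_t *`, copy through a `size_t` temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iconv-style callee that updates a remaining-length count. */
static void
consume(size_t *left, size_t n)
{
	assert(n <= *left);
	*left -= n;
}

/*
 * The caller tracks the length in an int.  Passing (size_t *)len would let
 * consume() access 8 bytes through a pointer to a 4-byte object on LP64
 * platforms such as amd64, so go through a size_t temporary instead.
 */
int
consume_int(int *len, int n)
{
	size_t tmp = (size_t)*len;

	consume(&tmp, (size_t)n);
	*len = (int)tmp;
	return (*len);
}
```

On amd64 `int` is 4 bytes and `size_t` is 8, so the cast version would read and write past the end of the `int`.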
Craig Rodrigues
fd225fe4a3 Do not declare a struct as extern, and then implement
it as static in the same file.  This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by:	phk
Approved by:	das (mentor)
2005-05-31 14:50:49 +00:00
Christian Brueffer
befb7f333f Fix three typos in comments. Two of them obtained from OpenBSD.
MFC after:	3 days
2005-05-11 21:10:35 +00:00