Commit Graph

2086 Commits

Author SHA1 Message Date
trhodes
810381390e Fix spacing from my previous commit to this file:
Noticed by:	fjoe
2007-01-30 04:41:38 +00:00
rodrigc
fdf518fe9a Add a "-o large" mount option for msdosfs. Convert compile-time checks for
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.

Test case provided by Oliver Fromme:
  truncate -s 200G test.img
  mdconfig -a -t vnode -f test.img -u 9
  newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
  mount -t msdosfs /dev/md9 /mnt    # should fail
  mount -t msdosfs -o large /dev/md9 /mnt   # should succeed

PR:		105964
Requested by:	Oliver Fromme <olli lurza secnetix de>
Tested by:	trhodes
MFC after:	2 weeks
2007-01-30 03:11:45 +00:00
kib
79752b63e1 Below is slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (that is, the parent
directory of the covered vnode) while holding an exclusive vnode lock on
the covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
		|
	        +->F->D->E

      L2: F->D->E
	     |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, want A (in lookup())
     p2: holds shared lock on F, want D (in VFS_ROOT())
     p3: holds B, want drain lock on F (in dounmount())
     p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
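
The four-process deadlock listed in the message above can be checked mechanically by following "wants" edges to whichever process holds the wanted lock. The following is a minimal sketch, not kernel code: it assumes each lock has a single holder (so the shared lock on F is modelled as owned by one p2), and the names `waiter`, `holder_of` and `in_deadlock` are hypothetical.

```c
#include <stddef.h>

/* Locks from the scenario: A and B on the root fs, D and F on the var fs. */
enum lock_id { LOCK_A, LOCK_B, LOCK_D, LOCK_F };

struct waiter {
    enum lock_id holds;  /* lock the process currently owns */
    enum lock_id wants;  /* lock the process is blocked on */
};

/* p1..p4 as listed in the commit message. */
static const struct waiter with_unmount[] = {
    { LOCK_D, LOCK_A },  /* p1: in lookup() */
    { LOCK_F, LOCK_D },  /* p2: in VFS_ROOT() */
    { LOCK_B, LOCK_F },  /* p3: in dounmount() */
    { LOCK_A, LOCK_B },  /* p4: in VOP_LOOKUP() */
};

/* The same set without the unmounting process p3. */
static const struct waiter without_unmount[] = {
    { LOCK_D, LOCK_A },  /* p1 */
    { LOCK_F, LOCK_D },  /* p2 */
    { LOCK_A, LOCK_B },  /* p4 */
};

/* Return the index of the process holding `lock`, or -1 if it is free. */
static int holder_of(const struct waiter *w, size_t n, enum lock_id lock)
{
    for (size_t i = 0; i < n; i++)
        if (w[i].holds == lock)
            return (int)i;
    return -1;
}

/*
 * Follow the wait-for chain starting at process `start`.
 * Returns 1 if the chain leads back to `start` (a deadlock cycle),
 * 0 if it reaches a free lock or does not return within n steps.
 */
static int in_deadlock(const struct waiter *w, size_t n, size_t start)
{
    size_t cur = start;

    for (size_t steps = 0; steps < n; steps++) {
        int next = holder_of(w, n, w[cur].wants);
        if (next < 0)
            return 0;           /* wanted lock is free */
        if ((size_t)next == start)
            return 1;           /* chain closed on itself */
        cur = (size_t)next;
    }
    return 0;
}
```

With all four processes the chain p1 -> p4 -> p3 -> p2 -> p1 closes; with p3 removed, p4 can obtain B and the chain ends, which matches the remark that the deadlock is not triggered without unmount activity.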
trhodes
317819a5f6 Add a 3rd entry in the cache, which keeps the end position
from just before extending a file.  This has the desired effect
of keeping the write speed constant.  And yes, that helps a lot:
large files now copy at full speed throughout, and I have seen
improvements using benchmarks/bonnie.

Stolen from:	NetBSD
Reviewed by:	bde
2007-01-16 23:43:14 +00:00
pav
9ca9d354d9 Rewrite the udf_read() routine to use a file vnode instead of the devvp vnode.
The code is modelled after cd9660, including support for simple read-ahead
courtesy of clustered read.

Fix udf_strategy to DTRT.

This change fixes sendfile(2) not to send out garbage.

Reviewed by:	scottl
MFC after:	1 month
2007-01-15 18:45:36 +00:00
pav
419ce53db8 Tell backing v_object the filesize right on its creation.
MFC after:	1 week
2007-01-07 23:53:16 +00:00
rodrigc
a533f9bcf6 When performing a mount update to change a mount from read-only to read-write,
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.

This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0

PR:		105412
Tested by:	Eugene Grosbein <eugen grosbein pp ru>
2007-01-06 20:46:02 +00:00
rodrigc
605122b836 Simplify code in union_hashins() and union_hashget() functions. These
functions now more closely resemble similar functions in nullfs.
This also eliminates some errors.

Submitted by:	daichi, Masanori OZAWA <ozawa ongs co jp>
2007-01-05 14:06:42 +00:00
rodrigc
0da08222b7 Eliminate obsolete comment, now that getushort() is implemented in
terms of functions in <sys/endian.h>.
2007-01-05 05:28:57 +00:00
rodrigc
ce2987dc85 Eliminate ASSERT_VOP_ELOCKED panics when doing mkdir or symlink when
sysctl vfs.lookup_shared=1.

Submitted by:	daichi, Masanori OZAWA <ozawa ongs co jp>
2007-01-05 02:25:44 +00:00
jhb
bdddbb74e0 Use the vnode interlock to close a race where pfs_vncache_alloc() could
attempt to vn_lock() a destroyed vnode resulting in a hang.

MFC after:	1 week
Submitted by:	ups
Reviewed by:	des
2007-01-02 17:27:52 +00:00
pav
0fd460000c Call vnode_create_vobject() in VOP_OPEN. Makes mmap work on UDF filesystem.
PR:		kern/92040
Approved by:	scottl
MFC after:	1 week
2006-12-23 18:53:22 +00:00
marcel
24b8c057ed Unbreak 64-bit little-endian systems that do require alignment.
The fix involves using le16dec(), le32dec(), le16enc() and
le32enc(). This eliminates invalid casts and duplicated logic.
2006-12-21 05:40:46 +00:00
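
As an illustration of why byte-wise decoding avoids alignment faults, here is a minimal sketch of helpers in the spirit of le16dec() and le32dec(). The `sketch_` names are hypothetical stand-ins, not the <sys/endian.h> implementations; they read one byte at a time, so the source pointer may have any alignment and the host byte order does not matter.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit value from a possibly unaligned buffer. */
static uint16_t sketch_le16dec(const void *buf)
{
    const uint8_t *p = buf;

    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t sketch_le32dec(const void *buf)
{
    const uint8_t *p = buf;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value as little-endian bytes into a possibly unaligned buffer. */
static void sketch_le32enc(void *buf, uint32_t v)
{
    uint8_t *p = buf;

    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* One padding byte, then 0x12345678 stored little-endian at an odd offset. */
static const uint8_t sample[] = { 0x00, 0x78, 0x56, 0x34, 0x12 };

/* Round trip through an odd offset of a scratch buffer. */
static uint32_t sketch_roundtrip(uint32_t v)
{
    uint8_t scratch[5] = { 0 };

    sketch_le32enc(scratch + 1, v);
    return sketch_le32dec(scratch + 1);
}
```

Casting `sample + 1` to a `uint32_t *` and dereferencing it is the kind of invalid cast the commit removes; the byte-wise helpers have no such requirement.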
rodrigc
35ede48071 For big-endian version of getulong() macro, cast result to u_int32_t.
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems.  This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.

PR:		106703
Submitted by:	Axel Gonzalez <loox e-shell net>
MFC after:	3 days
2006-12-19 02:31:58 +00:00
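
The effect of the missing cast can be modelled outside the kernel. The sketch below does not reproduce the actual getulong() macro text; it assumes the failure mode is a byte-assembling expression of type int whose value is then widened to a 64-bit unsigned long, and all function names are hypothetical. The conversion to int32_t relies on the usual two's complement behaviour of common compilers.

```c
#include <stdint.h>

/* Assemble four little-endian bytes into an unsigned 32-bit value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Model of the problem: the 32-bit pattern is treated as a signed int,
 * so widening it to 64 bits sign-extends values with the top bit set.
 */
static uint64_t widen_without_cast(const uint8_t *p)
{
    int32_t as_int = (int32_t)read_le32(p);

    return (uint64_t)(int64_t)as_int;
}

/* Model of the fix: force an unsigned 32-bit type before widening. */
static uint64_t widen_with_cast(const uint8_t *p)
{
    return (uint64_t)read_le32(p);
}

/* A 3 GiB file size (0xC0000000) as stored on disk, little-endian. */
static const uint8_t three_gib[] = { 0x00, 0x00, 0x00, 0xC0 };
```

Sizes below 2 GiB have the top bit clear and come out the same either way, which is consistent with the bug only showing up for files larger than 2gb.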
rodrigc
24c66ef262 Fix get_ulong() macro on AMD64 (or any little-endian 64-bit platform).
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.

PR:		106703
Tested by:	Axel Gonzalez <loox e-shell net>
MFC after:	3 days
2006-12-19 01:55:45 +00:00
rodrigc
ef305f6c5d Remove unused variable in unionfs_root().
Submitted by:	daichi, Masanori OZAWA
2006-12-09 17:24:18 +00:00
rodrigc
930aa00466 Use vfs_mount_error() in a few places to give more descriptive mount error
messages.
2006-12-09 17:21:25 +00:00
rodrigc
b2bd033255 Add locking around calls to unionfs_get_node_status()
in unionfs_ioctl() and unionfs_poll().

Submitted by:	daichi, Masanori OZAWA <ozawa@ongs.co.jp>
Prompted by:	kris
2006-12-09 16:51:09 +00:00
rodrigc
951f903c66 In unionfs_readdir(), prevent a possible NULL dereference.
CID:		1667
Found by:	Coverity Prevent (tm)
2006-12-09 16:34:37 +00:00
rodrigc
e9a21b7fd8 In unionfs_hashrem(), use LIST_FOREACH_SAFE when iterating over
the list of nodes to free them.

CID:		1668
Found by:	Coverity Prevent (tm)
2006-12-09 16:27:50 +00:00
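
The reason a "safe" iterator is needed when freeing nodes is that a plain loop reads the next pointer out of memory that has just been released. The sketch below shows the same idea with a hand-rolled singly linked list rather than the <sys/queue.h> macros, so that it builds on systems whose queue header lacks the _SAFE variants; `struct node` and the helper names are hypothetical.

```c
#include <stdlib.h>

struct node {
    int id;
    struct node *next;
};

/* Prepend a node; on allocation failure the list is returned unchanged. */
static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/*
 * Free every node and return how many were freed.  The successor is
 * saved in `tmp` before `cur` is released, which is what the fourth
 * argument of LIST_FOREACH_SAFE does for the caller.
 */
static int free_all(struct node **head)
{
    struct node *cur, *tmp;
    int freed = 0;

    for (cur = *head; cur != NULL; cur = tmp) {
        tmp = cur->next;
        free(cur);
        freed++;
    }
    *head = NULL;
    return freed;
}

/* Build a list of `count` nodes, then tear it down again. */
static int build_and_free(int count)
{
    struct node *head = NULL;

    for (int i = 0; i < count; i++)
        head = push(head, i);
    return free_all(&head);
}
```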
rodrigc
ca5229f781 Minor cleanup. If we are doing a mount update, and we pass in
an "export" flag indicating that we are trying to NFS export the
filesystem, and the MSDOSFS_LARGEFS flag is set on the filesystem,
then deny the mount update and export request.  Otherwise,
let the full mount update proceed normally.
MSDOSFS_LARGEFS and NFS don't mix because of the way inodes are calculated
for MSDOSFS_LARGEFS.

MFC after:	3 days
2006-12-09 01:49:19 +00:00
kientzle
86c1b97857 The ISO9660 spec does allow files up to 4G. Change the i_size
field to "unsigned long" so that it actually works.
Thanks to Robert Sciuk for sending me a DVD that
demonstrated ISO9660-formatted media with a file >2G.
I've now fixed this both in libarchive and in the cd9660
filesystem.

MFC after: 14 days
2006-12-08 07:43:53 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
maxim
b3ab8f2011 o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.

o Set ctime to mtime instead of FAT CTime which has completely
different meaning.

PR:		kern/106018
Submitted by:	Oliver Fromme
MFC after:	1 month
2006-12-03 19:04:26 +00:00
rodrigc
792d126127 Add missing includes for <sys/buf.h> and <sys/bio.h>.
2006-12-02 22:30:30 +00:00
rodrigc
c4618bacd3 Many, many thanks to Masanori OZAWA <ozawa@ongs.co.jp>
and Daichi GOTO <daichi@FreeBSD.org> for submitting this
major rewrite of unionfs.  This rewrite was done to
try to solve many of the longstanding crashing and locking
issues in the existing unionfs implementation.  This
implementation also adds a 'MASQUERADE mode', which allows
the user to set different user, group, and file permission
modes in the upper layer.

Submitted by:	daichi, Masanori OZAWA
Reviewed by:	rodrigc (modified for minor style issues)
2006-12-02 19:35:56 +00:00
maxim
24b0d3f0fd o From the submitter: dos2unixchr will convert to lower case if
LCASE_BASE or LCASE_EXT or both are set.  But dos2unixfn uses
dos2unixchr separately for the basename and the extension.  So if
either LCASE_BASE or LCASE_EXT is set, dos2unixfn will convert both
the basename and extension to lowercase because it is blindly
passing in the state of both flags to dos2unixchr.  The bit masks I
used ensure that only the state of LCASE_BASE gets passed to
dos2unixchr when the basename is converted, and only the state of
LCASE_EXT is passed in when the extension is converted.

PR:		kern/86655
Submitted by:	Micah Lieske
MFC after:	3 weeks
2006-11-26 18:49:44 +00:00
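
The masking described by the submitter can be sketched as follows. This is an illustration only: the flag values 0x08 and 0x10 are assumptions, and `sketch_chr` and `sketch_fn` are hypothetical simplifications of dos2unixchr and dos2unixfn that ignore padding and character set conversion.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define LCASE_BASE 0x08  /* assumed value: lowercase the basename */
#define LCASE_EXT  0x10  /* assumed value: lowercase the extension */

/* Lowercase `c` if either flag is set in `flags`, as the message describes. */
static char sketch_chr(char c, int flags)
{
    if (flags & (LCASE_BASE | LCASE_EXT))
        return (char)tolower((unsigned char)c);
    return c;
}

/*
 * Build "base.ext" from an 8.3 name.  Each part passes only its own
 * flag bit down, so one flag cannot lowercase the other part.
 * Returns a pointer to a static buffer that the next call overwrites.
 */
static const char *sketch_fn(const char *base, const char *ext, int lcase)
{
    static char out[13];  /* 8 + '.' + 3 + NUL */
    size_t n = 0;

    for (size_t i = 0; i < 8 && base[i] != '\0'; i++)
        out[n++] = sketch_chr(base[i], lcase & LCASE_BASE);
    if (ext[0] != '\0') {
        out[n++] = '.';
        for (size_t i = 0; i < 3 && ext[i] != '\0'; i++)
            out[n++] = sketch_chr(ext[i], lcase & LCASE_EXT);
    }
    out[n] = '\0';
    return out;
}
```

Passing the unmasked `lcase` to both loops would reproduce the reported behaviour, where setting either flag lowercases the whole name.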
le
d8d1f1dab4 Fix an integer overflow and allow access to files larger than 4GB on
NTFS.
2006-11-20 19:28:36 +00:00
kib
5da2fd53cf Wake up PIOCWAIT handler on the process exit in addition to the stop
events. &p->p_stype is explicitly woken up on process exit for us.

Now, truss /nonexistent exits with error instead of waiting until killed
by signal.

Reported by:	Nikos Vassiliadis nvass at teledomenet gr
Reviewed by:	jhb
MFC after:	1 week
2006-11-17 14:52:38 +00:00
kmacy
0c00ea16db Change vop_lock handling to allow tracking of callers' file and line for
acquisition of lockmgr locks.

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
bp
85d804ea67 Create a bidirectional mapping of the DOS 'read only' attribute
to the 'w' flag.

PR:		kern/77958
Submitted by:	ghozzy gmail com
MFC after:	1 month
2006-11-05 06:38:42 +00:00
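
A bidirectional mapping of this kind can be sketched as a pair of small functions. This is not the committed code: the FAT read-only attribute bit is 0x01, but the function names are hypothetical, and the choice to consult only the owner write bit when going from mode to attribute is an assumption.

```c
#include <stdint.h>

#define ATTR_READONLY 0x01  /* FAT directory entry read-only attribute */
#define WRITE_BITS    0222  /* owner, group and other write permission */

/* Attribute to mode: a read-only file loses all of its 'w' bits. */
static unsigned mode_from_attr(uint8_t attr, unsigned base_mode)
{
    if (attr & ATTR_READONLY)
        return base_mode & ~(unsigned)WRITE_BITS;
    return base_mode;
}

/*
 * Mode to attribute: clearing the owner 'w' bit (assumed to be the bit
 * that matters) sets ATTR_READONLY, and setting it clears the attribute.
 * Other attribute bits are preserved.
 */
static uint8_t attr_from_mode(uint8_t attr, unsigned mode)
{
    if (mode & 0200)
        return (uint8_t)(attr & ~ATTR_READONLY);
    return (uint8_t)(attr | ATTR_READONLY);
}
```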
jb
f82c799735 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
phk
3334f92e7f Ditch crummy fattime <--> timespec conversion functions
2006-10-24 11:55:18 +00:00
phk
a13705978b Drop crummy fattime to timespec conversion routines.
Leave a XXX here for anybody able to test.
2006-10-24 11:43:41 +00:00
phk
abedeeee55 Replace slightly crummy fattime<->timespec conversion functions.
2006-10-24 11:14:05 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
trhodes
7e0c59262e Fake the link count until we have no choice but to load data from the
MFT.

PR:		86965
Submitted by:	Lowell Gilbert <lgfbsd@be-well.ilk.org>
2006-10-21 08:17:17 +00:00
kib
8a69d7f5b4 Update the access and modification times for dev while still holding
thread reference on it.

Reviewed by:	tegge
Approved by:	pjd (mentor)
2006-10-20 08:03:42 +00:00
kib
5f5bf9dadc Fix the race between devfs_fp_check and devfs_reclaim. Dereference the
vnode's v_rdev and increment the dev threadcount, as well as clear it
(in devfs_reclaim), under the dev_lock().

Reviewed by:	tegge
Approved by:	pjd (mentor)
2006-10-20 07:59:50 +00:00
kib
afa2e43fb6 Properly lock the vnode around vgone() calls.
Unlock the vnode in devfs_close() while calling into the driver d_close()
routine.

devfs_revoke() changes by:	ups
Reviewed and bugfixes by:	tegge
Tested by:	mbr, Peter Holm
Approved by:	pjd (mentor)
MFC after:	1 week
2006-10-18 11:17:14 +00:00
phk
638e020bc6 Use utc_offset() where applicable, and hide the internals of it
as static variables.
2006-10-02 18:23:37 +00:00
phk
50c81b8a9a First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.
2006-10-02 12:59:59 +00:00
ru
4ef62e4ca5 Fix our ioctl(2) implementation when the argument is "int". New
ioctls passing integer arguments should use the _IOWINT() macro.
This fixes a lot of ioctl's not working on sparc64, most notable
being keyboard/syscons ioctls.

Full ABI compatibility is provided, with the bonus of fixing the
handling of old ioctls on sparc64.

Reviewed by:	bde (with contributions)
Tested by:	emax, marius
MFC after:	1 week
2006-09-27 19:57:02 +00:00
tegge
83154f853d Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
kib
11200e2de3 Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2
and drop_dm_lock is true, no unlocking shall be attempted. The lock is
already dropped and memory is freed.

Found with:	Coverity Prevent(tm)
CID:	1536
Approved by:	pjd (mentor)
2006-09-19 14:03:02 +00:00
kib
ecf34f4504 Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
vnode lock in devfs_allocv. Do this by temporarily dropping dm_lock around
vnode locking.

For safe operation, add hold counters for both devfs_mount and devfs_dirent,
and DE_DOOMED flag for devfs_dirent. These facilities make it possible to
continue after dropping the dm_lock, by making sure that referenced memory
does not disappear.

Reviewed by:	tegge
Tested by:	kris
Approved by:	kan (mentor)
PR:		kern/102335
2006-09-18 13:23:08 +00:00
imp
478d307596 Put the osta.c license on osta.h. The license is the same.
Approved by: scottl@
2006-09-12 19:02:34 +00:00
imp
db85f415fa while (0); -> while (0) in multi-line macros
2006-08-17 22:50:33 +00:00
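
The reason for dropping the semicolon is that the caller already supplies one. A multi-line macro ending in "while (0);" expands to two statements once the caller adds its own ";", which breaks an if/else around it. The sketch below uses a hypothetical SWAP_INT macro, not one from the affected files.

```c
/*
 * No trailing semicolon after while (0): the caller's own ';' completes
 * the statement, so the macro behaves like a single function call.
 */
#define SWAP_INT(a, b) do {     \
    int tmp_ = (a);             \
    (a) = (b);                  \
    (b) = tmp_;                 \
} while (0)

/* Order a pair so that *lo <= *hi; returns 1 if a swap was needed. */
static int sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else            /* would not compile if the macro ended in "while (0);" */
        return 0;
    return 1;
}

/* Convenience wrapper taking the pair by value. */
static int swapped(int a, int b)
{
    return sort_pair(&a, &b);
}
```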
alc
b98eae58a6 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
yar
209e4786e7 Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR:		misc/101245
Submitted by:	Darren Pilgrim <darren pilgrim bitfreak org>
Tested by:	md5(1)
MFC after:	1 week
2006-08-04 07:56:35 +00:00
delphij
578a019ead When the volume is being downgraded from a read-write mode, mark
it as clean.

PR:		kern/85366
Submitted by:	Dan Lukes <dan at obluda dot cz>
MFC After:	2 weeks
2006-08-03 03:55:52 +00:00
yar
ee7ffb561f In udf_find_partmaps(), when we find a type 1 partition map, we have to
skip the actual type 1 length (6 bytes). With this change, it is now possible
to correctly spot the VAT partition map in certain discs.

Submitted by:	Pedro Martelletto <pedro@ambientworks.net>
2006-07-25 14:15:50 +00:00
jhb
4e36804678 Update comment.
2006-07-18 22:29:54 +00:00
jhb
f126d3e34b Lock the smb share before doing a 'put' on it in smbfs_unmount().
Tested by:	"Jiawei Ye" <leafy7382 at gmail>
2006-07-17 16:13:42 +00:00
phk
0f924547b0 Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exist in
DEVFS.

Remove the opt_devfs.h file now that it is empty.
2006-07-17 09:07:02 +00:00
ups
b5dc376dfa Add vnode interlocking to devfs.
This prevents race conditions that can cause pagefaults or devfs
to use arbitrary vnodes.

MFC after:	1 week
2006-07-12 20:25:35 +00:00
jhb
e09e5b52db Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.
2006-07-08 20:03:39 +00:00
rwatson
ef5c0fe5ce Remove unneeded mac.h include.
MFC after:	3 days
2006-07-06 13:25:01 +00:00
rwatson
c9e5505f09 Remove now unneeded opt_mac.h and mac.h includes.
MFC after:	3 days
2006-07-06 13:24:22 +00:00
rwatson
9d9a014802 Use #include "", not #include <> for opt_foo.h.
MFC after:	3 days
2006-07-06 13:22:08 +00:00
netchild
1fbdab64b8 Correctly calculate a buffer length. It was off by one so a read() returned
one byte less than needed.

This is a RELENG_x_y candidate, since it fixes a problem with Oracle 10.

Noticed by:	Dmitry Ganenko <dima@apk-inform.com>
Testcase by:	Dmitry Ganenko <dima@apk-inform.com>
Reviewed by:	des
Submitted by:	rdivacky
Sponsored by:	Google SoC 2006
MFC after:	1 week
2006-06-27 20:21:38 +00:00
scottl
36eaea84d8 Fix a memory leak and a nested 'for' loop in the spare table handling.
Submitted by: Pedro Martelletto
2006-06-26 03:21:19 +00:00
ghelmer
253ab973ad Upon further review, DES prefers this change over that in revision 1.13
to resolve the directory access problem for processes with P_SUGID flag
set.

Suggested by: des
2006-06-05 16:41:27 +00:00
rodrigc
f00265f1cc mount_msdosfs.c:
- remove call to getmntopts(), and just pass -o options to
    nmount().  This removes some confusion as to what options
    msdosfs can parse, by pushing the responsibility of option parsing
    to the VFS and FS specific code in the kernel.

msdosfs_vfsops.c:
  - add "force" and "sync" to msdosfs_opts.  They used to be specified
    in mount_msdosfs.c, so move them here.  It's not clear whether these
    options should be placed into global_opts in vfs_mount.c or not.

Motivated by:	marcus
2006-06-01 02:25:00 +00:00
cperciva
4e501fd8a3 Enable inadvertently disabled "securenet" access controls in ypserv. [1]
Correct a bug in the handling of backslash characters in smbfs which can
allow an attacker to escape from a chroot(2). [2]

Security:	FreeBSD-SA-06:15.ypserv [1]
Security:	FreeBSD-SA-06:16.smbfs [2]
2006-05-31 22:32:22 +00:00
rodrigc
92e37b9fef Remove incorrect null_checkexp() routine. This
will allow the NFS server to call vfs_stdcheckexp() on the exported nullfs
filesystem, not the underlying filesystem being nullfs mounted.
If the lower filesystem was not NFS exported, then the NFS exported
null filesystem would not work.

Pointed out by:	scottl
PR:		kern/87906
MFC after:	1 week
2006-05-28 22:45:52 +00:00
rodrigc
d1d9c4f5bc Modify MNT_UPDATE behavior for nullfs so that it does not
return EOPNOTSUPP if an "export" parameter was passed in.
This should allow nullfs mounts to be NFS exported.

PR:		kern/87906
MFC after:	1 week
2006-05-28 20:09:18 +00:00
rodrigc
52bbc2b4ab Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 01:21:51 +00:00
rodrigc
055e2abe68 Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
ups
d193a40233 Call vm_object_page_clean() with the object lock held.
Submitted by:	kensmith@
Reviewed by:	mohans@
MFC after:	6 days
2006-05-25 17:16:11 +00:00
ups
4eb5a7d9ee Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flushing dirty pages in the data invalidation functions
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
ghelmer
8ffa3afe92 Revision 1.4 set access for all sensitive files in /proc/<PID> to mode 0
if a process's uid or gid has changed, but the /proc/<PID> directory
itself was also set to mode 0.  Assuming this doesn't open any
security holes, open access to the /proc/<PID> directory for users
other than root to read or search the directory.

Reviewed by:	des (back in February)
MFC after:	3 weeks
2006-05-24 14:03:51 +00:00
phk
ef310efff8 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
kbyanc
defe42e909 Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

  * Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
    linprocfs).  This (mostly) restores the ability to mount these
    filesystems using the old mount(2) system call (see below for the
    rest of the fix).

  * Remove not-NULL check for the data argument from the mount(2) entry
    point.  Per the mount(2) man page, it is up to the individual
    filesystem being mounted to verify data.  Or, in the case of procfs,
    etc. the filesystem is free to ignore the data parameter if it does
    not use it.  Enforcing data to be not-NULL in the mount(2) system call
    entry point prevented passing NULL to filesystems which ignored the
    data pointer value.  Apparently, passing NULL was common practice
    in such cases, as even our own mount_std(8) used to do it in the
    pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change.  One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
2006-05-15 19:42:10 +00:00
pjd
b8538a9381 Remove unused prototypes.
2006-04-12 12:17:29 +00:00
jeff
158187fcb0 - Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this
the vnode is never recycled.  It is bogus because the reference really
   should be associated with the devfs dirent.
2006-03-31 23:37:29 +00:00
tegge
1952671e7a Call vn_start_write() before locking vnode.
2006-03-19 20:45:06 +00:00
rwatson
918de4c556 Add a_fdidx to comment prototype for fifo_open().
MFC after:	3 days
Submitted by:	Kostik Belousov <kostikbel at gmail dot com>
2006-03-15 10:15:35 +00:00
rwatson
40fd390520 If fifo_open() is called with a negative file descriptor, return EINVAL
rather than panicking later.  This can occur if the kernel calls
vn_open() on a fifo, as there will be no associated file descriptor,
and therefore the file descriptor operations cannot be modified to
point to the fifo operation set.

MFC after:	3 days
Reported by:	Martin <nakal at nurfuerspam dot de>
PR:		94278
2006-03-14 19:29:45 +00:00
joerg
0ec2804cee When encountering a ISO_SUSP_CFLAG_ROOT element in Rock Ridge
processing, this actually means there's a double slash recorded in the
symbolic link's path name.  We used to start over from / then, which
caused link targets like ../../bsdi.1.0/include//pathnames.h to be
interpreted as /pathnames.h.  This is both contradictory to our
conventional slash interpretation, as well as potentially dangerous.

The right thing to do is (obviously) to just ignore that element.

bde once pointed out that mistake when he noticed it on the
4.4BSD-Lite2 CD-ROM, and asked me for help.

Reviewed by:	bde (about half a year ago)
MFC after:	3 days
2006-03-13 22:32:33 +00:00
jeff
c98db28d0e - Define a null_getwritemount to get the mount-point for the lower
filesystem so that nullfs doesn't permit you to circumvent snapshots.

Discussed with:		tegge
Sponsored by:		Isilon Systems, Inc.
2006-03-12 04:58:18 +00:00
kris
f1759f2396 Correct the vnode locking in fdescfs.
PR:		kern/93905
Submitted by:	Kostik Belousov <kostikbel@gmail.com>
Reviewed by:	jeff
MFC After:	1 week
2006-02-28 00:05:44 +00:00
yar
3822005cd3 CODA_COMPAT_5 may not be defined unconditionally in the coda5 module.
Otherwise a kernel build would break in the coda5 module if the main
kernel conf file enabled CODA_COMPAT_5, too.  Redefined symbols are
strictly disallowed by -Werror.

To overcome this issue, introduce a different symbol indicating coda5
build, CODA5_MODULE, and translate it to CODA_COMPAT_5 appropriately
in /sys/coda/coda.h.

MFC after:	3 days
2006-02-27 12:04:13 +00:00
jhb
ff9c76bccd Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
  stop event earlier.  After we have signalled that, we set P_WEXIT and
  then wait for any processes with a hold on the vmspace via PHOLD to
  release it.  PHOLD now KASSERT()'s that P_WEXIT is clear when it is
  invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
  to zero.
- Change proc_rwmem() to require that the process being read from has its
  vmspace held via PHOLD by the caller and get rid of all the junk to
  screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
  doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
  FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
  to clear an earlier single-step simulated via a breakpoint).  We only
  do one to avoid races.  Also, by making the EINVAL error for unknown
  requests be part of the default: case in the switch, the various
  switch cases can now just break out to return which removes a _lot_ of
  duplicated PRELE and proc unlocks, etc.  Also, it fixes at least one bug
  where a LWP ptrace command could return EINVAL with the proc lock still
  held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
  ptrace_clear_single_step() to always be called with the proc lock
  held (it was a mixed bag previously).  Alpha and arm have to drop
  the lock while they mess around with breakpoints, but other archs
  avoid extra lock release/acquires in ptrace().  I did have to fix a
  couple of other consumers in kern_kse and a few other places to
  hold the proc lock and PHOLD.

Tested by:	ps (1 mostly, but some bits of 2-4 as well)
MFC after:	1 week
2006-02-22 18:57:50 +00:00
jhb
82b4c89720 Change pfs_visible() to optionally return a pointer to the process
associated with the passed in pfs_node.  If it does return a pointer, it
keeps the process locked.  This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again.  This will become more important with future changes to fix races
between procfs/ptrace and exit(2).  Also, removed a duplicate pfs_visible()
call in pfs_getextattr().

Reviewed by:	des
MFC after:	1 week
2006-02-22 17:24:54 +00:00
jhb
d6902d680b Hold the proc lock while calling proc_sstep() since the function asserts
it and remove a PRELE() that didn't have a matching PHOLD().  The calling
code already has a PHOLD anyway.

MFC after:	1 week
2006-02-22 17:20:37 +00:00
jeff
4c3ad6634a - We must hold a reference to a vnode before calling vgone() otherwise
it may not be removed from the freelist.

MFC After:	1 week
Found by:	kris
2006-02-22 09:05:40 +00:00
jeff
d08a9e33bf - spell VOP_LOCK(vp, LK_RELEASE... VOP_UNLOCK(vp,... so that asserts in
vop_lock_post do not trigger.
 - Rearrange null_inactive to call null_hashrem earlier so there is no chance
   of finding the null node on the hash list after the locks have been
   switched.
 - We should never have a NULL lowervp in null_reclaim() so there is
   no need to handle this situation.  panic instead.

MFC After:	1 week
2006-02-22 06:17:31 +00:00
jeff
79ca2be05c - Assert that the lowervp is locked in null_hashget().
- Simplify the logic dealing with recycled vnodes in null_hashget() and
   null_hashins().  Since we hold the lower node locked in both cases
   the null node can not be undergoing recycling unless reclaim somehow
   called null_nodeget().  The logic that was in place was not safe and
   was essentially dead code.

MFC After:	1 week
2006-02-22 06:15:12 +00:00
jeff
3133ba817d - Deadfs should not use the std GETWRITEMOUNT routine. Add one that always
returns NULL.

MFC After:	1 week
2006-02-22 06:11:59 +00:00
jhb
6acd384eb7 Correctly set MNTK_MPSAFE flag from the lower vnode's mount rather than
always turning it on along with any flags set in the lower mount.

Tested by:	kris
Reviewed by:	jeff
MFC after:	3 days
2006-02-10 18:06:49 +00:00
jeff
9ea95f5e38 - No need to WANTPARENT when we're just going to vrele it in a deadlock
prone way later.

Reported by:	kkenn
MFC After:	3 days
2006-02-07 11:31:32 +00:00
will
a82365919d Make UDF endian-safe.
Submitted by:	Pedro Martelletto <pedro@ambientworks.net> (via scottl)
Tested on:	sparc64
2006-02-03 15:25:52 +00:00
jeff
30a231055b - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
jeff
af5f248494 - Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:24:14 +00:00
trhodes
54e5c67329 Update incorrect comments here, there should not be a call to panic()
over fs corruption.

Discussed with:	alfred, phk
2006-01-23 17:45:57 +00:00
fjoe
e39df5af9f Do not assume that `char direntry::deExtension[3]' starts right after
`char direntry::deName[8]' and access deExtension[] explicitly.

Found by:	Coverity Prevent(tm)
CID:		350, 351, 352
2006-01-22 21:09:38 +00:00
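A minimal sketch of the explicit field access that e39df5af9f describes. The struct below is a simplified stand-in that models only the two name fields, and dos_name_copy() is a hypothetical helper, not a function from msdosfs.

```c
#include <string.h>

/*
 * Simplified stand-in for the on-disk FAT directory entry; only the two
 * name fields mentioned in the commit are modelled.
 */
struct direntry {
	char deName[8];		/* base name, space padded */
	char deExtension[3];	/* extension, space padded */
};

/*
 * Copy the 11-byte 8.3 name one field at a time instead of reading 11
 * bytes starting at deName.  out must have room for 11 bytes.
 */
static void
dos_name_copy(const struct direntry *dep, char *out)
{
	memcpy(out, dep->deName, sizeof(dep->deName));
	memcpy(out + sizeof(dep->deName), dep->deExtension,
	    sizeof(dep->deExtension));
}
```

Naming each array keeps every access inside the bounds of the member it touches, which is the property a static analyzer checks.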
rwatson
56bc8d8e33 Convert last four functions in coda_vnops.c to ANSI C function
declarations.  I knew I would get to fix something in Coda
eventually.

MFC after:	1 week
2006-01-21 19:51:47 +00:00
alfred
a0282ebc04 I ran into an nfs client panic a couple of times in a row over the
last few days.  I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter.  The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any).  I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
2006-01-17 17:29:03 +00:00
tegge
d344c11861 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
maxim
93d7e294fc o Fix typo in the define: s/MRAK_INT_GEN/MARK_INT_GEN/. The typo
was harmless because the define is not used in coda_vfsops.c.

Submitted by:	Hugo Meiland
2006-01-09 18:07:06 +00:00
maxim
e065d5a185 o Typo in the debug message: s/skiped/skipped.
PR:		kern/91346
Submitted by:	Gavin Atkinson
2006-01-05 13:39:23 +00:00
rwatson
428f554873 When returning EIO from DEVFSIO_RADD ioctl, drop the exclusive rule
lock.  Otherwise the system comes to a rather sudden and grinding
halt.

MFC after:	1 week
2006-01-03 09:49:10 +00:00
trhodes
412f766852 Make tv_sec a time_t on all platforms but alpha. Brings us more in line with
POSIX.  This also makes the struct correct should we ever implement an i386-time64
architecture.  Not that we need to.

Reviewed by:	imp, brooks
Approved by:	njl (acpica), des (no objects, touches procfs)
Tested with:	make universe
2005-12-24 22:22:17 +00:00
des
5d3c44687b Eradicate caddr_t from the VFS API. 2005-12-14 00:49:52 +00:00
avatar
257abc1fc1 Recent nmount(2) adoption in mount_smbfs(8) did not flag the "long" option
since mount_smbfs(8) assumed long name mounting by default unless "-n long"
was explicitly specified.

Rather than supplying a "long" option in mount_smbfs(8), this commit brings
back the original behaviour by associating SMBFS_MOUNT_NO_LONG with the
"nolong" option.  This should fix the broken long file names on smbfs people
observed recently.

Reported by:	Vladimir Grebenschikov <vova at fbsd dot ru>
Reviewed by:	phk
Tested by:	Slawa Olhovchenkov <slw at zxy dot spb dot ru>
2005-12-05 19:05:06 +00:00
ru
9b19d72862 Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.
2005-12-05 11:58:35 +00:00
ru
798500dfd8 Fix -Wundef from compiling the amd64 LINT. 2005-12-04 10:06:06 +00:00
ru
522e9c2b7b Fix -Wundef. 2005-12-04 02:12:43 +00:00
bp
4248fbe6a6 Fix interaction with Windows 2000/XP based servers:
If the complete reply on the TRANS2_FIND_FIRST2 request fits exactly
into one response packet, then the next call to TRANS2_FIND_NEXT2 will return
zero entries and server will close current transaction.  To avoid
subsequent errors we should not perform FIND_CLOSE2 request.

PR:		kern/78953
Submitted by:	Jim Carroll
2005-11-22 07:13:00 +00:00
rodrigc
736e6b710d Properly parse the nowin95 mount option.
Tested by:	Rainer Hurling <rhurlin at gwdg dot de>
2005-11-19 16:38:39 +00:00
rodrigc
666e602c46 Add "shortnames" and "longnames" mount options which are
synonyms for "shortname" and "longname" mount options.  The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwards compatibility.

Discovered by:	Rainer Hurling <rhurlin at gwdg dot de>
2005-11-18 22:34:31 +00:00
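The synonym handling in 666e602c46 could look roughly like the sketch below. The helper names are hypothetical and the real kernel code goes through the nmount() option list, which is not modelled here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: accept both spellings of the short-name option. */
static bool
is_shortname_opt(const char *name)
{
	return (strcmp(name, "shortname") == 0 ||
	    strcmp(name, "shortnames") == 0);
}

/* Hypothetical helper: accept both spellings of the long-name option. */
static bool
is_longname_opt(const char *name)
{
	return (strcmp(name, "longname") == 0 ||
	    strcmp(name, "longnames") == 0);
}
```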
rodrigc
bda951e116 - Add errmsg to the list of smbfs mount options.
- Use vfs_mount_error() to propagate smbfs mount errors back to userspace.

Reviewed by:	bp (smbfs maintainer)
2005-11-16 02:26:25 +00:00
dwhite
0bcdf7c033 This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR:		88249
Submitted by:	"Devon H. O'Dell" <dodell@ixsystems.com>
MFC after:	3 days
2005-11-09 22:03:50 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
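To illustrate the naming rules listed in be4f357149, here is a small sketch that applies three of them mechanically: spaces become '_', '/' is dropped, and letters are lowered. The function name is hypothetical; the commit itself renamed the types by hand and does not describe any such tool.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical illustration of the naming rules: replace ' ' with '_',
 * drop '/', and lower-case everything else.  Returns the length of the
 * result; out is always NUL terminated when outsz is non-zero.
 */
static size_t
normalize_type_name(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	if (outsz == 0)
		return (0);
	for (; *in != '\0' && o + 1 < outsz; in++) {
		unsigned char c = (unsigned char)*in;

		if (c == '/')
			continue;	/* not usable in a file name */
		out[o++] = (c == ' ') ? '_' : (char)tolower(c);
	}
	out[o] = '\0';
	return (o);
}
```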
phk
3ed5c9efd0 Use correct criteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by:	kris
2005-10-18 20:21:25 +00:00
des
4426988f2c Implement the full range of ISO9660 number conversion routines in iso.h.
MFC after:	2 weeks
2005-10-18 13:35:08 +00:00
rodrigc
cb57dd08dd Unconditionally mount a CD9660 filesystem as read-only, instead of
returning EROFS if we forget to mount it as read-only.
2005-10-17 03:29:53 +00:00
rodrigc
49d8776e03 Use the actual sector size of the media instead of hard-coding it to 2048.
This eliminates KASSERTs in GEOM if we accidentally mount an audio CD
as a cd9660 filesystem.
2005-10-17 03:27:35 +00:00
rodrigc
adce9d7a14 Unconditionally mount a UDF filesystem as read-only, instead of
returning an EROFS if we forget to mount it as read-only.
2005-10-17 03:07:36 +00:00
flz
a0cf3f9b58 - Fix typo.
Approved by:	ssouhlal
MFC after:	1 week
2005-10-17 00:04:35 +00:00
truckman
321926b9ba Update nwfs_lookup() to match the current cache_lookup() API.
cache_lookup() has returned a ref'ed and locked vnode since
vfs_cache.c:1.96, dated Tue Mar 29 12:59:06 2005 UTC.  This change
is similar to the change made to smbfs_lookup() in smbfs_vnops.c:1.58.

Tested by:	"Antony Mawer" ant AT mawer.org
MFC after:	2 weeks
2005-10-16 21:54:35 +00:00
kris
fa8ac58228 Reflect mpsafety of the underlying filesystem in the nullfs image.
I benchmarked this by simultaneously extracting 4 large tarballs (basically
world images) on a 4-processor AMD64 system, in a malloc-backed md.

With this patch, system time was reduced by 43%, and wall clock time by 33%.

Submitted by:	jeff
MFC after: 	1 week
2005-10-16 21:45:25 +00:00
truckman
80700e8efc Apply the same fix to a potential race in the ISDOTDOT code in
cd9660_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking cd9660 as MPSAFE.

Requested by:	bde
2005-10-16 21:41:54 +00:00
yar
924e74a759 In preparation for making the modules actually use opt_*.h files
provided in the kernel build directory, fix modules that were
failing to build this way due to not quite correct kernel option
usage.  In particular:

ng_mppc.c uses two complementary options, both of which are listed
in sys/conf/files.  Ideally, there should be a separate option for
including ng_mppc.c in kernel build, but now only
NETGRAPH_MPPC_ENCRYPTION is usable anyway, the other one requires
proprietary files.

nwfs and smbfs were trying to ensure they were built with proper
network components, but the check was rather questionable.

Discussed with:	ru
2005-10-14 23:17:45 +00:00
davidxu
3fbdb3c215 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they use member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, the pending signal set is kept as
   before, and an extra siginfo list holds additional siginfo_t data for
   signals.  Kernel code that uses psignal() still behaves as before and
   will not fail even under memory pressure.  The only exception is when
   deleting a signal: call sigqueue_delete to remove the signal from the
   sigqueue rather than SIGDELSET.  Currently no kernel code delivers a
   signal with additional data, so the kernel should be as stable as
   before.  A ksiginfo can carry more information, for example to allow
   a signal to be delivered but its siginfo data thrown away if memory
   is not enough.
   SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they
   cannot be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to a target
   process; if resources are unavailable, EAGAIN is returned as the
   specification says.
   Just before a thread exits, its signal queue memory is freed by
   sigqueue_flush.
   Currently, all signals are allowed to be queued, not only realtime
   signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
rodrigc
5eb4cdb703 - Do not hardcode the bsize to a sectorsize of 2048, even though
the UDF specification specifies a logical sectorsize of 2048.
  Instead, get it from GEOM.
- When reading the UDF Anchor Volume Descriptor, use the logical
  sectorsize of 2048 when calculating the offset to read from, but
  use the actual sectorsize to determine how much to read.

- works with reading a DVD disk and a DVD disk image file via mdconfig
- correctly returns EINVAL if we try to mount_udf an audio CD, instead
  of panicking inside GEOM when INVARIANTS is set
2005-10-09 04:45:33 +00:00
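The offset and length rule from 5eb4cdb703 can be sketched as follows. The struct and function names are hypothetical, and the media sector size is simply passed in here, whereas the commit obtains it from GEOM.

```c
#include <stdint.h>

/* Logical sector size given by the UDF specification, per the commit text. */
#define UDF_LOGICAL_SECTOR_SIZE	2048u

/* Hypothetical description of a single device read. */
struct read_request {
	uint64_t offset;	/* byte offset on the device */
	uint32_t length;	/* number of bytes to read */
};

/*
 * The offset is always computed in 2048-byte logical sectors, while the
 * amount read is one sector of whatever size the media actually reports.
 */
static struct read_request
udf_sector_request(uint32_t logical_sector, uint32_t media_sector_size)
{
	struct read_request req;

	req.offset = (uint64_t)logical_sector * UDF_LOGICAL_SECTOR_SIZE;
	req.length = media_sector_size;
	return (req);
}
```

For example, udf_sector_request(256, 512) asks for 512 bytes at byte offset 524288; the sector number 256 is only an example value here.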
pjd
5c01c35a0c We don't need 'imp' here. 2005-10-07 10:30:47 +00:00
rwatson
5758dab896 Second attempt at a work-around for fifo-related socket panics during
make -j with high levels of parallelism: acquire Giant in fifo I/O
routines.

Discussed with:	ups
MFC after:	3 days
2005-10-01 20:15:41 +00:00
phk
b23c35710b The NWFS code in RELENG_6 is broken due to a typo in
sys/fs/nwfs/nwfs_vfsops.c, introduced with the conversion to
nmount with revision 1.38. This causes mount_nwfs to fail with
the error message:

  mount_nwfs: mount error: /mnt/netware: syserr = No such file or directory

This is caused by a typo on line 178, which specifies "nwfw_args"
rather than "nwfs_args".

Submitted by:	Antony Mawer <gnats@mawer.org>
Fat fingers:	phk
PR:		86757
MFC:		3 days
2005-09-30 18:21:05 +00:00
peadar
e0565b5794 Remove checks for BOOTSIG[23] from FAT32 bootblocks.
There seems to be very little documentary evidence outside this
implementation to suggest that these checks are necessary, and more
than one camera-formatted flash disk fails the check, but mounts
successfully on most other systems.

Reviewed By: bde@
2005-09-29 14:09:46 +00:00
rwatson
332a994af0 Back out fifo_vnops.c:1.127, which introduced an sx lock around I/O on
a fifo.  While this did indeed close the race, confirming suspicions
about the nature of the problem, it causes difficulties with blocking
I/O on fifos.

Discussed with:		ups
Also spotted by:	Peter Holm <peter at holm dot cc>
2005-09-27 16:45:22 +00:00
rwatson
87ac2d2498 Assert v_fifoinfo is non-NULL in fifo_close() in order to catch
non-conforming cases sooner.

MFC after:	3 days
Reported by:	Peter Holm <peter at holm dot cc>
2005-09-26 08:17:03 +00:00
rwatson
c9044f078d Lock the read socket receive buffer when frobbing the sb_state flag on
that socket during open, not the write socket receive buffer.  This
might explain clearing of the sb_state SB_LOCK flag seen occasionally
in soreceive() on fifos.

MFC after:	3 days
Spotted by:	ups
2005-09-25 19:52:09 +00:00
phk
a951e327e6 Make rule zero really magical, that way we don't have to do anything
when we mount and get zero cost if no rules are used in a mountpoint.

Add code to deref rules on unmount.

Switch from SLIST to TAILQ.

Drop SYSINIT, use SX_SYSINIT and static initializer of TAILQ instead.

Drop goto, a break will do.

Reduce double pointers to single pointers.

Combine reaping and destroying rulesets.

Avoid memory leaks in some error cases.
2005-09-24 07:03:09 +00:00
rwatson
27cf2ffed6 For reasons of consistency (and necessity), assert an exclusive vnode
lock on the fifo vnode in fifo_open(): we rely on the vnode lock to
serialize access to v_fifoinfo.

MFC after:	3 days
2005-09-23 12:39:51 +00:00
rwatson
9c2c1cb9fb Add fi_sx, an sx lock to serialize I/O operations on the socket pair
underlying the POSIX fifo implementation.  In 6.x/7.x, fifo access is
moved from the VFS layer, where it was serialized using the vnode
lock, to the file descriptor layer, where access is protected by a
reference count but not serialized.  This exposed socket buffer
locking to high levels of parallelism in specific fifo workloads, such
as make -j 32, which expose as yet unresolved socket buffer bugs.

fi_sx re-adds serialization around the read and write routines,
although not paths that simply test socket buffer mbuf queue state,
such as the poll and kqueue methods.  This restores the extra locking
cost previously present in some cases, but is an effective workaround
for the instability that has been experienced.  This workaround should
be removed once the bug in socket buffer handling has been fixed.

Reported by:	kris, jhb, Julien Gabel <jpeg at thilelli dot net>,
		Peter Holm <peter at holm dot cc>, others
MFC after:	3 days
2005-09-22 10:51:12 +00:00
phk
6a408cbd71 Revamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv.  It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex.  Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables,  the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liaison
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr.  Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them.  A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it as an index into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now knows about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there.  We still spelunk into other mountpoints
and fondle their data without 100% good locking.  It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints.  It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by:		Kris
Much confusion caused by (races in):	md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.
2005-09-19 19:56:48 +00:00
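The per-mountpoint index scheme in 6a408cbd71 (an array that starts out pointing at a single embedded element and is extended with malloc(9) as more devfs instances are mounted) can be sketched in userland C as below. All type and function names are hypothetical, and the locking that the commit requires is omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for devfs_dirent; the contents do not matter for the sketch. */
struct dirent_sketch {
	int dummy;
};

/* Hypothetical cut-down cdev_priv holding one dirent slot per mount. */
struct cdev_priv_sketch {
	unsigned maxdirent;		/* highest valid index */
	struct dirent_sketch **dirents;	/* points at dirent0 until grown */
	struct dirent_sketch *dirent0;	/* embedded slot for the first mount */
};

static void
priv_init(struct cdev_priv_sketch *p)
{
	p->maxdirent = 0;
	p->dirent0 = NULL;
	p->dirents = &p->dirent0;
}

/* Make sure slot idx exists.  Returns 0, or -1 if allocation fails. */
static int
priv_reserve(struct cdev_priv_sketch *p, unsigned idx)
{
	struct dirent_sketch **n;

	if (idx <= p->maxdirent)
		return (0);
	n = calloc((size_t)idx + 1, sizeof(*n));
	if (n == NULL)
		return (-1);
	memcpy(n, p->dirents, ((size_t)p->maxdirent + 1) * sizeof(*n));
	if (p->dirents != &p->dirent0)
		free(p->dirents);
	p->dirents = n;
	p->maxdirent = idx;
	return (0);
}

static void
priv_fini(struct cdev_priv_sketch *p)
{
	if (p->dirents != &p->dirent0)
		free(p->dirents);
	priv_init(p);
}
```

With a single devfs mount no allocation happens at all, which matches the common case the commit optimizes for.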
rwatson
2d8b6f2e27 Assert that (vp) is locked in fifo_close(), since we rely on the
exclusive vnode lock to synchronize the reference counts on struct
fifoinfo.

MFC after:	3 days
2005-09-18 10:44:50 +00:00
phk
2d4ad1cc44 Don't attempt to recurse lockmgr, it doesn't like it. 2005-09-15 21:16:43 +00:00
kan
b4bff5a977 Handle a race condition where NULLFS vnode can be cleaned while threads
can still be asleep waiting for lowervp lock.

Tested by:	kkenn
Discussed with: ssouhlal, jeffr
2005-09-15 19:21:26 +00:00
rwatson
7584b6b2c3 The socket pointers in fifoinfo are not permitted to be NULL, so
don't check if they are, it just confuses the fifo code more.

MFC after:	3 days
2005-09-15 15:45:34 +00:00
phk
3c4b94c1fe Various minor polishing. 2005-09-15 10:28:19 +00:00
phk
eafa84f647 Protect the devfs rule internal global lists with a sx lock, the per
mount locks are not enough.  Finer granularity (x)locking could be
implemented, but I prefer to keep it simple for now.
2005-09-15 08:50:16 +00:00
phk
1a7de63bbb Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.
2005-09-15 08:36:37 +00:00
phk
a543c5b761 Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied.  This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.

MFC:	after 3 days
2005-09-15 06:57:28 +00:00
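A depth limit of the kind a543c5b761 introduces can be sketched like this. The ruleset struct is a hypothetical simplification (one optional include per ruleset, and a rule count instead of real rules), and max_depth plays the role of vfs.devfs.rule_depth.

```c
#include <stddef.h>

/* Hypothetical simplified ruleset: a rule count and one optional include. */
struct ruleset {
	const struct ruleset *include;
	int nrules;
};

/*
 * Apply a ruleset, following includes at most max_depth levels deep.
 * Returns the number of rules applied, or -1 if the limit is exceeded.
 * Call with depth == 0 for the top-level ruleset.
 */
static int
ruleset_apply(const struct ruleset *rs, int depth, int max_depth)
{
	int n, sub;

	if (depth > max_depth)
		return (-1);
	n = rs->nrules;
	if (rs->include != NULL) {
		sub = ruleset_apply(rs->include, depth + 1, max_depth);
		if (sub < 0)
			return (-1);
		n += sub;
	}
	return (n);
}
```

Because the limit is carried in the call rather than stored as a "running" mark on the ruleset, two callers can walk the same ruleset at the same time, and a ruleset that includes itself still terminates.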
rwatson
0e4f08263a Trim down now (believed to be) unused fifo_ioctl() and
fifo_kqfilter() VOP implementations, since they in theory are used
only on open file descriptors, in which case the ioctls are via
fifo_ioctl_f() and kqueue requests are via fifo_kqfilter_f().
Generate warnings if they are entered for now.  These printf()
calls should become panic() calls.

Annotate and re-implement fifo_ioctl_f(): don't arbitrarily
forward ioctls to the socket layer, only forward the ones we
explicitly support for fifos.  In the case of FIONREAD, don't
forward the request to the write socket on a read-write fifo, or
the read result is overwritten.  Annotate a nasty case for the
undefined POSIX O_RDWR on fifos, in which failure of the second
ioctl will result in the socket pair being in an inconsistent
state.

Assert copyright as I find myself rewriting non-trivial parts of
fifofs.

MFC after:	3 days
2005-09-13 17:46:48 +00:00
rwatson
afc7b6e916 As a result of kqueue locking work, socket buffer locks will always
be held when entering a kqueue filter for fifos via a socket buffer
event: as such, assert the lock unconditionally rather than acquiring
it conditionally.

MFC after:	3 days
2005-09-13 10:39:24 +00:00
rwatson
2bda369cf8 Annotate two issues:
1) fifo_kqfilter() is not actually ever used, it likely should be GC'd.

2) fifo_kqfilter_f() doesn't implement EVFILT_VNODE, so detecting events
   on the underlying vnode for a fifo no longer works (it did in 4.x).
   Likely, fifo_kqfilter_f() should forward the request to the VFS using
   fp->f_vnode, which would work once fifo_kqfilter() was detached from
   the vnode operation vector (removing the fifo override).

Discussed with:	phk
2005-09-13 09:23:22 +00:00
rwatson
bc5e7eb1f3 Introduce no-op nosup fifo kqueue filter and detach routine, which are
used when a read filter is requested on a write-only fifo descriptor, or
a write filter is requested on a read-only fifo descriptor.  This
permits the filters to be registered, but never raises the event, which
causes kqueue behavior for fifos to more closely match similar semantics
for poll and select, which permit testing for the condition even though
the condition will never be raised, and is consistent with POSIX's notion
that a fifo has identical semantics to a one-way IPC channel created
using pipe() on most operating systems.

The fifo regression test suite can now run to completion on HEAD without
errors.

MFC after:	3 days
2005-09-12 19:59:12 +00:00
rwatson
491de3e2d2 When a request is made to register a filter on a fifo that doesn't
apply to the fifo (i.e., not EVFILT_READ or EVFILT_WRITE), reject
it as EINVAL, not by returning 1 (EPERM).

MFC after:	3 days
2005-09-12 18:07:49 +00:00
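The point of 491de3e2d2 is that a bare return value of 1 reads as errno 1, which is EPERM. A sketch of the corrected shape, using a hypothetical enum in place of the real kqueue filter constants:

```c
#include <errno.h>

/* Hypothetical stand-ins for the kqueue filter identifiers. */
enum sketch_filter {
	SKETCH_EVFILT_READ,
	SKETCH_EVFILT_WRITE,
	SKETCH_EVFILT_OTHER
};

/* Return 0 for filters a fifo supports and EINVAL for anything else. */
static int
fifo_filter_check(enum sketch_filter f)
{
	switch (f) {
	case SKETCH_EVFILT_READ:
	case SKETCH_EVFILT_WRITE:
		return (0);
	default:
		return (EINVAL);	/* previously a literal 1, i.e. EPERM */
	}
}
```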
rwatson
6b308e01a1 Remove DFLAG_SEEKABLE from fifo file descriptors: fifos are not seekable
according to POSIX, not to mention the fact that it doesn't make sense
(and hence isn't really implemented).  This causes the fifo_misc
regression test to succeed.
2005-09-12 12:15:12 +00:00
rwatson
1481446aae Only poll the fifo for read events if the fifo is attached to a readable
file descriptor.  Otherwise, the read end of a fifo might return that it
is writable (which it isn't).

Only poll the fifo for write events if the fifo is attached to a writable
file descriptor.  Otherwise, the write end of a fifo might return that
it is readable (which it isn't).

In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
has undefined behavior), we poll for both.

MFC after:	3 days
2005-09-12 10:16:18 +00:00
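The event filtering described in 1481446aae amounts to masking the requested events by the open mode of the descriptor. In this sketch the SKETCH_FREAD and SKETCH_FWRITE flags are hypothetical stand-ins for the kernel's file flags, and only POLLIN and POLLOUT are considered.

```c
#include <poll.h>

/* Hypothetical stand-ins for the kernel's FREAD and FWRITE file flags. */
#define SKETCH_FREAD	0x1
#define SKETCH_FWRITE	0x2

/*
 * Reduce the requested poll events to those that make sense for how the
 * fifo descriptor was opened.  A descriptor with both flags keeps both.
 */
static int
fifo_poll_mask(int fflag, int events)
{
	int levents = 0;

	if (fflag & SKETCH_FREAD)
		levents |= events & POLLIN;
	if (fflag & SKETCH_FWRITE)
		levents |= events & POLLOUT;
	return (levents);
}
```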
rwatson
919d519cbb After going to some trouble to identify only the write-related events
to poll the write socket for, the fifo polling code proceeded to poll
for the complete set of events.  Use 'levents' instead of 'events' as
the argument to poll, and only poll the write socket if there is
interest in write events.

MFC after:	3 days
2005-09-12 10:13:15 +00:00
rwatson
5be69d1d56 When a writer opens a fifo, wake up the read socket for read, not the
write socket.

MFC after:	3 days
2005-09-12 10:07:21 +00:00
rwatson
c9f007b159 Add an assertion that fifo_open() doesn't race against other threads
while sleeping to allocate fifo state: due to using the vnode lock to
serialize access to a fifo during open, it shouldn't happen (tm).

MFC after:	3 days
2005-09-12 10:06:38 +00:00
rwatson
a079789c36 Rather than reaching into the internals of the UNIX domain socket code
by calling uipc_connect2() to connect two socket endpoints to create a
fifo, call soconnect2().

MFC after:	3 days
2005-09-12 10:05:08 +00:00
phk
aca041ee53 Clean up prototypes. 2005-09-12 08:03:15 +00:00
rodrigc
97c59ad2c7 Cast bf_sysid to const char * when passing it to strncmp(), because
strncmp does not take an unsigned char *.  Eliminates warning with GCC 4.0.
2005-09-11 16:02:14 +00:00
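The cast in 97c59ad2c7 follows the usual pattern for comparing an unsigned char buffer with a string constant. The helper below is hypothetical, and the 8-byte identifier used with it is an assumed example value rather than something taken from the commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * strncmp() takes const char *, so an unsigned char buffer needs an
 * explicit cast to avoid a pointer-signedness warning.
 */
static int
sysid_matches(const unsigned char *sysid, const char *want, size_t len)
{
	return (strncmp((const char *)sysid, want, len) == 0);
}
```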
rodrigc
aeeba2bf5b Do not declare M_NTFSMNT with extern linkage here, since
it is defined with static linkage in ntfs_vfsops.c.
Fixes compilation with GCC 4.0.
2005-09-11 15:57:07 +00:00
obrien
4b003b6283 Ensure the full value is written into inode variables.
PR:		85503
Submitted by:	Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
2005-09-07 10:32:58 +00:00
ssouhlal
5ea64800e1 Unbreak hpfs/ntfs/udf/ext2fs/reiserfs mounting.
Another pointyhat to:	ssouhlal
2005-09-03 20:23:41 +00:00
ssouhlal
ad48f84298 Unbreak the build.
Pointyhat to:	ssouhlal
2005-09-03 00:40:19 +00:00
ssouhlal
45954b5047 Use vput() instead of vrele() in null_reclaim() since the lower vnode
is locked.

MFC after:	3 days
2005-09-02 15:49:55 +00:00
ssouhlal
f8217f350b *_mountfs() (if the filesystem mounts from a device) needs devvp to be
locked, so lock it.

Glanced at by:	phk
MFC after:	3 days
2005-09-02 15:27:23 +00:00
phk
a469be1ef3 Add a missing dev_relthread() call.
Remove unused variable.

Spotted by:	Hans Petter Selasky <hselasky@c2i.net>
2005-08-29 11:14:18 +00:00
phk
fcf6768753 Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers:  Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant
2005-08-17 08:19:52 +00:00
phk
8a3fe94804 Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
phk
e89ebd4119 Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.
2005-08-16 19:08:01 +00:00
phk
4edc625526 Eliminate effectively unused dm_basedir field from devfs_mount. 2005-08-15 19:40:53 +00:00
grehan
ba88fd3c57 - restore the ability to mount cd9660 filesystems as root by inverting
some of the options test, specifically the joliet and rockridge tests.
  Since the root mount callchain doesn't go through cd9660_cmount, the
  default mount options aren't set. Rather than having the main codepath
  assume the options are there, test for the absence of the inverted
  option.

  e.g. instead of vfs_flagopt(.. "joliet" ..), test for
  !vfs_flagopt(.. "nojoliet" ..)

  This works for root mount, non-root mount and future nmount cases.

- in cd9660_cmount, remove inadvertent setting of "gens" when "extatt"
  was set.

Reported by:	grehan, Dario Freni <saturnero at freesbie org>
Tested by:	Dario Freni
Not objected to by:	phk

MFC after:	3 days
2005-08-14 04:19:36 +00:00
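The inverted test in ba88fd3c57 works because an absent option list then yields the default behaviour. A userland sketch, with a NULL-terminated string array standing in for the kernel's mount option list and hypothetical helper names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a flag lookup in the mount option list. */
static bool
opt_present(const char *const *opts, const char *name)
{
	if (opts == NULL)
		return (false);
	for (; *opts != NULL; opts++) {
		if (strcmp(*opts, name) == 0)
			return (true);
	}
	return (false);
}

/*
 * Joliet is on unless "nojoliet" was given, so a root mount that never
 * had default options filled in still gets Joliet support.
 */
static bool
use_joliet(const char *const *opts)
{
	return (!opt_present(opts, "nojoliet"));
}
```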
des
5610061cd3 Eliminate an unnecessary bcopy(). 2005-08-12 12:22:05 +00:00
obrien
d5d343a0fd Remove public declarations of variables that were forgotten when they were
made static.
2005-08-10 07:10:02 +00:00
obrien
10886230c5 Remove the need to forward declare statics by moving them around. 2005-08-10 07:08:14 +00:00
rwatson
daa1c89f45 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containing cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
kris
509594693e devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by:	phk
2005-07-29 23:00:56 +00:00
simon
dd09386bed Correct devfs ruleset bypass.
Submitted by:	csjp
Reviewed by:	phk
Security:	FreeBSD-SA-05:17.devfs
Approved by:	cperciva
2005-07-20 13:34:16 +00:00
imura
a519d1b3a0 [1] unix2doschr()
If a character cannot be converted to DOS code page,
 unix2doschr() returned `0'. As a result, unix2dosfn()
 was forced to return `0', so we saw a file which was
 composed of these characters as `Invalid argument'.
 To correct this, if a character can be converted to
 Unicode, unix2doschr() now returns `1' which is a magic
 number to make unix2dosfn() know that the character
 must be converted to `_'.

[2] unix2dosfn()
 The above-mentioned solution only works if a file
 has both of Unicode name and DOS code page name.
 Unicode name would not be recorded if file name
 can be settled within 11 bytes (DOS short name)
 and if no conversion from Unix charset to DOS code
 page has occurred. Thus, FreeBSD can create a file
 which has only short name, but there is no guarantee
 that the short name always contains valid characters
 because we leave it to people by using mount_msdosfs(8)
 to select which conversion is used between DOS code
 page and unix charset.
 To avoid this, Unicode file name should be recorded
 unless a character is an ascii character. This is
 the way Windows XP does.

PR:		77074 [1]
MFC after:	1 week
2005-07-17 07:10:05 +00:00
rwatson
79690d711b When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register dev_clone_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
tanimura
f03d3eb58c Regrab dvp only when ISDOTDOT.
Approved by:	re (scottl)
2005-07-09 13:52:49 +00:00
jeff
0d69457df8 - Since we don't hold a usecount in pfs_exit we have to get a holdcnt
prior to calling vgone() to prevent any races.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (vfs blanket)
2005-07-07 07:33:10 +00:00
peter
921b3c5ee4 Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb to pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
peter
2778435f72 Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatibility mode.  eg: an IOC_IN ioctl with
a size of zero.  Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs.  Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctls.  We
found this at work when trying to run 4.x binaries.

Approved by:	re
2005-06-30 00:19:08 +00:00
imura
5809391d17 Avoid casting from (int *) to (size_t *) in order to fix udf_iconv on amd64.
Reviewed by:	scottl
MFC after:	2 weeks
2005-06-05 02:09:48 +00:00
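On LP64 platforms such as amd64, int is 32 bits wide while size_t is 64 bits, so passing an (int *) cast to (size_t *) makes the callee read and write 8 bytes through a pointer to a 4-byte object. A minimal sketch of the safe pattern follows; consume() and consume_int() are hypothetical names used only for illustration and are not taken from the udf_iconv code.

```c
#include <stddef.h>

/* Hypothetical callee that updates a byte count through a size_t pointer. */
static void
consume(size_t *left, size_t n)
{
	*left -= n;
}

/*
 * Safe pattern: copy the int into a real size_t, pass the address of
 * that temporary, and copy the result back.  Casting &len to (size_t *)
 * instead would let consume() touch 8 bytes of a 4-byte object on amd64.
 */
static int
consume_int(int len, size_t n)
{
	size_t tmp = (size_t)len;

	consume(&tmp, n);
	return ((int)tmp);
}
```

The temporary costs one copy in each direction but keeps every access the same width as the object it touches.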
rodrigc
dba46a3ce7 Do not declare a struct as extern, and then implement
it as static in the same file.  This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by:	phk
Approved by:	das (mentor)
2005-05-31 14:50:49 +00:00
brueffer
7bc989441b Fix three typos in comments. Two of them obtained from OpenBSD.
MFC after:	3 days
2005-05-11 21:10:35 +00:00
kan
3f8ab6c93f Do not dereference dvp pointer before doing a NULL check.
Noticed by: Coverity Prevent analysis tool.
2005-05-11 19:08:38 +00:00
anholt
b6be180393 Staticize a symbol used only in this file.
PR:		kern/43613
Submitted by:	Matt Emmerton, matt at gsicomp dot on dot ca
2005-05-06 20:47:09 +00:00
robert
38711d8dea The printf(9) `%p' conversion specifier puts an "0x" in
front of the pointer value.  Therefore, remove the "0x"
from the format string.
2005-05-06 00:15:57 +00:00
robert
46abb4815c Fix our NTFS readdir function.
To check a directory's in-use bitmap bit by bit, we use
a pointer to an 8 bit wide unsigned value.

The index used to dereference this pointer is calculated
by shifting the bit index right 3 bits.  Then we do a
logical AND with the bit# represented by the lower 3
bits of the bit index.

This is an idiomatic way of iterating through a bit map
with simple bitwise operations.

This commit fixes the bug that we only checked bits
3:0 of each 8 bit chunk, because we only used bits 1:0
of the bit index for the bit# in the current 8 bit value.
This resulted in files not being returned by getdirentries(2).

Change the type of the bit map pointer from `char *' to
`u_int8_t *'.
2005-05-06 00:06:06 +00:00
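The bitmap test described in the NTFS readdir fix above can be sketched as follows. This is an illustration under assumptions, not the NTFS source: the function names are hypothetical, and the buggy variant only reproduces the described symptom of masking the bit index with 3 instead of 7.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Test bit number `bit` in a byte-addressed bitmap.
 * bit >> 3 selects the byte; bit & 7 selects the bit inside that byte.
 */
static int
bitmap_isset(const uint8_t *bmp, size_t bit)
{
	return ((bmp[bit >> 3] & (1u << (bit & 7))) != 0);
}

/*
 * Faulty variant: masking with 3 keeps only bits 1:0 of the bit index,
 * so only bits 3:0 of each byte are ever examined.
 */
static int
bitmap_isset_buggy(const uint8_t *bmp, size_t bit)
{
	return ((bmp[bit >> 3] & (1u << (bit & 3))) != 0);
}
```

With a bitmap whose first byte is 0x10, bit 4 is in use: the correct test reports it, while the faulty one inspects bit 0 of the byte and misses it, which matches entries going missing from directory listings.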
takawata
68a4e0f83e Fix breakage on alpha.
Pointed out by: hrs via IRC
2005-05-05 07:02:51 +00:00
takawata
a61ec3d816 Make smbfs capable of using 16-bit character sets in filenames.
PR:78110
2005-05-04 15:05:46 +00:00
jeff
2b167167e2 - Set the v_object pointer after a successful VOP_OPEN(). This isn't a
perfect solution as the lower vm object can change at unpredictable times
   if our lower vp happens to be on another unionfs, etc.

Submitted by:	Oleg Sharoiko <os@rsu.ru>
2005-05-03 11:05:33 +00:00
jeff
6c4a330d28 - In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
   because vfs is still sometimes entered with Giant held.
2005-05-01 00:56:34 +00:00
des
fe9d4ac270 Fix an old pasto. 2005-04-30 16:27:20 +00:00
jeff
53caed435d - Mark devfs as MNTK_MPSAFE as I believe it does not require Giant.
Sponsored by:	Isilon Systems, Inc.
Agreed in principle by:		phk
2005-04-30 11:24:17 +00:00
jeff
5ae67dae9a - Fix several locking problems in unionfs_mount so that it will come
closer to passing DEBUG_VFS_LOCKS.
2005-04-27 09:07:13 +00:00
jeff
b6552bddeb - Pass the ISOPEN flag down to our lower filesystems.
- Remove an erroneous VOP lock assert.
2005-04-27 09:06:06 +00:00
jeff
cb40cf9c09 - As this is presently the one and only place where duplicate acquires of
the vnode interlock are allowed mark it by passing MTX_DUPOK to this
   lock operation only.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 22:42:44 +00:00
das
839fea181d Disable negative name caching for msdosfs to work around a bug.
Since the name cache is case-sensitive and msdosfs isn't,
creating a file 'foo' won't invalidate a negative entry for 'FOO'.
There are similar problems related to 8.3 filenames.

A better solution is to override VOP_LOOKUP with a method that
canonicalizes the name, then calls vfs_cache_lookup().  Unfortunately,
it's not quite that simple because vfs_cache_lookup() will call
msdosfs_lookup() on a cache miss, and msdosfs_lookup() needs a way to
get at the original component name.
2005-04-16 23:47:19 +00:00
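The stale negative entry described in the msdosfs commit above can be reproduced with a toy model. This is only a sketch under assumptions: the negcache structure and the neg_* functions are hypothetical and far simpler than the kernel name cache, but they show why a case-sensitive cache and a case-insensitive filesystem disagree.

```c
#include <string.h>

#define NEG_MAX		8
#define NEG_NAMELEN	32

/* Toy negative cache: a list of names known not to exist. */
struct negcache {
	char	names[NEG_MAX][NEG_NAMELEN];
	int	count;
};

/* Record that a lookup of `name` failed. */
static void
neg_add(struct negcache *nc, const char *name)
{
	if (nc->count < NEG_MAX) {
		strncpy(nc->names[nc->count], name, NEG_NAMELEN - 1);
		nc->names[nc->count][NEG_NAMELEN - 1] = '\0';
		nc->count++;
	}
}

/* Case-sensitive hit test, mirroring a case-sensitive name cache. */
static int
neg_hit(const struct negcache *nc, const char *name)
{
	int i;

	for (i = 0; i < nc->count; i++)
		if (strcmp(nc->names[i], name) == 0)
			return (1);
	return (0);
}

/* Called on file creation: drops only exact, case-sensitive matches. */
static void
neg_purge(struct negcache *nc, const char *name)
{
	int i;

	for (i = 0; i < nc->count; i++) {
		if (strcmp(nc->names[i], name) == 0) {
			memmove(nc->names[i], nc->names[nc->count - 1],
			    NEG_NAMELEN);
			nc->count--;
			i--;
		}
	}
}
```

After a failed lookup of "FOO" is cached, creating "foo" purges nothing, so a later lookup of "FOO" still hits the negative entry even though a case-insensitive filesystem would now find the file.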
njl
9de8e0daf9 Fix mbnambuf support for multi-byte characters. If a substring is larger
than WIN_CHARS bytes, we shift the suffix (previous substrings) upwards
by the amount this substring exceeds its WIN_CHARS slot.  Profiling shows
this change is indistinguishable from the previous code at 95% confidence.
This bug would result in attempts to access or create files or directories
with multi-byte characters returning an error but no data loss.

Reported and tested by:	avatar
MFC after:	3 days
2005-04-16 01:49:50 +00:00
brueffer
bee55215dc Correct typo.
Obtained from:	OpenBSD
2005-04-14 14:40:09 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  See vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
jeff
69e9f89f88 - Clear VI_OWEINACT before calling vget() with no lock type. We know
the node is actually already locked, and VOP_INACTIVE is not desirable
   in this case.
2005-04-11 11:17:20 +00:00
jeff
9375f1d524 - Honor the flags argument passed to null_root(). The filesystem below
us will decide whether or not to grab a real shared lock.
2005-04-11 11:16:29 +00:00
delphij
4c97b619e5 Initialize vp before using it. Failing to do this can cause instant
panic when trying to access a file on mounted smbfs.

Submitted by:	takawata at jp freebsd org
2005-04-10 03:17:42 +00:00
phk
bae9f6cfa0 Give msdosfs a unique inode number which is really the byteoffset of
the directory entry.

This solves the corruption problem, I believe.

Regression test script by:	silby
2005-04-07 07:55:37 +00:00
jeff
3ae5dd8f5a - Fix union's assumptions about when the dvp is unlocked. It is only
unlocked in the ISDOTDOT case now, not for all !ISLASTCN lookups.
2005-04-04 09:36:26 +00:00
phk
7af1e31761 Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
phk
2379f61770 cdev (still) needs per instance uid/gid/mode
Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h
2005-03-31 10:29:57 +00:00
phk
b83adaf8e5 Rename dev_ref() to dev_refl() 2005-03-31 06:51:54 +00:00
jeff
902bc24bce - LK_NOPAUSE is a nop now.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 04:27:49 +00:00
jeff
ca1e4c2fe0 - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
   LOCKPARENT or WANTPARENT.
2005-03-29 13:09:42 +00:00
jeff
7d8081dca4 - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a DELETE or RENAME without locking
   the parent.
2005-03-29 13:04:00 +00:00
jeff
141aba2c7b - cache_lookup() now locks the new vnode for us to prevent some races.
Remove redundant code.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 13:00:37 +00:00
jeff
4084503aa0 - Correct the dprintf format in the _lookup routine.
Spotted by:	pjd
2005-03-28 14:26:01 +00:00
jeff
efb09df0e7 - Garbage collect an unused variable. 2005-03-28 13:45:09 +00:00
jeff
d673a48266 - Don't panic if we can't lock a child in lookup, return an error instead.
- Only unlock the directory if this is a DOTDOT lookup.  Previously this
   code could have deadlocked if there was a DOTDOT lookup with LOCKPARENT
   set and another thread was locking the other way up the tree.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:39:16 +00:00
jeff
a84b0d4580 - Remove unnecessary LOCKPARENT manipulation.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:29:15 +00:00
jeff
2e5ff94ef5 - nwfs_lookup() is no longer responsible for unlocking the dvp, this is
handled in vfs_lookup.c.  This code was missing PDIRUNLOCK use prior
   to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:46:33 +00:00
jeff
527fc2c9cc - hpfs_lookup() is no longer responsible for unlocking the dvp, this is
handled in vfs_lookup.c.  This code was missing PDIRUNLOCK use prior
   to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.

Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:40:59 +00:00
jeff
b136fd4eee - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
jeff
0afa18e58f - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
- In the ISDOTDOT case we have to unlock the dvp before locking the child,
   if this fails we must relock dvp before returning an error.  This was
   missing before.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:31:57 +00:00
jeff
5f8bc80203 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
- Network filesystems are written with a special idiom that checks the
   cache first, and may even unlock dvp before discovering that a network
   round-trip is required to resolve the name.  I believe dvp is prevented
   from being recycled even in the forced unmount case by the shared lock
   on the mount point.  If not, this code should grow checks for VI_DOOMED
   after it relocks dvp or it will access NULL v_data fields.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:29:58 +00:00
jeff
430e7e9a03 - Pass LK_EXCLUSIVE as the lock type to vget in vfs_hash_insert(). 2005-03-25 10:51:55 +00:00
jeff
56f1fc7189 - Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
   modified to do so.  Careful review must be done to ensure that this
   is safe for each individual filesystem.

Sponsored by:   Isilon Systems, Inc.
2005-03-24 07:39:03 +00:00
jeff
226bf6ead4 - Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
   modified to do so.  Careful review must be done to ensure that this
   is safe for each individual filesystem.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:36:16 +00:00
phk
3b151f1bf5 Use subr_unit 2005-03-19 08:22:36 +00:00
phk
444989f1a6 Also remember to set the fsid here. 2005-03-17 15:15:29 +00:00
phk
96d39aba6d Forgot to replace code to set fsid in vop_getattr. 2005-03-17 14:43:40 +00:00
phk
cfa6bb09ea Prepare for the final onslaught on devices:
Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version
2005-03-17 12:07:00 +00:00
jeff
91796cd6d7 - Lock the clearing of v_data so it is safe to inspect it with the
interlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-17 12:00:05 +00:00
phk
98f1c9b062 Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
phk
6552218237 Remove unused file 2005-03-16 11:10:38 +00:00
phk
5443e9818b Remove inode fields previously used for private inode hash tables. 2005-03-16 08:09:52 +00:00
phk
eeb2c527c0 XXX: unnecessary pointer in inode. 2005-03-16 07:21:38 +00:00
phk
9189809602 Don't store the disk cdev in all inodes. 2005-03-16 07:17:39 +00:00
phk
909be0f0c2 Don't hold a reference to the disk vnode for each inode.
Eliminate cdev and vnode pointer to the disk from the inodes,
the mount holds everything we need.
2005-03-15 21:09:52 +00:00
phk
c3c76f8185 Eliminate cdev pointer in inodes, they're not used or needed.
The cdev could have been pulled out of the mountpoint cheaper back
when it was used anyway.
2005-03-15 20:57:25 +00:00
phk
54d4b170ba Don't hold a reference on the disk vnode for each inode. 2005-03-15 20:50:58 +00:00
phk
d043926750 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
jeff
57fd917aad - Assume that all lower filesystems now support proper locking. Assert
that they set v->v_vnlock.  This is true for all filesystems in the
   tree.
 - Remove all uses of LK_THISLAYER.  If the lower layer is locked, the
   null layer is locked.  We only use vget() to get a reference now.
   null essentially does no locking.  This fixes LOOKUP_SHARED with
   nullfs.
 - Remove the special LK_DRAIN considerations, I do not believe this is
   needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
   it's hardly used anymore.
 - Add one well commented hack to prevent the lowervp from going away
   while we're in its VOP_LOCK routine.  This can only happen if we're
   forcibly unmounted while some callers are waiting in the lock.  In
   this case the lowervp could be recycled after we drop our last ref
   in null_reclaim().  Prevent this with a vhold().
2005-03-15 13:49:33 +00:00
phk
651dd9f4d4 Disable two users of findcdev. They do the wrong thing now and will
need to be fixed.  In both cases the API should be reengineered to do
something (more) sensible.
2005-03-15 12:39:30 +00:00
jeff
b59222bfe5 - We have to transfer lockers after resetting our vnlock pointer.
Sponsored by:	Isilon Systems, Inc.
2005-03-15 11:28:45 +00:00
phk
3337fd988c Don't export major,minor, instead export tty name. 2005-03-15 11:05:11 +00:00
phk
4799d2dacc Print devtoname() instead of minor(). 2005-03-15 10:01:31 +00:00
phk
8ea9004b75 Fix typo: pointers are not boolean in style(9). 2005-03-15 10:01:14 +00:00
phk
124bf5e823 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
des
8bd55ce9cb Hook pfs_lookup() up to vfs_cachedlookup_desc instead of vfs_lookup_desc,
as suggested by Matt's comment.  Also fix some style and paranoia issues.

The entire function could benefit from review by a VFS guru.

MFC after:	6 weeks
2005-03-14 16:24:50 +00:00