Commit Graph

1560 Commits

Author SHA1 Message Date
phk
d8b3df3cb9 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
phk
aa4f69ad30 Integrate most of vop_revoke() into devfs_revoke() where it belongs. 2004-11-13 23:37:29 +00:00
phk
437fa95897 Add the devfs_fp_check() function which helps us get from a struct file
to a cdev and a devsw, doing all the relevant checks along the way.

Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev.  If it does, the device was revoked and
we return ENXIO.
2004-11-13 23:21:54 +00:00
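The revocation test in 437fa95897 can be modeled in userland as a single pointer comparison. The structures below are simplified, hypothetical stand-ins (not the kernel's real struct file, struct vnode or struct cdev), and fp_check_model() is an illustrative name. The point is only the comparison between the cdev cached in f_data and the vnode's current v_rdev.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct cdev_model {
	const char		*si_name;
};

struct vnode_model {
	struct cdev_model	*v_rdev;	/* device the vnode refers to now */
};

struct file_model {
	struct vnode_model	*f_vnode;
	struct cdev_model	*f_data;	/* cdev cached at open time */
};

/*
 * Return 0 and store the cdev in *devp if the descriptor still refers to
 * the device it was opened on; return ENXIO if the vnode's device no
 * longer matches the cached copy, i.e. the device was revoked.
 */
int
fp_check_model(const struct file_model *fp, struct cdev_model **devp)
{
	struct cdev_model *dev = fp->f_data;

	if (fp->f_vnode == NULL || fp->f_vnode->v_rdev != dev)
		return (ENXIO);
	*devp = dev;
	return (0);
}
```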
phk
5a34290b15 VOP_REVOKE() is only ever for VCHR vnodes, so unionfs does not
need a vop_revoke() method.
2004-11-13 22:56:26 +00:00
phk
7204610fc8 fifos don't need a vop_lookup; the default will do fine. 2004-11-13 18:51:13 +00:00
phk
216166ee0d Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
2004-11-13 11:53:02 +00:00
trhodes
37a1084115 Remove stale comment after previous commit.
Noticed by:	pjd
2004-11-09 23:19:21 +00:00
phk
5eae02ee76 Detect root mount attempts on the flag, not on the NULL path. 2004-11-09 22:21:52 +00:00
phk
37ad4f1923 Refuse attempts to mount root filesystem 2004-11-09 22:21:10 +00:00
phk
921930c585 Refuse attempts to mount root filesystem 2004-11-09 22:14:57 +00:00
phk
63cd9549c7 Add optional device vnode bypass to DEVFS.
The tunable vfs.devfs.fops controls this feature and defaults to off.

When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a file descriptor get a special fops vector which, instead
of taking the detour through the vnode layer, goes directly to DEVFS.

Amongst other things this allows us to run Giant-free read/write to
device drivers which have been weaned off D_NEEDGIANT.

Currently this means /dev/null, /dev/zero, disks (and maybe the
random stuff?).

On a 700MHz K7 machine this doubles the speed of
	dd if=/dev/zero of=/dev/null bs=1 count=1000000

This roughly translates to shaving 2usec off each read/write syscall.

The poll/kqfilter paths need more work before they are Giant-free;
this work is ongoing in p4::phk_bufwork.

Please test this and report any problems, LORs etc.
2004-11-08 10:46:47 +00:00
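As a back-of-the-envelope check on the figures in 63cd9549c7: bs=1 count=1000000 makes dd issue one million 1-byte reads and one million 1-byte writes, roughly 2,000,000 syscalls. The hypothetical helper below works out what "doubles the speed" and "2usec per syscall" together imply, assuming the whole speedup comes from that fixed per-syscall saving.

```c
/*
 * Implied run time before a change, assuming the entire difference comes
 * from a fixed saving per syscall:
 *   before - before / speedup = nsyscalls * saving_usec
 */
double
implied_before_usec(double nsyscalls, double saving_usec, double speedup)
{
	return (nsyscalls * saving_usec / (1.0 - 1.0 / speedup));
}
```

With 2e6 syscalls, a 2 usec saving and a speedup of 2, this gives about 8 s before and 4 s after, or roughly 4 usec versus 2 usec per syscall. These are numbers derived from the commit message, not measurements.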
phk
723cc1105c Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.
2004-11-06 11:41:22 +00:00
phk
31149e65e2 Add back securelevel check for disks.
XXX: This should live in geom_dev.c but we don't have access to the
cred there.
XXX: XXX:  This may not matter anymore since filesystems use geom_vfs.
2004-11-04 09:17:55 +00:00
phk
d002107070 s/ffs/ntfs/
Fix error handling to not use VOP_CLOSE() on the disk.

Spotted by:	tegge
2004-11-04 07:18:54 +00:00
phk
247b99a094 Make a more whole-hearted attempt at GEOM'ifying NTFS.
I must have been sleepy when I did the first pass.

Spotted by:	tegge
2004-11-03 21:36:41 +00:00
phk
8fa3fd0acf Don't give disks special treatment, they don't come this way anymore. 2004-10-29 11:10:55 +00:00
phk
861b10d6de Remove VOP_SPECSTRATEGY() from the system. 2004-10-29 10:59:28 +00:00
phk
ff969023bb Move NTFS to GEOM backing instead of DEVFS.
For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.
2004-10-29 10:43:45 +00:00
phk
2b08c63135 Move HPFS to GEOM backing instead of DEVFS.
For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.
2004-10-29 10:43:07 +00:00
phk
e172d33222 Move CD9660 to GEOM backing instead of DEVFS.
For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.
2004-10-29 10:41:44 +00:00
phk
6dbcd5fd09 Move UDF to GEOM backing instead of DEVFS.
For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.
2004-10-29 10:40:58 +00:00
phk
a4581c6788 Move MSDOSFS to GEOM backing instead of DEVFS.
For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.
2004-10-29 10:40:14 +00:00
phk
86cc21c765 Give dev_strategy() an explicit cdev argument in preparation for removing
buf->b_dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text.  There is no legal difference between John Dyson's two-clause
abbreviated BSD license and the canonical text.
2004-10-29 07:16:37 +00:00
phk
5a159b2d78 Reduce the locking activity by epsilon by checking VNON condition before
releasing the mountlock.
2004-10-28 08:22:11 +00:00
phk
6b2d7f7134 What can I say: don't allow people to mount DEVFS with option "nodev". 2004-10-28 06:03:25 +00:00
phk
f3795e5212 Eliminate unnecessary KASSERTs.
Don't use bp->b_vp in VOP_STRATEGY: the vnode is passed in as an argument.
2004-10-27 06:48:21 +00:00
phk
c66aa10c8e Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
phk
0e87ab8bc6 Lose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().
2004-10-25 09:14:03 +00:00
phk
2c3e47b668 Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it.  Here were those hacks that I have curs'd I know not how
oft.  Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
2004-10-22 09:59:37 +00:00
jhb
ce2d3f89af Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
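The core idea of ce2d3f89af, storing raw tick counts plus a total runtime and deriving user and system times only on demand, amounts to splitting the measured runtime in proportion to the ticks. The sketch below shows only that proportional split. The names (struct rux_model, calcru_model()) are hypothetical, and it deliberately omits the real calcru1() bookkeeping that keeps reported times from going backwards.

```c
#include <stdint.h>

/* Hypothetical, simplified subset of the raw statistics in rusage_ext. */
struct rux_model {
	uint64_t	rux_runtime_usec;	/* total runtime */
	uint64_t	rux_uticks;		/* statclock ticks in user mode */
	uint64_t	rux_sticks;		/* ... in system mode */
	uint64_t	rux_iticks;		/* ... in interrupt context */
};

/*
 * Split the runtime into user and system microseconds by tick ratio.
 * No overflow handling: runtime * ticks must fit in 64 bits.
 */
void
calcru_model(const struct rux_model *rux, uint64_t *up, uint64_t *sp)
{
	uint64_t tt = rux->rux_uticks + rux->rux_sticks + rux->rux_iticks;

	if (tt == 0) {
		/* This sketch reports nothing rather than dividing by zero. */
		*up = *sp = 0;
		return;
	}
	*up = rux->rux_runtime_usec * rux->rux_uticks / tt;
	*sp = rux->rux_runtime_usec * rux->rux_sticks / tt;
}
```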
takawata
c3db025caa Minor bug fix. Some files were not translated. 2004-10-05 16:53:37 +00:00
takawata
58f5d3f216 Fix unionfs problems when a directory is mounted on another directory
on a different file system. My previous fix could cause ill effects in
that case. Now it translates the fsid of direct children of the
mount point directory only.

Pointed out by: Uwe Doering
2004-10-05 05:59:29 +00:00
takawata
2e6eb5bbd1 Fix a problem when you try to mount a directory on another directory
that belongs to the same filesystem. In this case, getcwd(3) will fail.

I found the problem two years ago and forgot to merge the fix.

http://docs.FreeBSD.org/cgi/mid.cgi?200202251435.XAA91094
2004-10-02 17:17:04 +00:00
das
9d2cf40e63 Don't PHOLD() the target process in procfs, since this is already done
in pseudofs.  Moreover, PHOLD() may block between the p_candebug()
access check and the actual operation.
2004-10-01 05:01:17 +00:00
phk
8d623dca9a XXX mark two places where we do not hold a threadcount on the dev when
frobbing the cdevsw.

In both cases we examine only the cdevsw and it is a good question if we
weren't better off copying those properties into the cdev in the first
place.  This question will be revisited.
2004-09-24 08:32:36 +00:00
phk
9f34214d76 Hold proper thread count while frobbing driver's ioctl. 2004-09-24 07:24:02 +00:00
phk
01a7a17373 Remove devsw() call missed in last commit. 2004-09-24 07:08:33 +00:00
phk
f76f5a867a Use dev_re[fl]thread().
Retire various old compatibility helpers.
2004-09-24 05:58:06 +00:00
phk
1d992e18ec Eliminate DEV_STRATEGY() macro: call dev_strategy() directly.
Make dev_strategy() handle errors and departing devices properly.
2004-09-23 14:45:04 +00:00
phk
a4f5e76f74 Do not use devsw() but si_devsw directly. This is still bogus but a
fair bit less so.
2004-09-23 12:19:24 +00:00
phk
3947e54e89 Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().
2004-09-23 07:17:41 +00:00
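The thread-count scheme in 3947e54e89 can be sketched in userland with a pthread mutex standing in for dev_lock()/dev_unlock(). Everything here (struct cdevtc_model, the si_gone flag, the *_model functions) is a hypothetical model of the idea, not the kernel code: a thread entering the driver bumps the count under the lock, and refuses to enter once the device has been marked as departing.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the per-cdev state described in the commit. */
struct cdevtc_model {
	pthread_mutex_t	si_mtx;		/* stands in for dev_lock() */
	int		si_threadcount;	/* threads inside the driver */
	bool		si_gone;	/* device is being destroyed */
};

/* Enter the driver; returns false if the device is going away. */
bool
dev_refthread_model(struct cdevtc_model *dev)
{
	bool ok;

	pthread_mutex_lock(&dev->si_mtx);
	ok = !dev->si_gone;
	if (ok)
		dev->si_threadcount++;
	pthread_mutex_unlock(&dev->si_mtx);
	return (ok);
}

/* Leave the driver again. */
void
dev_relthread_model(struct cdevtc_model *dev)
{
	pthread_mutex_lock(&dev->si_mtx);
	dev->si_threadcount--;
	pthread_mutex_unlock(&dev->si_mtx);
}
```

A destroy path in this model would set si_gone under the lock and then wait for si_threadcount to drain to zero before tearing the device down.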
phk
eb3be2c541 Pointy hat please!
Refuse VCHR not VREG.
2004-09-22 18:18:26 +00:00
phk
d905f27bbb Desupport opening device nodes on CD9660 filesystems. They are
still visible, they can still be seen, but they cannot be opened.
Use DEVFS for that.
2004-09-21 08:42:37 +00:00
phk
73cf913d5f The getpages VOP was a good stab at getting scatter/gather I/O without
too much kernel copying, but it is not the right way to do it, and it is
in the way of straightening out the buffer cache.

The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly into/out of the
physical memory.  Once the VM/buf thing is sorted out it is next on
the list.

Retire most of the vnode method ffs_getpages().  It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.

Retire specfs_getpages() as well, as it has no users now.
2004-09-19 08:14:55 +00:00
phk
02df7323ee Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
phk
2806321da1 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
tjr
149e5a04f7 Reduce the size of struct defid's defid_dirclust, defid_dirofs and
(disabled) defid_gen members from u_long to u_int32_t so that alignment
requirements don't cause the structure to become larger than struct fid
on LP64 platforms. This fixes NFS exports of msdos filesystems on at
least amd64.

PR:		71173
2004-09-08 13:03:19 +00:00
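The size problem fixed in 149e5a04f7 can be seen by comparing two layouts. The structures below are models, not the kernel headers: MAXFIDSZ_MODEL = 16 is an assumed size for struct fid's data area, and the disabled defid_gen member is left out. On LP64 each unsigned long is 8 bytes and 8-byte aligned, so the old layout typically grows to 24 bytes, while the fixed-width version stays at 12.

```c
#include <stdint.h>

#define MAXFIDSZ_MODEL	16	/* assumed size of struct fid's data area */

/* Model of the generic file handle the defid must fit inside. */
struct fid_model {
	unsigned short	fid_len;
	unsigned short	fid_reserved;
	char		fid_data[MAXFIDSZ_MODEL];
};

/* Before: u_long members are 8 bytes and 8-byte aligned on LP64. */
struct defid_long {
	unsigned short	defid_len;
	unsigned short	defid_pad;
	unsigned long	defid_dirclust;
	unsigned long	defid_dirofs;
};

/* After: fixed 32-bit members give the same size on every platform. */
struct defid_u32 {
	unsigned short	defid_len;
	unsigned short	defid_pad;
	uint32_t	defid_dirclust;
	uint32_t	defid_dirofs;
};
```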
tjr
eba4907838 Merge from NetBSD:
Fix a problem in previous: we can't blindly assume that we have
wincnt entries available at the offset where the file was found. If the DOS
directory entry is not preceded by appropriate number of long name
entries (happens e.g. when the filesystem is corrupted, or when
the filename complies with DOS rules and doesn't use any long name entry),
we would overwrite random directory entries.

There are still some problems, the whole thing has to be revisited and solved
right.

Submitted by:	Xin LI
2004-09-08 11:25:41 +00:00
tjr
1d08539433 Merge from NetBSD:
Fix a panic that occurred when trying to traverse a corrupt msdosfs
filesystem.  With this particular corruption, the code in pcbmap()
would compute an offset into an array that was way out of bounds,
so check the bounds before trying to access and return an error if
the offset would be out of bounds.

Submitted by:	Xin LI
2004-09-08 10:57:09 +00:00
phk
1912367ebb Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization of va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
2004-09-07 09:17:05 +00:00
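A userland approximation of a helper like init_va_filerev() is sketched below. It assumes, as the XXX note about "uptime of the system" suggests, that the value is built from uptime: whole seconds go into the upper 32 bits and a 32-bit binary fraction of a second into the lower bits. CLOCK_MONOTONIC stands in for the kernel's uptime clock, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Build a monotonically non-decreasing 64-bit revision from uptime. */
uint64_t
init_va_filerev_model(void)
{
	struct timespec ts;
	uint64_t frac;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	/* Convert nanoseconds (< 1e9) to a 32-bit binary fraction of a second. */
	frac = ((uint64_t)ts.tv_nsec << 32) / 1000000000ULL;
	return (((uint64_t)ts.tv_sec << 32) | frac);
}
```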
phk
766dd89a4b Explicitly pass vnode to smbfs_doio() function. 2004-09-07 08:53:28 +00:00
phk
6dd3840f3a Explicitly pass the vnode to the nw_doio() function. 2004-09-07 08:53:03 +00:00
tjr
39cb4ddfb9 Temporarily back out revision 1.77. This changed cd9660_getattr() and
cd9660_readdir() to return the address of the file's first data block as
the inode number instead of the address of the directory entry, but
neglected to update cd9660_vget_internal() for the new inode numbering
scheme.

Since the NFS server calls VFS_VGET (cd9660_vget()) with inode numbers
returned through VOP_READDIR (cd9660_readdir()) when servicing a READDIRPLUS
request, these two interfaces must agree on the numbering scheme; failure to
do so caused panics and/or bogus information about the entries to be returned
to clients using READDIRPLUS (Solaris, FreeBSD w/ mount -o rdirplus).

PR:		63446
2004-09-05 11:18:53 +00:00
rwatson
7634ea5ca7 Back out pseudo_vnops.c:1.45, which was a workaround for pfind()
returning incompletely initialized processes.  This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.
2004-09-02 16:04:09 +00:00
brooks
eeddbfb0fa General modernization of coda:
- Ditch NVCODA
- Don't use a static major
- Don't declare functions extern

Reviewed by:	peter
2004-09-01 01:19:52 +00:00
peter
1d9abdbe78 Kill count device support from config. I've changed the last few
remaining consumers to have the count passed as an option.  This is
i4b, pc98/wdc, and coda.

Bump configvers.h from 500013 to 600000.

Remove heuristics that tried to parse "device ed5" as 5 units of the ed
device.  This broke things like the snd_emu10k1 device, which required
quotes to make it parse right.  The no-longer-needed quotes have been
removed from NOTES, GENERIC etc.  E.g., I've removed the quotes from:
   device  snd_maestro
   device  "snd_maestro3"
   device  snd_mss

I believe everything will still compile and work after this.
2004-08-30 23:03:58 +00:00
tjr
aabb7d1fb4 Remove bogus vrele() call added in previous. 2004-08-27 11:24:31 +00:00
tjr
7845779267 Improve the robustness of MSDOSFSMNT_KICONV handling:
- Use copyinstr() to read cs_win, cs_dos, cs_local strings from the
  mount argument structure instead of reading through user-space pointers(!).
- When mounting a filesystem, or updating an existing mount, only try to
  update the iconv handles from the information in the mount argument
  structure if the structure itself has the MSDOSFSMNT_KICONV flag set.
- Attempt to handle failure of update_mp() in the MNT_UPDATE case.
2004-08-26 13:16:44 +00:00
des
6a9d71f01c Release the vnode cache mutex when calling vgone(), since vgone() may
sleep.  This makes pfs_exit() even less efficient than before, but on
the bright side, the vnode cache mutex no longer needs to be recursive.
2004-08-15 21:58:02 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
acquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
rwatson
6994ab16ad Commit a work-around for a more general bug involving process state:
check whether p_ucred is NULL or not in pfs_getattr() before
dereferencing the credential, and return ENOENT if there wasn't one.

This is a symptom of a larger problem, wherein pfind() can return
references to incompletely initialized processes, and we instead ought
to not return them, or check the process state before acting on the
process.

Reported by:	kris
Discussed with:	tjr, others
2004-08-13 20:27:56 +00:00
phk
db95f8ec86 use bufdone() not biodone(). 2004-08-08 13:23:05 +00:00
phk
134a515cd2 Use bufdone(), not biodone(). 2004-08-08 13:20:43 +00:00
phk
aa6ba3c9dd Push all changes to disk before downgrading a mount from rw to ro. 2004-08-07 22:05:12 +00:00
phk
2d868d02cf Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems' mount functions consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
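The version check from 2d868d02cf boils down to comparing a number compiled into each filesystem against the one the kernel expects. The sketch below is a hypothetical userland model: VFS_VERSION_MODEL, struct vfsconf_model and the EINVAL return are illustrative choices, not the kernel's actual constant, structure or error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical version constant; not the kernel's real value. */
#define VFS_VERSION_MODEL	2u

/* Simplified stand-in for the per-filesystem configuration structure. */
struct vfsconf_model {
	unsigned int	 vfc_version;	/* VFS_VERSION_MODEL at build time */
	const char	*vfc_name;
};

/* Refuse to initialize a filesystem built against another VFS version. */
int
vfs_register_model(const struct vfsconf_model *vfc)
{
	if (vfc == NULL || vfc->vfc_version != VFS_VERSION_MODEL)
		return (EINVAL);
	return (0);
}
```

Bumping the constant whenever the VFS interface changes makes a stale module fail cleanly at load time instead of misbehaving later.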
phk
075684f5fd Remove global variables rootdevs and rootvp; they are unused as such.
Add local rootvp variables as needed.

Remove checks for miniroots in the swap partition.  We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
2004-07-28 20:21:04 +00:00
kan
65947d062b Avoid casts as lvalues. 2004-07-28 06:30:43 +00:00
kan
cd2bbc3fed Avoid casts as lvalues. 2004-07-28 06:05:41 +00:00
cperciva
d9fecc83c8 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
rwatson
01be595ab3 In devfs_allocv(), rather than assigning 'td = curthread', assert that
the caller passes in a td that is curthread, and consistently pass 'td'
into vget().  Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.

In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred).  Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.
2004-07-22 17:03:14 +00:00
phk
5c95d686a1 Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
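The convention from 5c95d686a1 can be sketched as a module event handler that handles what it knows and answers EOPNOTSUPP for everything else, plus a caller that treats both EOPNOTSUPP and the legacy EINVAL as "didn't do anything" for a quiesce request. The event names and both functions are hypothetical models of the MOD_* events, not the kernel's definitions.

```c
#include <errno.h>

/* Hypothetical event codes mirroring the MOD_* events. */
enum modevent_model { MODEL_LOAD, MODEL_UNLOAD, MODEL_QUIESCE, MODEL_SHUTDOWN };

/* Example handler: only load and unload do anything. */
int
example_modevent(int what)
{
	switch (what) {
	case MODEL_LOAD:
	case MODEL_UNLOAD:
		return (0);
	default:
		/* Unknown or unhandled event: say so explicitly. */
		return (EOPNOTSUPP);
	}
}

/* How a quiesce caller can treat "not handled" replies as a no-op. */
int
quiesce_treat_as_noop(int error)
{
	return (error == EOPNOTSUPP || error == EINVAL);
}
```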
phk
14378802af Another LINT compilation fix 2004-07-13 09:47:27 +00:00
phk
d36b28659f Make LINT compile 2004-07-13 09:46:46 +00:00
rwatson
2fbca9279f Remove 'td = curthread' that shadows the arguments to coda_root().
Missed by:	alfred
2004-07-12 14:11:26 +00:00
alfred
8a1713aada Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
marcel
0d98473ef9 Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
2004-07-10 21:21:13 +00:00
marcel
32de0087b0 Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
o  Make debugging code conditional upon KDB instead of DDB.
2004-07-10 21:20:11 +00:00
des
881a348b52 Accumulate directory entries in a fixed-length sbuf, and uiomove them in
one go before returning.  This avoids calling uiomove() while holding
allproc_lock.

Don't adjust uio->uio_offset manually, uiomove() does that for us.

Don't drop allproc_lock before calling panic().

Suggested by:	alfred
2004-07-09 11:43:37 +00:00
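The pattern in 881a348b52, filling a fixed-length buffer while holding a lock and copying it out only after the lock is dropped, can be modeled in userland. Here a plain char array and snprintf() stand in for the sbuf, memcpy() stands in for uiomove(), and allproc_model_lock and list_pids_model() are hypothetical names.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

pthread_mutex_t allproc_model_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Format one line per pid into a fixed-size buffer while holding the
 * lock, then copy the finished text to out after dropping it.
 * Returns the number of bytes copied to out.
 */
size_t
list_pids_model(const int *pids, size_t n, char *out, size_t outlen)
{
	char buf[256];		/* fixed length, like the commit's sbuf */
	size_t len = 0, i;
	int r;

	pthread_mutex_lock(&allproc_model_lock);
	for (i = 0; i < n; i++) {
		r = snprintf(buf + len, sizeof(buf) - len, "%d\n", pids[i]);
		if (r < 0 || (size_t)r >= sizeof(buf) - len)
			break;		/* buffer full: stop adding entries */
		len += (size_t)r;
	}
	pthread_mutex_unlock(&allproc_model_lock);

	/* The copy-out (uiomove() in the kernel) happens without the lock. */
	if (len > outlen)
		len = outlen;
	memcpy(out, buf, len);
	return (len);
}
```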
phk
070a613a48 When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (nvp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On an SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously gets to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split insmntque(), which potentially moves a vnode from one mount
point to another, into delmntque() and insmntque(), which do just
what their names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
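The two halves of the fix in 070a613a48 can be modeled in userland: the removal path clears v_mount while still holding the mount lock, and the traversal caches the next vnode under the lock and restarts from the head if, after re-locking, that vnode no longer belongs to the mount. This is a hypothetical single-threaded model (struct vnode_m, struct mount_m, visit_vnodes_m() are illustrative names), not the MNT_VNODE_FOREACH() macro itself; it also ignores vnode recycling and assumes the caller of delmntque_m() has exclusive use of the vnode.

```c
#include <pthread.h>
#include <stddef.h>

struct mount_m;

/* Hypothetical, minimal vnode: just its mount and list linkage. */
struct vnode_m {
	struct mount_m	*v_mount;	/* NULL once removed from a mount */
	struct vnode_m	*v_next;	/* next vnode on the mount's list */
	int		 v_visited;	/* for demonstration only */
};

struct mount_m {
	pthread_mutex_t	 mnt_mtx;	/* stands in for MNT_ILOCK() */
	struct vnode_m	*mnt_head;
};

/* Take vp off its mount, clearing v_mount while still holding the lock. */
void
delmntque_m(struct vnode_m *vp)
{
	struct mount_m *mp = vp->v_mount;
	struct vnode_m **pp;

	if (mp == NULL)
		return;
	pthread_mutex_lock(&mp->mnt_mtx);
	for (pp = &mp->mnt_head; *pp != NULL; pp = &(*pp)->v_next) {
		if (*pp == vp) {
			*pp = vp->v_next;
			break;
		}
	}
	vp->v_next = NULL;
	vp->v_mount = NULL;	/* the fix: cleared under the mount lock */
	pthread_mutex_unlock(&mp->mnt_mtx);
}

/*
 * Visit every vnode on mp, dropping the mount lock around each callback.
 * The next vnode is cached while the lock is held; after re-locking, if
 * that cached vnode no longer belongs to mp, restart from the head.
 */
void
visit_vnodes_m(struct mount_m *mp, void (*fn)(struct vnode_m *, void *),
    void *arg)
{
	struct vnode_m *vp, *nvp;

	pthread_mutex_lock(&mp->mnt_mtx);
restart:
	for (vp = mp->mnt_head; vp != NULL; vp = nvp) {
		nvp = vp->v_next;
		pthread_mutex_unlock(&mp->mnt_mtx);
		fn(vp, arg);		/* may sleep or remove vnodes */
		pthread_mutex_lock(&mp->mnt_mtx);
		if (nvp != NULL && nvp->v_mount != mp)
			goto restart;
	}
	pthread_mutex_unlock(&mp->mnt_mtx);
}

/* Example callback: count the visit, then detach the vnode given as arg. */
void
mark_and_detach_m(struct vnode_m *vp, void *arg)
{
	vp->v_visited++;
	if (arg != NULL)
		delmntque_m(arg);
}
```

As in the kernel loops the commit describes, a restart can visit a vnode more than once, so callbacks should tolerate repeats.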
phk
d39ece62c7 Remove "register" keyword and trailing white space. 2004-07-03 16:56:45 +00:00
tjr
ab16560f33 By popular request, add a workaround that allows large (>128GB or so)
FAT32 filesystems to be mounted, subject to some fairly serious limitations.

This works by extending the internal pseudo-inode-numbers generated from
the file's starting cluster number to 64-bits, then creating a table
mapping these into arbitrary 32-bit inode numbers, which can fit in
struct dirent's d_fileno and struct vattr's va_fileid fields. The mappings
do not persist across unmounts or reboots, so it's not possible to export
these filesystems through NFS. The mapping table may grow to be rather
large, and may grow large enough to exhaust kernel memory on filesystems
with millions of files.

Don't enable this option unless you understand the consequences.
2004-07-03 13:22:38 +00:00
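The mapping described in ab16560f33 can be sketched as a table that hands out arbitrary 32-bit numbers for 64-bit pseudo-inode numbers on first use. This is a hypothetical model (struct fileno_map, fileno_map_lookup()) using a linear search for simplicity, not the msdosfs data structure. It shares the limitations the commit lists: nothing persists, and the table grows with every distinct file looked up.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mapping table: 64-bit pseudo-inode -> 32-bit file number. */
struct fileno_map {
	uint64_t	*keys;		/* keys[i] was assigned base + i */
	size_t		 count, cap;
	uint32_t	 base;		/* first 32-bit number handed out */
};

/*
 * Return the 32-bit number for key, assigning a new one on first use.
 * Returns 0 if memory runs out.  Exhausting the 32-bit space is not
 * handled in this sketch.
 */
uint32_t
fileno_map_lookup(struct fileno_map *m, uint64_t key)
{
	uint64_t *nk;
	size_t i, ncap;

	for (i = 0; i < m->count; i++)
		if (m->keys[i] == key)
			return (m->base + (uint32_t)i);
	if (m->count == m->cap) {
		ncap = m->cap != 0 ? m->cap * 2 : 16;
		nk = realloc(m->keys, ncap * sizeof(*nk));
		if (nk == NULL)
			return (0);
		m->keys = nk;
		m->cap = ncap;
	}
	m->keys[m->count] = key;
	return (m->base + (uint32_t)m->count++);
}
```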
rwatson
f22a8169c3 Remove spls from portal_open(). Acquire socket lock while sleeping
waiting for the socket to connect and use msleep() on the socket
mutex rather than tsleep().  Acquire socket buffer mutexes around
read-modify-write of socket buffer flags.
2004-06-24 00:47:23 +00:00
scottl
51304a50f3 Make the udf_vnops side endian clean. 2004-06-23 21:49:03 +00:00
scottl
933faf5c3e First half of making UDF be endian-clean. This addresses the vfsops side. 2004-06-23 19:36:09 +00:00
bde
663370f941 Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/vnode.h> for the definition
of mutex interfaces used in SOCKBUF_*LOCK().

Sorted includes.

Removed unused includes.
2004-06-23 06:47:49 +00:00
rwatson
083bcb28d6 Remove unlocked read annotation for sbspace(); the read is locked. 2004-06-23 00:35:50 +00:00
phk
607546ee37 Reduce a fair bit of the atomics because we are now called with a
lock from kern_conf.c and cdevs act a lot more like real objects
these days.
2004-06-18 08:08:47 +00:00
rwatson
d87fad9f08 Merge some additional leaf node socket buffer locking from
rwatson_netperf:

Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.

Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.

Simplify the logic in sodisconnect() since we no longer need spls.

NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.
2004-06-18 02:57:55 +00:00
rwatson
855c4bb01f Merge additional socket buffer locking from rwatson_netperf:
- Lock down low hanging fruit use of sb_flags with socket buffer
  lock.

- Lock down low hanging fruit use of so_state with socket lock.

- Lock down low hanging fruit use of so_options.

- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
  socket buffer lock.

- Annotate situations in which we unlock the socket lock and then
  grab the receive socket buffer lock, which are currently actually
  the same lock.  Depending on how we want to play our cards, we
  may want to coalesce these lock uses to reduce overhead.

- Convert an if()->panic() into a KASSERT relating to so_state in
  soaccept().

- Remove a number of splnet()/splx() references.

More complex merging of socket and socket buffer locking to
follow.
2004-06-17 22:48:11 +00:00
phk
40dd98a3bd Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
julian
6c9d81ae0d Nice is a property of a process as a whole.
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
rwatson
029226f3a8 Grab the socket buffer send or receive mutex when performing a
read-modify-write on the sb_state field.  This commit catches only
the "easy" ones where it doesn't interact with as yet unmerged
locking.
2004-06-15 03:51:44 +00:00
rwatson
f2c0db1521 The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coalesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
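The effect of f2c0db1521 (and the locking that 029226f3a8 adds on top of it) is that each direction's state bits live in the socket buffer next to that buffer's own lock, so a read-modify-write touches only one lock. The sketch below is a simplified userland model: the SBS_* bit values are assumptions (only the names come from the commit), and struct sockbuf_m, struct socket_m and socantrcvmore_m() are hypothetical.

```c
#include <pthread.h>

/* Assumed bit values; only the names come from the commit. */
#define SBS_CANTSENDMORE	0x0010	/* was SS_CANTSENDMORE, so_snd */
#define SBS_CANTRCVMORE		0x0020	/* was SS_CANTRCVMORE, so_rcv */
#define SBS_RCVATMARK		0x0040	/* was SS_RCVATMARK, so_rcv */

/* Simplified socket buffer: the state bits sit next to their lock. */
struct sockbuf_m {
	pthread_mutex_t	sb_mtx;
	short		sb_state;	/* SBS_* flags, protected by sb_mtx */
};

struct socket_m {
	struct sockbuf_m so_rcv;
	struct sockbuf_m so_snd;
};

/* Mark the receive side as shut down: a locked read-modify-write. */
void
socantrcvmore_m(struct socket_m *so)
{
	pthread_mutex_lock(&so->so_rcv.sb_mtx);
	so->so_rcv.sb_state |= SBS_CANTRCVMORE;
	pthread_mutex_unlock(&so->so_rcv.sb_mtx);
}
```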
truckman
d503c79cad Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
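Why O_NONBLOCK cannot simply be stored on the fifo's socket is easy to see from userland with standard POSIX calls: one fifo opened twice for reading yields two independent descriptors, and setting O_NONBLOCK on one does not affect the other. The function below is an illustrative demonstration (fifo_nbio_demo() and the /tmp path template are hypothetical choices). The descriptors are opened in an order that never blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open one fifo through two separate read descriptors, make only the
 * first non-blocking, and check how each behaves.  Returns 0 if the
 * flag turned out to be per-descriptor, -1 otherwise.
 */
int
fifo_nbio_demo(void)
{
	char dir[] = "/tmp/fifodemoXXXXXX";
	char path[64], c;
	int rd_nb = -1, wr = -1, rd_blk = -1, ok = -1;

	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(path, sizeof(path), "%s/fifo", dir);
	if (mkfifo(path, 0600) == -1)
		goto out_dir;

	/* Non-blocking reader first, then a writer, then a blocking reader. */
	if ((rd_nb = open(path, O_RDONLY | O_NONBLOCK)) == -1 ||
	    (wr = open(path, O_WRONLY)) == -1 ||
	    (rd_blk = open(path, O_RDONLY)) == -1)
		goto out;

	/* Empty fifo with a writer: the O_NONBLOCK descriptor gets EAGAIN, */
	if (read(rd_nb, &c, 1) == -1 && errno == EAGAIN &&
	    /* while the other descriptor on the same fifo stays blocking. */
	    (fcntl(rd_blk, F_GETFL) & O_NONBLOCK) == 0)
		ok = 0;
out:
	if (rd_blk != -1)
		close(rd_blk);
	if (wr != -1)
		close(wr);
	if (rd_nb != -1)
		close(rd_nb);
	unlink(path);
out_dir:
	rmdir(dir);
	return (ok);
}
```

Passing MSG_NBIO per call, as the commit does, mirrors this: the non-blocking decision comes from the descriptor used for each I/O operation, not from shared state on the underlying socket.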
phk
f43aa0c4bc add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
truckman
6174e9d812 Switch from using the vnode interlock to a private mutex in fifo_open()
to avoid lock order problems when manipulating the sockets associated
with the fifo.

Minor optimization of a couple of calls to fifo_cleanup() from
fifo_open().
2004-05-17 20:16:40 +00:00
alc
b57e5e03fd Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
phk
200ffbe56d Do not drop Giant around the poll method yet, we're not ready for it. 2004-04-12 21:52:52 +00:00
imp
b49b7fe799 Remove advertising clause from University of California Regents'
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00