Commit Graph

431 Commits

Author SHA1 Message Date
alfred
217f9af8c7 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
iedowse
14dd931aa7 Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
markm
6ec52cf8be Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
phk
4d26864fcf Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
2001-04-29 11:48:41 +00:00
alfred
f4401cbc86 Remove incorrect comment.
Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr>
PR: kern/26893
2001-04-29 03:10:24 +00:00
grog
609bc7e870 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
grog
405d532596 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
alfred
8d55d7e4d2 vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
2001-04-19 06:18:23 +00:00
alfred
7cae9abd49 Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
phk
2f202ddc89 This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
peter
ab22fa6efa Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to
enable easy access to the hash chain stats.  The raw prefixed versions
dump an integer array to userland with the chain lengths.  This cheats
and calls it an array of 'struct int' rather than 'int' or sysctl -a
faithfully dumps out the 128K array on an average machine.  The non-raw
versions return 4 integers: count, number of chains used, maximum chain
length, and percentage utilization (fixed point, multiplied by 100).
The raw forms are more useful for analyzing the hash distribution, while
the other form can be read easily by humans and stats loggers.
2001-04-11 00:39:20 +00:00
rwatson
e16f090e57 o Rather than arbitrarily construct a credential in the nfs_statfs()
VFS operation, make use of the calling process's credential.  This
  solution may not be ideal (there are a number of other possible
  proposals, including making use of the proc0 credential, adding a
  credential argument to the VFSOP, and switching from a hard-coded
  ucred to a hard-coded nfscred), it is simple and appears to
  work.  The arguments against using simply crget() are fairly
  strong: it is the only place in the code (other than a nearly
  identical invocation in ncp) where crget() is invoked, other than
  in the process credential creation code; as ucred becomes extensible,
  this use of crget() without appropriate context results in less and
  less meaningful credential data.  The implementation here will
  probably be tweaked as a result of experimentation and further
  exploration of the requirements.  In the mean-time, it allows
  progress to be made in ucred expansion for new security models without
  causing a crash every time df is used on an NFS mounted file system.

  This code has been interop tested against FreeBSD and Solaris NFS
  servers.  While using the process credentials should not introduce
  interop problems, please let me know if any turn out to exist.

Reviewed by:	freebsd-arch
2001-04-05 06:12:38 +00:00
peter
88fd44c4fa Use the same API as the example code.
Allow the initial hash value to be passed in, as the examples do.
Incrementally hash in the dvp->v_id (using the official api) rather than
add it.  This seems to help power-of-two predictable filename trees
where the filenames repeat on a power-of-two cycle and the directory trees
have power-of-two components in it.  The simple add then mask was causing
things like 12000+ entry collision chains while most other entries have
between 0 and 3 entries each.  This way seems to improve things.
2001-03-20 02:10:18 +00:00
peter
156ac8e9c7 Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
peter
3a6e7a5b69 Dramatically improve the **lame** nfs_hash(). This is based on the
Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/).

This improves hash coverage a *massive* amount.  We were seeing one
set of machines that were using 0.84% of their 131072 entry nfsnode
hash buckets with maximum chain lengths of up to ~500 entries.  The
machine was spending nearly 100% of its time in 'system'.
A test with this has pushed the coverage from a few perCent up to 91%
utilization with a max chain length of 11.

Submitted by:  David Filo
2001-03-17 05:43:01 +00:00
jhb
44b0453b59 Grab the process lock while calling psignal and before calling psignal. 2001-03-07 03:37:06 +00:00
adrian
8650fa6cdb Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
dillon
de3361f143 Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)

Testing by: mjacob, dillon
2001-02-28 04:13:11 +00:00
green
79abd4ce26 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
tegge
37cb0f26bd Enable use of DHCP extensions.
Reviewed by:	Per Kristian Hove <Per.Hove@math.ntnu.no>
2001-02-02 02:35:40 +00:00
dillon
b454c48bc7 NFS O_EXCL file create semantics temporarily uses file attributes to store
the file verifier.  The NFS client is supposed to do a SETATTR after a
successful O_EXCL open/create to clean up the attributes.  FreeBSD's
client code was generating a SETATTR rpc but was not generating an access
or modification time update within that rpc, leaving the file with a
broken access time that solaris chokes on (and it doesn't look very
nice when you ls -lua under FreeBSD either!).    Fixed.
2001-01-04 22:45:19 +00:00
bmilekic
4c48d530e1 * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
dwmalone
418e7e45d7 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
phk
a4213fe59e Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
dillon
1f9b705709 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
mckusick
4302bdce72 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
eivind
cdf6d89a60 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
phk
05fcfd3993 Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
tegge
da17a89c52 Reduce kernel stack usage by not having large packets on the stack.
Supply correct size parameter to dhcpd.
Replace some magic numbers with macro names.
Handle more than one interface.
2000-10-29 01:19:32 +00:00
tegge
eb76b5ea48 Eliminate some bitrot (nonexisting member variable names).
Don't use curproc when a proc pointer is available.
2000-10-24 23:33:01 +00:00
tegge
e22e759f6f Style fixes. 2000-10-24 22:40:18 +00:00
tegge
a75932293a Make RPC timeout message more readable.
Supply proc pointer to sosend.
2000-10-24 22:37:55 +00:00
dwmalone
604e688433 Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
bp
495225ba4a Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top
of nfs mounts, but there can be side effects because nfs uses shared locks
for vnodes.
2000-10-15 08:06:32 +00:00
bp
47ff69730a Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
2000-10-15 08:01:28 +00:00
jasone
46ca3ece23 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
bp
e5f935eef3 Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
msmith
5d1d135a13 Don't scan for the "right" network interface by shooting in the dark.
Assume that the nfs_diskless structure is correctly set up; the provider
ought to be getting it right.
2000-09-05 22:29:36 +00:00
mckusick
29497f482d This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
ps
b32ea269fa Correctly set the Maximum DHCP Message Size. bootpd now works
again as well as ISC dhcpd.
2000-06-13 09:32:09 +00:00
jake
5e208b0c18 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
1d685644e0 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
phk
d7b9833b0a Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies
unless we explicitly tell it we can do more.  We tell it that we can do
1200 bytes.
2000-05-07 14:29:19 +00:00
phk
633deb3a69 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
929ca23961 Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
phk
5ba878c33f s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
phk
43018e3fb6 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
phk
fe2c3527f6 Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
phk
6746c7cf0d Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
dillon
d9466ccd89 Add a sysctl to specify the amount of UDP receive space NFS should
reserve, in maximal NFS packets.  Originally only 2 packets worth of
    space was reserved.  The default is now 4, which appears to greatly
    improve performance for slow to mid-speed machines on gigabit networks.

    Add documentation and correct some prior documentation.

Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh
2000-03-27 21:38:35 +00:00