Commit Graph

463 Commits

Author SHA1 Message Date
dillon
7fdb864af7 NFS O_EXCL file create semantics temporarily uses file attributes to store
the file verifier.  The NFS client is supposed to do a SETATTR after a
successful O_EXCL open/create to clean up the attributes.  FreeBSD's
client code was generating a SETATTR rpc but was not generating an access
or modification time update within that rpc, leaving the file with a
broken access time that solaris chokes on (and it doesn't look very
nice when you ls -lua under FreeBSD either!).    Fixed.
2001-01-04 22:45:19 +00:00
bmilekic
4b6a7bddad * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
phk
7101ba5caa Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
mckusick
42ecbdff70 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
eivind
1afa7eea27 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
phk
94a5006c9a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
tegge
251bfbe494 Reduce kernel stack usage by not having large packets on the stack.
Supply correct size parameter to dhcpd.
Replace some magic numbers with macro names.
Handle more than one interface.
2000-10-29 01:19:32 +00:00
tegge
5ee55d1bc1 Eliminate some bitrot (nonexisting member variable names).
Don't use curproc when a proc pointer is available.
2000-10-24 23:33:01 +00:00
tegge
d3a3d81338 Style fixes. 2000-10-24 22:40:18 +00:00
tegge
46334203c1 Make RPC timeout message more readable.
Supply proc pointer to sosend.
2000-10-24 22:37:55 +00:00
dwmalone
23aacadb39 Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
bp
9c029c0ce5 Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top
of nfs mounts, but there can be side effects because nfs uses shared locks
for vnodes.
2000-10-15 08:06:32 +00:00
bp
86d96862d1 Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
2000-10-15 08:01:28 +00:00
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
bp
6110b03d24 Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
jasone
769e0f974d Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
msmith
7de3089b6a Don't scan for the "right" network interface by shooting in the dark.
Assume that the nfs_diskless structure is correctly set up; the provider
ought to be getting it right.
2000-09-05 22:29:36 +00:00
mckusick
acc66855bf This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
ps
34013c4020 Correctly set the Maximum DHCP Message Size. bootpd now works
again as well as ISC dhcpd.
2000-06-13 09:32:09 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
phk
204a9e7fba Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies
unless we explicitly tell it we can do more.  We tell it that we can do
1200 bytes.
2000-05-07 14:29:19 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
10914aa708 Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
phk
1931990da0 s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
phk
6be1308ad1 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
phk
aaaef0b54e Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
phk
8ee11d587f Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
dillon
d7295a1a39 Add a sysctl to specify the amount of UDP receive space NFS should
reserve, in maximal NFS packets.  Originally only 2 packets worth of
    space was reserved.  The default is now 4, which appears to greatly
    improve performance for slow to mid-speed machines on gigabit networks.

    Add documentation and correct some prior documentation.

Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh
2000-03-27 21:38:35 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
phk
a246e10f55 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
peter
a5441090de Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs.  Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by:	jkh
2000-02-13 03:32:07 +00:00
dillon
860c8411e1 Fix catastrophic bug in NQNFS related to UDP mounts. The 'nqhost'
struct contains a major union for which lph_slp was being initialized
    only for TCP connections, but accessed for all types of connections
    leading to a crash.  Also, a conditional controlling an nfs_slplock()
    call contained an improper paren grouping, causing a second crash in
    the UDP case.

    The nqhost structure has been reorganized and lph_slp has been made a
    normal structural field rather then a union field, and properly
    initialized for all connection types.

Approved by: jkh
2000-01-26 20:51:29 +00:00
dillon
9bba72b2a3 The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it.  In fact, after further review
    there simply are not very many instances of the two structures NFS
    checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by:	 Andrew Gallatin <gallatin@cs.duke.edu>
2000-01-13 20:18:25 +00:00
shin
3bdc213839 tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
dillon
c6689c797d Enhance reassignbuf(). When a buffer cannot be time-optimally inserted
into vnode dirtyblkhd we append it to the list instead of prepend it to
    the list in order to maintain a 'forward' locality of reference, which
    is arguably better then 'reverse'.  The original algorithm did things this
    way to but at a huge time cost.

    Enhance the append interlock for NFS writes to handle intr/soft mounts
    better.

    Fix the hysteresis for NFS async daemon I/O requests to reduce the
    number of unnecessary context switches.

    Modify handling of NFS mount options.  Any given user option that is
    too high now defaults to the kernel maximum for that option rather then
    the kernel default for that option.

Reviewed by:	 Alfred Perlstein <bright@wintelcom.net>
2000-01-05 05:11:37 +00:00
dillon
42026f70c2 Fix at least one source of the continued 'NFS append race'. close()
was calling nfs_flush() and then clearing the NMODIFIED bit.  This is
    not legal since there might still be dirty buffers after the nfs_flush
    (for example, pending commits).  The clearing of this bit in turn prevented
    a necessary vinvalbuf() from occuring leaving left over dirty buffers
    even after truncating the file in a new operation.  The fix is to
    simply not clear NMODIFIED.

    Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1,
    will cause close() to do a stage 1 write AND a stage 2 commit
    synchronously.  By default only the stage 1 write is done synchronously.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
2000-01-05 00:32:18 +00:00
peter
d53e4c1d80 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
alfred
4f1be4b332 make getfh a standard syscall instead of dependant on having
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.

Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452
1999-12-21 20:21:12 +00:00
rwatson
4b6baecfc7 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
green
99a0254a52 M_PREPEND-related cleanups (unregisterifying struct mbuf *s). 1999-12-19 01:55:37 +00:00
dillon
00382428de Fix compilation warning on alpha when converting pointer to integer
to generate hash index.

Reviewed by:	 Andrew Gallatin <gallatin@cs.duke.edu>
1999-12-18 19:20:05 +00:00
dillon
db701263c1 Have NFS use a snapshot of boottime instead of boottime itself to
generate the NFSv3 Version id.  boottime itself may change, sometimes
    once every tick if you are running xntpd, which really throws off
    clients.  Clients will tend to throw away what they believe to be
    stale data too often, and can get into long loops rewriting the same
    data over and over again because they believe the server has rebooted
    over and over again due to the changing version id.

Approved by:	jkh
1999-12-16 17:01:32 +00:00
eivind
87724eb673 Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
dillon
3968ced3f8 Fix two problems: First, fix the append seek position race that can
occur due to np->n_size potentially changing if nfs_getcacheblk()
    blocks in nfs_write().

    Second, under -current we must supply the proper bufsize when obtaining
    buffers that straddle the EOF, but due to the fact that np->n_size can
    change out from under us it is possible that we may specify the wrong
    buffer size and wind up truncating dirty data written by another
    process.

    Both problems are solved by implementing nfs_rslock(), which allows us
    to lock around sensitive buffer cache operations such as those that
    occur when appending to a file.

    It is believed that this race is responsible for causing dirtyoff/dirtyend
    and (in stable) validoff/validend to exceed the buffer size.  Therefore
    we have now added a warning printf for the dirtyoff/end case in current.

    However, we have introduced a new problem which we need to fix at some
    point, and that is that soft or intr NFS mounts may become
    uninterruptable from the point of view of process A which is stuck waiting
    on rslock while process B is stuck doing the rpc.  To unstick process A,
    process B would have to be interrupted first.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-14 19:07:54 +00:00
dillon
bab004e729 Add a readahead heuristic to the NFS server side code. While the server
cannot unilaterally pass data to a client it can reduce the physical
    disk transaction overhead by reading larger blocks.  This results in
    better pipelining of requests/responses over the network and an almost
    100% increase in cpu efficiency on the server.  On a 100BaseTX network
    NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed
    out), and cpu efficiency increases from 72% idle to 80% idle on the server.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-13 17:34:45 +00:00
dillon
b01e2d1796 PR: kern/15222
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-13 17:07:03 +00:00