Commit Graph

1053 Commits

Author SHA1 Message Date
Sam Leffler
49d5157434 consolidate parsing of nfs root mount options in one place
and handle all options (some may require fixes elsewhere)

Reviewed by:	jhb, mohans
MFC after:	1 month
2006-12-06 02:15:25 +00:00
Mohan Srinivasan
594ece53bc In nfs_nget(), we must initialize the fh in the nfsnode before inserting the
vnode into the vfs hash. Otherwise, another thread walking the hash can trip
on an nfsnode with an uninitialized or partially initialized fh.
Thanks to ups@ for spotting this race.
2006-11-29 02:21:40 +00:00
Mohan Srinivasan
d4875805d7 bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't
have to explicitly acquire Giant (although they need to be aware of this and
not hold any locks at that point). Remove the acquisitions of Giant in the
NFS client wrapping tprintf().
2006-11-27 23:26:06 +00:00
Mohan Srinivasan
88d5725c38 Fix for a bug caused by a race when 2 threads lookup the same
file. Leave the loser's lock(s) initialized, so the reclaim logic can
unconditionally destroy them when that race occurs (or if the vfs hash
insert happened to fail for some other reason). Thanks to ups@ for a
careful review of the code.
Reported by : Kris Kennaway
2006-11-27 19:06:43 +00:00
Mohan Srinivasan
a18c4dc336 1) Fix up locking in nfs_up() and nfs_down.
2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly.
- We don't need to acquire Giant before tsleeping on lbolt anymore,
  since jhb specialcased lbolt handling in msleep.
- nfs_up() needs to acquire Giant only if printing the "server up"
  message.
- nfs_timer() held Giant for the duration of the NFS timer processing,
  just because the printing of the message in nfs_down() needed it
  (and we acquire other locks in nfs_timer()). The acquisition of
  Giant is moved down into nfs_down() now, reducing the time Giant is
  held in that path.

Reported by: Kris Kennaway
2006-11-20 04:14:23 +00:00
Mohan Srinivasan
3c2fcc3c92 vfs_hash_insert() vputs() the losing vnode before returning, in the event of
a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't
be releasing the vnode in that case.
2006-11-16 23:03:46 +00:00
Mohan Srinivasan
87c125cecc Fix to readdir+ reply handling. When inserting an entry into the namecache,
initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the
just entered namecache entry.
2006-11-16 23:02:37 +00:00
Sam Leffler
83cc6b9ad2 honor nolockd flag in root mount options
MFC after:	2 weeks
2006-11-07 18:02:45 +00:00
Mohan Srinivasan
88b94fba38 Make EWOULDBLOCK a recoverable error so that the request is retransmitted.
This bug results in data corruption with NFS/TCP. Writes are silently dropped
on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires).

Reviewed by: ups@
2006-10-31 20:25:37 +00:00
Bruce Evans
35259c2c89 Fixed some style bugs (especially ones involving long lines and use
of __P(())).  There are many more.
2006-10-17 22:07:07 +00:00
Bruce Evans
6a72ff6b09 Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the
VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(),
we thought that it was optimized well enough for the one file system
that supports it (ffs) and harmless for other file systems (except
layered ones which already get the layering for VOP_SETATTR() wrong).
However, nfs_setattr() doesn't do much parameter checking, so when
it gets a combination of parameters that it doesn't understand, it
always does a Setattr RPC.  This RPC can't do anything good, and for
VA_MARK_ATIME it is null except for wasting a lot of time.

This is the smallest and easiest to fix of several bugs that have
increased the number of RPCs for kernel builds on nfs by more than
100% since 2004-11-05.  The real-time increase depends on network
latency and parallelization and can also be very large (approaching
the same percentage for unparallelized operations like "make depend"
on systems with fast CPUs and high-latency networks).
2006-10-14 07:25:11 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Mohan Srinivasan
7d7d9e2242 Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
  heavy load results in a deadlock (with cascading vnode locks all the
  way to the root) very quickly.
- This change should also fix the more general problem of cascading
  vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by:	ups@
2006-09-13 18:39:09 +00:00
Mohan Srinivasan
6cd7078919 Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never
retransmit (or return). Thanks to John Baldwin for helping nail this one.

Found by : Kris Kennaway
2006-08-29 22:00:12 +00:00
Thomas Quinot
3401780fa0 Fix typos in comment. 2006-08-16 23:53:05 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Brooks Davis
a36aa44a85 Add a new kernel environment variable "boot.netif.mtu" which is used to
set the MTU prior to mounting root via NFS.  This is required if the
server supports a higher than default MTU because the client will not
see the responses otherwise.

MFC after:	3 weeks
2006-08-09 01:56:17 +00:00
Robert Watson
b0668f7151 soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
Konstantin Belousov
c915bcbad2 Signals may be delivered to process as well as to the thread. Check the
thread-delivered signals in addition to the process one.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:39:11 +00:00
Konstantin Belousov
201599c3af Always supply curthread as argument to nfs_asyncio and nfs_doio
in nfs_strategy. Otherwise, for some buffers, signals would be ignored
at the intr mounts.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:36:51 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy
576cdf4352 Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop. 2006-06-29 15:37:39 +00:00
Mohan Srinivasan
64c3892747 Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was
not being set, which means Giant would be acquired for these mounts.
2006-05-30 20:32:44 +00:00
Mohan Srinivasan
1af6f471ca Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported
by Witness (which forces the mbuf allocation flag to M_NOWAIT).

Reported by: "sekes".
2006-05-26 18:45:55 +00:00
Stephan Uphoff
6c1b7d16c2 Call vm_object_page_clean() with the object lock held.
Submitted by:	kensmith@
Reviewed by:	mohans@
MFC after:	6 days
2006-05-25 17:16:11 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
Mohan Srinivasan
5bbfbd1422 Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts
before doing the read/write.

Reported by:	Chuck Lever.
2006-05-24 23:06:50 +00:00
Robert Watson
33c6a485bd Adjust minimum iod threads from 4 to 0 -- since we compile the NFS
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused.  Once NFS
becomes active, it will start nfsiod's as it needs them.

We might consider mandating a minimum iod's equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.

PR:		70880
MFC after:	2 weeks
Prodded by:	cel
Head nod:	peter
Pointed out by:	Joe <fbsd_user at a1poweruser dot com>
2006-05-24 21:04:46 +00:00
Chuck Lever
6d0699a5ba NFS over TCP retransmit behavior should default to a 60 second time out,
mimicing the NFS reference implementation.

NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP.  A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.

Reviewed by:	mohans, silby, rees?
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:48:07 +00:00
Chuck Lever
94163ea283 Refactor the NFS over UDP retransmit timeout estimation logic to allow
the estimator to be more easily tuned and maintained.

There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking to long to retransmit during a
server outage.

Reviewed by:	mohan, kris, silby
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:33:58 +00:00
Mohan Srinivasan
f2c48228fe Vnode locks are recursive and the NFS client support shared vnode locks.
Found by: Kris Kennaway.
2006-05-23 16:07:23 +00:00
Mohan Srinivasan
f1cdf89911 Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
2006-05-19 00:04:24 +00:00
Mohan Srinivasan
671d06fb2e Fix a snafu caused while patching the previous fix from another branch. 2006-05-05 18:12:13 +00:00
Mohan Srinivasan
9f5b7dea42 Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get
out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC
errors etc.
2006-05-05 18:04:53 +00:00
Mohan Srinivasan
5ef7d50da5 Keep track of the number of in-progress async direct IO writes in the nfsnode.
Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and
nfs_putpage().
2006-04-06 01:20:30 +00:00
Jeff Roberson
b2282f9a3f - Busy the filesystem in nfs_statfs to prevent us from creating a new
vnode after vflush() has succeeded.  This would cause a dangling vnode
   panic at unmount time otherwise.  Other filesystems may have this problem
   via their VFS_VGET() routines.

Found by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-04-01 01:15:23 +00:00
Kris Kennaway
78e31796c9 Fix a bug in the NFS/TCP retransmission path.
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.

This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.

Further work on retransmission handling is needed (e.g. they are still
being done sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).

Submitted by:	mohans
2006-03-23 22:58:42 +00:00
Pawel Jakub Dawidek
9972deb772 Actually I wanted 'nolockd' here instead of 'lockd'.
MFC after:	2 days
2006-03-19 13:27:37 +00:00
Chuck Lever
a59b03bf0e If an NFS server returns more than a few EJUKEBOX errors for a given RPC
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.

Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris).  This provides a fixed
delay of 10 seconds between each retry by default.  A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay.  Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.

Sponsored by:	Network Appliance, Incorporated
Reviewed by:	rick
Approved by:	silby
MFC after:	3 days
2006-03-17 22:14:23 +00:00
Chuck Lever
9f5349f23d Fix a bug in NFSv3 READDIRPLUS reply processing
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry.  If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.

Sponsored by:	Network Appliance, Inc.
Reviewed by:	rick, mohans
Approved by:	silby
MFC after:	2 weeks
2006-03-08 01:43:01 +00:00
Jim Rees
4b81d0eb0f Don't log an error on tcp connection reset, even if we don't get ECONNRESET.
Submitted by:	cel@citi.umich.edu
2006-01-20 15:07:18 +00:00
Alfred Perlstein
92e73f5711 I ran into an nfs client panic a couple of times in a row over the
last few days.  I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter.  The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any).  I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
2006-01-17 17:29:03 +00:00
Robert Watson
63074a901a In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved
from using a fifo to talk to rpc.lockd to using a special device node.

Noticed by:	Coverity Prevent analysis tool
MFC after:	3 days
2006-01-13 23:16:29 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Xin LI
fc9fac4c78 Correct a typo 2005-12-28 10:03:48 +00:00
Paul Saab
fc6ff223c4 Improve upon rev 1.133 where NFS/TCP would not reconnect.
Submitted by:	Mohan Srinivasan
2005-12-12 23:18:05 +00:00
Ruslan Ermilov
2f1b461447 Unexpand LLADDR(). 2005-11-29 09:51:47 +00:00
Paul Saab
38b29f71ef Fix for a bug where NFS/TCP would not reconnect (in the case where
the server FIN'ed). Seen with Solaris NFS servers.

Reported by:	TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp>
Submitted by:	Mohan Strinivasan
2005-11-21 19:25:24 +00:00
Paul Saab
3834aac17e - Always return success from NFS strategy. nfs_doio(), in the
event of an error, does the right thing, in terms of setting
  the error flags in the buf header. That fixes a crash from
  bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer
  to be re-dirtied. ETIMEDOUT can occur on soft mounts, when
  the number of retries are exceeded, and we don't want data loss
  in that case.

Submitted by:	Mohan Srinivasan
2005-11-21 19:23:46 +00:00
Jim Rees
cb156cc603 fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.
Submitted by:	cel@citi.umich.edu
Fixed by:	rick@snowhite.cis.uoguelph.ca
Approved by:	alfred
MFC after:	3 weeks
2005-11-21 18:39:18 +00:00
Jonathan Chen
0b3e7451da fix a crash when an nfsv2 mount fails
MFC after:	1 week
2005-11-10 23:25:16 +00:00
Paul Saab
9c31df40bb Fix for a crash (from nfs_lookup() in an error case).
Submitted by:	Mohan Srinivasan
2005-11-03 19:24:54 +00:00
Paul Saab
41ce2892bb In nfs_flush(), clear the NMODIFIED bit only if there are no dirty
buffers *and* there are no buffers queued up for writing.  The bug
was that NMODIFIED was being cleared even while there were buffers
scheduled to be written out, which leads to all sorts of interesting
bugs - one where the file could shrink (because of a post-op getattr
load, say) causing data in buffer(s) queued for write to be tossed,
resulting in data corruption.

Submitted by:	Mohan Srinivasan
2005-11-03 07:42:15 +00:00
Paul Saab
120c58288c Fix for a race between the thread transmitting the request and the
thread processing the reply.

Submitted by:	Mohan Srinivasan
2005-11-03 07:31:06 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Gleb Smirnoff
c0bc2867c1 - Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00
Pawel Jakub Dawidek
df71afde00 - Use strsep() instead of strtok().
- strdup() uses M_WAITOK, so we don't need to check it's return value
  against NULL.

MFC after:	2 weeks
2005-10-06 19:04:08 +00:00
Pawel Jakub Dawidek
720f3948c0 Add boot.nfsroot.options loader tunable.
It allows to specify options for NFS root file system.
Currently supported options are: soft, intr, conn, lockd.

I'm adding this functionality mostly for 'lockd' option, which is only
honored when performing the initial mount and will be silently ignored
if used while updating the mount options.

This will allow to use flock(2) without the need of using varmfs or
rpc.lockd and friends.

Example of use:
boot.nfsroot.options="intr,lockd"

MFC after:	2 weeks
2005-10-06 11:18:34 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Paul Saab
250614c5ab FIx for a bug in the change that made nfs_timer() MPSAFE. We need to
grab Giant before calling pru_send() (if running with mpsafenet = 0).

Found by:	Jeremie Le Hen.
Fixed by:	Maxime Henrion
2005-07-27 15:06:26 +00:00
Paul Saab
4fb48d10b0 In nfs_nget() if two threads race on the same filehandle, the loser should
cause the nfsnode to get freed. This fixes a potential vnode (and nfsnode)
leak in that path.

Submitted by:	Mohan Srinivasan
Reviewed by:	phk
2005-07-27 15:05:31 +00:00
Paul Saab
865b5cc7fd Remove the NFS client rslock. The rslock was used to serialize
writers that want to extend the file. It was also used to serialize
readers that might want to read the last block of the file (with a
writer extending the file).  Now that we support vnode locking for
NFS, the rslock is unnecessary. Writers grab the exclusive vnode
lock before writing and readers grab the shared (or in some cases
the exclusive) lock.

Submitted by:	Mohan Srinivasan
2005-07-21 22:46:56 +00:00
Paul Saab
4321eae6b7 Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS
client (the interface with the protocol stack and callouts) is
Giant-free.

Submitted by:	Mohan Srinivasan.
2005-07-19 21:27:25 +00:00
Paul Saab
38b8570c55 Fix for a NFS soft mounts bug where if the number of retries exceeds
the max rexmits, the request was not being bounced back with a
ETIMEDOUT error.

Reported by:	Oliver Lehmann
Submitted by:	Mohan Srinivasan
2005-07-18 02:12:17 +00:00
Paul Saab
0e38f5365b Fixes for NFS crashes on architectures that require strict alignment.
- Fix nfsm_disct() so that after pulling up data, the remaining data
  is aligned if necessary.
- Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the
  mbuf (instead of casting m_data to a uint32).

Submitted by:	Pyun YongHyeon
Reviewed by:	Mohan Srinivasan
2005-07-14 20:08:27 +00:00
Brian Feldman
6979a7592a Ifdef out the incomplete non-blocking IO implementation for NFS
pending discussion of how implementation would proceed.  Applications
like -lc_r expect select(3) to match the EAGAIN-status of IO
functions.

Approved by:	re
2005-06-16 15:43:17 +00:00
Brian Feldman
cc3149b1ea Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
Dag-Erling Smørgrav
3f54cc0505 Ugh. Previous commit got the logic exactly backward.
Submitted by:	bland
Pointy hat to:	des
2005-05-17 18:23:03 +00:00
Dag-Erling Smørgrav
ff17c7a727 Revision 1.173 broke updating a mount from ro to rw. Fix that by clearing
the MNT_RDONLY flag if MNT_UPDATE is set and "ro" was not specified.

Suggested by:	cognet
2005-05-17 12:00:43 +00:00
Jim Rees
3785bdbe7f set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests get
re-sent instead of timing out.

don't log an error message on reconnection, which is not an error.

remove unused nfs_mrep_before_tsleep.

Reviewed by:	Mohan Srinivasan
Approved by:	alfred
2005-05-10 14:25:14 +00:00
Paul Saab
15ec3fe2f0 Fix a bug in NFS/TCP where retransmissions would not reliably happen
if the server rebooted or tore down the connection for any reason.

Found by:	Jonathan Noack.
Submitted by:	Mohan Srinivasan.
2005-05-04 16:37:31 +00:00
Ian Dowse
2c443c417c Don't copy the NFSMNT_* flags into struct statfs's f_flags field,
as they have no connection with the expected MNT_* flags. This bug
was exposed 18 months ago when the assignments to f_flags in
vfs_syscalls.c were moved to before the VFS_STATFS() call. It was
fixed in the CSRG source 10 years ago, but we never picked up that
change.

PR:		kern/80390
MFC after:	1 week
2005-05-02 15:57:10 +00:00
Dag-Erling Smørgrav
4104e6bc1d When NFS was converted to the new mount syscall, code was written that sets
the MNT_RDONLY flag if the "ro" option was passed in from userland, and
clears it otherwise.  In the diskless case, the MNT_RDONLY flag is already
set when this code is reached, but there are no mount options, so it was
incorrectly cleared.  Change the logic so the MNT_RDONLY flag is set if the
"ro" option was specified, and left alone otherwise.

Note that the NFS code will still happily let you mount a filesystem RW
even if the server exports it RO.  I'm not sure how to fix that.
2005-04-27 14:46:02 +00:00
Dag-Erling Smørgrav
c6acf6d557 While I'm here, list the new kenv (boot.netif.name) along with the others. 2005-04-26 20:47:59 +00:00
Dag-Erling Smørgrav
8f0aecc01f When netbooting, as soon as we've figured out which interface we booted
from, store its name in a kenv variable.
2005-04-26 20:45:29 +00:00
Jim Rees
dcee1d0771 TCP reconnect is not an error.
Change the message from LOG_ERR to LOG_INFO.

Approved by:	alfred
2005-04-18 13:42:13 +00:00
Jeff Roberson
5b5f16b5a8 - cache_lookup() relocks the parent in the DOTDOT case for us.
Spotted by:	phk
Sponsored by:	Isilon Systems, Inc.
2005-04-14 07:08:34 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Jeff Roberson
da1c9cb2b5 - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
   LOCKPARENT or WANTPARENT.
2005-03-29 13:09:42 +00:00
Jeff Roberson
5c5e51fd9a - cache_lookup() now locks the new vnode for us to prevent some races.
Remove redundant code.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 13:00:37 +00:00
Jeff Roberson
f6576f194e - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
- Network filesystems are written with a special idiom that checks the
   cache first, and may even unlock dvp before discovering that a network
   round-trip is required to resolve the name.  I believe dvp is prevented
   from being recycled even in the forced unmount case by the shared lock
   on the mount point.  If not, this code should grow checks for VI_DOOMED
   after it relocks dvp or it will access NULL v_data fields.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:29:58 +00:00
Jeff Roberson
a176ceb322 - Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
   modified to do so.  Careful review must be done to ensure that this
   is safe for each individual filesystem.

Sponsored by:   Isilon Systems, Inc.
2005-03-24 07:39:03 +00:00
Paul Saab
cae2d2c61f - The NFS client was incorrectly masking SIGSTOP (which is
non-maskable).
- The NFS client needs to guard against spurious wakeups
  while waiting for the response. ltrace causes the process
  under question to wakeup (possibly from ptrace()), which
  causes NFS to wakeup from tsleep without the response being
  delivered.

Submitted by:	Mohan Srinivasan
2005-03-23 22:10:10 +00:00
David Schultz
938838feb1 Don't brelse(bp) if bp is null. Also, eliminate some redundancy
and dead code.

Found by:	Coverity Prevent analysis tool
2005-03-18 21:23:32 +00:00
Poul-Henning Kamp
8b5505c013 Use vfs_hash. 2005-03-16 11:28:19 +00:00
John-Mark Gurney
7f76b06b35 MFp4: use the function to fix the packet header length instead of rolling
our own...
2005-03-16 08:13:08 +00:00
Jeff Roberson
8d8d331063 - VOP_INACTIVE should no longer drop the vnode lock.
Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:15:36 +00:00
Jeff Roberson
c0f681c21d - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:14:56 +00:00
Jeff Roberson
30144f05f0 - It is no longer necessary to lock and unlock the vnode in nfs_close() as
the top level does this for us now.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:11:23 +00:00
Paul Saab
6ff1ccae7f Minor cleanup in nfs_request() and removal of a comment that doesn't
reflect reality.

Submitted by:	Mohan Srinivasan
2005-02-26 18:55:36 +00:00
Poul-Henning Kamp
33822d53bf vp->v_id is a private field for the vfs namecache and it is a big mistake
that NFS ever started using it.  Long time ago I added the necessary
vhold()/vdrop() calls to replace it, but forgot to remove the v_id code.

Do it now.
2005-02-22 14:52:00 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Paul Saab
7a5147540a Fix for a potential NFS client race where shared data is updated from
base context as well as the socket callback.

Submitted by:	Mohan Srinivasan
2005-02-18 23:41:39 +00:00
John Baldwin
01660e7bc2 Drop Giant before calling kthread_exit(). 2005-02-07 18:21:50 +00:00
Robert Watson
38aa565976 Style cleanup for O_DIRECT sysctl comment introduced in nfs_vnops.c:1.242. 2005-01-29 23:19:08 +00:00
Poul-Henning Kamp
a369f34d76 Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().
2005-01-28 14:42:17 +00:00
Poul-Henning Kamp
b3a4d73ebe Create a vnode_pager object when a file is opened. 2005-01-24 23:03:29 +00:00
Poul-Henning Kamp
56dd36b1a6 Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments
passed for it.
2005-01-24 12:31:06 +00:00
Peter Wemm
bcbfb8bc3d Mostly back out rev 1.33 from quite some time ago, and the followup fixes
and tweaks.  The code was actually quite broken because it discarded the
upper bits of the 64 bit division.  We only had a 50% chance of scaling up
the blocksize for large NFS client mounts when it was needed.  For 5.x and
beyond, this was harmless because we could represent the result in either
case.  For 4.x this was a big problem though.  (4.x also has a df(1) bug to
compound the problem)
2005-01-18 21:59:44 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Brian Somers
3e056862a2 Include opt_bootp.h for BOOTP_NFSROOT
PR:		73183
Submitted by:	Darrin Smith sdar at salseast dot org
MFC after:	7 days
2005-01-12 12:42:46 +00:00
Poul-Henning Kamp
6ef8480a88 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Paul Saab
b6e223d8d4 If the NFS/TCP stream is out of sync between the client and server,
and if the client (erroneously) reads the RPC length as 0 bytes, the
client can loop around in the socket callback. Explicitly check for
the length being 0 case and teardown/re-connect.

Submitted by:	Mohan Srinivasan
2005-01-05 23:21:13 +00:00
Paul Saab
72af302481 Turn NFS directio off until the stability issues are resolved. 2004-12-23 21:30:30 +00:00
Paul Saab
cb36cd34b8 Change the NFS sillyrename convention so that we won't run out
of sillyrenames (which were limited to 58 per pid per directory,
for no good reason). The new format of sillyrenames looks like

	.nfs.0000b31a.00d24.4
	     ^^^^^^^^ ^^^^^
	     ticks    pid

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from:	Yahoo!
2004-12-16 19:28:37 +00:00
Paul Saab
a7500bceb0 First cut of NFS direct IO support.
- NFS direct IO completely bypasses the buffer and page caches.
  If a file is open for direct IO all caching is disabled.
- Direct IO for Directories will be addressed later.
- 2 new NFS directio related sysctls are added. One is a knob to
  disable NFS direct IO completely (direct IO is enabled by default).
  The other is to disallow mmaped IO on a file that has at least one
  O_DIRECT open (see the comment in nfs_vnops.c for more details).
  The default is to allow mmaps on a file that has O_DIRECT opens.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from:	Yahoo!
2004-12-15 22:20:22 +00:00
Marcel Moolenaar
129999637e Revert rev 1.233. The null-pointer function call (a dereference on
ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.

Requested by: phk
2004-12-11 21:36:29 +00:00
Paul Saab
41bc38d132 In nfs_rename(), skip the otw rename operation if the fsync (to
either src or dst) fails. This closes a potential data loss case
(where the fsync failed with ENOSPC, for example).

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from:	Yahoo!
2004-12-10 03:29:02 +00:00
Paul Saab
4342aac774 Store a hint in the nfsnode to detect sequential access of the file.
Kick off a readahead only when sequential access is detected.  This
eliminates wasteful readaheads in random file access.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from:	Yahoo!
2004-12-10 03:27:12 +00:00
Paul Saab
5e5f905de3 Fix for a Lock Order Reversal in the nfs_flush() path, between the
vnode interlock and the proc lock.

Reported by:	marcel
Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-07 21:16:32 +00:00
Poul-Henning Kamp
eeeb5c7f9a Don't clobber mnt_stat.f_mntonname 2004-12-07 14:26:39 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Paul Saab
c10bac25f6 Always issue wakeups() to the NFS requestors under the mutex
to close all potential cases of missed wakeups.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-07 03:39:52 +00:00
Paul Saab
35ec46b7f2 Rewrite of the NFS client's reply handling. We now have NFS socket
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 21:11:15 +00:00
Paul Saab
ddc6c40075 2 fixes that improve on the consistency of the NFS client cache.
- Change the cached mtime to a 'struct timespec' from a
  time_t. Improving the precision of the cached mtime tightens up
  NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open()
  (unless the file is marked modified). This further improves
  NFS' "close-to-open" consistency.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 19:18:00 +00:00
Paul Saab
d54d263a79 Serialize NFS vinvalbuf operations by acquiring/upgrading to the
vnode EXCLUSIVE lock. This prevents threads from adding pages to
the vnode while an invalidation is in progress, closing potential
races. In the bioread() path, callers acquire the SHARED vnode lock
- so while an invalidate was in progress, it was possible to fault
in new pages onto the vnode causing the invalidation to take a while
or fail. We saw these races at Yahoo! with very large files+heavy
concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing
the invalidation closes all these races.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 18:52:28 +00:00
Paul Saab
b8d0fc9581 Add non-blocking versions of nfsm_dissect() and friends, for use from
socket callbacks or similar callers, from both the NFS client and the
server.
Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And
nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 17:33:52 +00:00
Paul Saab
8fefdf0057 - If all data has been committed to stable storage on the server, it
is safe to turn off the nfsnode's NMODIFIED flag.
- Move the check for signals to the top of the loop where we loop
  around the dirty buffers on the vnode, scheduling writes. This
  ensures that we'll break ouf of the flush operation on reception of
  a signal.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 16:35:58 +00:00
Robert Watson
0fe6462ad5 Correct a typo in a comment. 2004-12-06 16:11:25 +00:00
Poul-Henning Kamp
8b431c9576 For reasons unknown, the nfs locking code used a fifo to send requests to
userland and a dedicated system call to get replies.

The vnode-bypass of fifos broke this into a panic.

Ditch all the magic and create a device /dev/nfslock instead, and
use that for both directions apart from the shorter path, this is
also faster because the device driver runs Giant free using the
vnode bypass.

Noticed by:	marcel
2004-12-06 08:31:32 +00:00
Robert Watson
8880ff1eba Convert GIANT_REQUIRED; in nfs_mountroot() to NET_ASSERT_GIANT(),
and annotate that nfs_mountroot assumes it is OK to step on the
values in the global NFSv3 diskless structure as the mountroot
function is called during a serialized part of the boot, before
any other NFS client activity occurs.

MFC after:	2 weeks
2004-12-05 22:53:17 +00:00
Robert Watson
6bfde9e63b Convert a GIANT_REQUIRED; into a NET_ASSERT_GIANT();, as sockets are
now only conditionally protected by Giant based on debug.mpsafenet.
2004-12-05 22:50:09 +00:00
Poul-Henning Kamp
743312367a VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
Marcel Moolenaar
061f5ec825 Fix null-pointer indirect function calls introduced in the previous
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.

No pathos on: current@
2004-12-05 22:30:28 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
a4e16be2b4 Remove redundant functions (repo-copied from nfsclient) for dealing with
fifos.
2004-12-01 20:18:56 +00:00
Poul-Henning Kamp
ccae7d65f7 Scripted modification of vop_* prototypes to use typedefs. 2004-12-01 19:08:40 +00:00
Poul-Henning Kamp
e9d823dde4 Add missing #include 2004-12-01 07:34:08 +00:00
Paul Saab
cd15125084 Fix for a race between lookup and readdirplus, that causes
a deadlock (with NFS exclusive vnode locks enabled). Lookup
grabs the parent's lock and wants to lock child. Readdirplus
locks the child and wants to lock parent (for loading the attrs
for ".."). The fix is to not load the attrs for ".." in
readdirplus.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:51:07 +00:00
Paul Saab
3e9c9e432a Clean all dirty pages (dirtied by mmap'ed writes) in nfs_close().
This closes a major hole in close-to-open consistency support.
Added a new sysctl so that this can be disabled for single NFS
client applications with very large amounts of mmap'ed IO (for
performance).

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:48:54 +00:00
Paul Saab
813d33a869 Fix for a (blocks) underrun bug where negative values were being
returned back to df from a statfs call. Causing df to print negative
values.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:42:21 +00:00
Paul Saab
74f44849b5 Fix for a bug in nfs_mkdir() that called vrele() instead of vput()
in the error cases, causing panics.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-11-29 23:05:30 +00:00
Jeff Roberson
b646893f0f - Eliminate the acquisition and release of the bqlock in bremfree() by
setting the B_REMFREE flag in the buf.  This is done to prevent lock order
   reversals with code that must call bremfree() with a local lock held.
   This also reduces overhead by removing two lock operations per buf for
   fsync() and similar.
 - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
   has been acquired so that we may remove ourself from the free-list.
 - Provide a bremfreef() function to immediately remove a buf from a
   free-list for use only by NFS.  This is done because the nfsclient code
   overloads the b_freelist queue for its own async. io queue.
 - Simplify the numfreebuffers accounting by removing a switch statement
   that executed the same code in every possible case.
 - getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
   Remove a panic associated with this condition and delay asserts that
   inspect the buf until after it is locked.

Reviewed by:	phk
Sponsored by:	Isilon Systems, Inc.
2004-11-18 08:44:09 +00:00
Poul-Henning Kamp
282d0382ac Detect root mount attempts on the flag, not on the NULL path. 2004-11-09 22:21:52 +00:00
Poul-Henning Kamp
6e67e2a710 Retire b_magic now, we have the bufobj containing the same hint. 2004-11-04 09:48:18 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Pawel Jakub Dawidek
1a32dca7a3 Add a missing newline character. 2004-10-14 19:00:44 +00:00
David Schultz
506d3e1bcc nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should
be necessary here.  Also, use killproc() instead of psignal().
2004-10-01 05:01:41 +00:00
Poul-Henning Kamp
c0f46dd1e4 Remove support for using NFS device nodes. 2004-09-28 08:50:01 +00:00
Poul-Henning Kamp
52c55a26b1 Remove NFS4 vop method vector for devices: we are desupporing device nodes
on anything but DEVFS and in this case it was not even used (see below).

Put the NFS4 vop method for fifo's behind "#if 0" because it is unused.
Add a XXX comment to say that I think the unusedness is a bug.
2004-09-27 20:02:50 +00:00
Poul-Henning Kamp
9f2b7bc4a8 style consistency. 2004-09-27 19:44:39 +00:00
Poul-Henning Kamp
08dbd671ff Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
Poul-Henning Kamp
35f134080f Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio(). 2004-09-07 08:56:43 +00:00
Robert Watson
640c9dcf69 In nfs_timer(), pass curthread rather than &thread0 into the protocol
send routine.  In IPv6 UDP, the thread will be passed to suser(), which
asserts that if a thread is used for a super user check, it be
curthread.  Many of these protocol entry points probably need to
accept credentials instead of threads.

MT5 candidate.

Noticed/tested by:	kuriyama
2004-08-25 01:23:38 +00:00
Poul-Henning Kamp
5e8c582ac2 Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
Poul-Henning Kamp
0658bb8ef8 Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call.  (Yet another example of mindless copy&paste).
2004-07-28 21:54:57 +00:00
Poul-Henning Kamp
d634f69316 Remove global variable rootdevs and rootvp, they are unused as such.
Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition.  We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
2004-07-28 20:21:04 +00:00
Poul-Henning Kamp
cf95b5c381 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
Alfred Perlstein
8f0a7125a1 Turn off SO_REUSEADDR and SO_REUSEPORT, they were causing EADDRINUSE
to be returned from the protocol stack.

Pointy hat to me for not groking what those options _really_ mean.
2004-07-13 05:42:59 +00:00
David Malone
dcee93dcf9 Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a
a better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these function will be quite different
from so_setsockopt.

Approved by:	alfred
2004-07-12 21:42:33 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Alfred Perlstein
d58d3648dd Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts.
Tune the timeout from 5 seconds to 12 seconds.
Provide a sysctl to show how many reconnects the NFS client has done.

Seems to fix IPv6 from: kuriyama
2004-07-12 06:22:42 +00:00
Brian Somers
0ac4013324 Change the following environment variables to kernel options:
bootp -> BOOTP
    bootp.nfsroot -> BOOTP_NFSROOT
    bootp.nfsv3 -> BOOTP_NFSV3
    bootp.compat -> BOOTP_COMPAT
    bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit.  It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone
2004-07-08 22:35:36 +00:00
Brian Somers
59e1ebc9b5 Change the following kernel options to environment variables:
BOOTP -> bootp
    BOOTP_NFSROOT -> bootp.nfsroot
    BOOTP_NFSV3 -> bootp.nfsv3
    BOOTP_COMPAT -> bootp.compat
    BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

    bootp="YES"
    bootp.nfsroot="YES"
    bootp.nfsv3="YES"
    bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.
2004-07-08 13:40:33 +00:00
Robert Watson
cf813ab244 Acquire socket lock in nfs_connect() connection/sleep loop to protect
socket state and avoid missed wakeups.
2004-07-06 16:55:41 +00:00
Alfred Perlstein
14af415c34 use vfs_suser() to restrict access to the nfs mount's timeout. 2004-07-06 09:40:44 +00:00
Alfred Perlstein
67ac03405a NFS mobility Phase VI:
Export NFS mount state via sysctl.
Export timeout via sysctl.
2004-07-06 09:23:17 +00:00
Alfred Perlstein
c713aaaeca NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00
Poul-Henning Kamp
e3c5a7a4dd When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
Robert Watson
7322ba7d8b When updating sb_flags, acquire the socket buffer lock to prevent
races.
2004-06-24 03:12:13 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Robert Watson
8950572050 Remove bad cookie vp kernel printf; while it does notify about an
interesting event, there's little or nothing the user can do about
it.
2004-06-17 00:15:37 +00:00
Robert Watson
9ef2900f9d Convert GIANT_REQUIRED to NET_ASSERT_GIANT where Giant is used to
protect socket operations.  Leave one "as-is" as it also frobs
rootvp.
2004-06-16 03:12:50 +00:00
Alan Cox
5a32489377 Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
Peter Edwards
7c8ca9400e Let the NFS client notice a file's size changing as a modification.
This avoids presenting invalid data to the client's applications
when the file is modified, and then extended within the window of
the resolution of the modifcation timestamp.

Reviewed By:	iedowse
PR:		kern/64091
2004-04-14 23:23:55 +00:00
Marcel Moolenaar
5fa064a8d8 Unbreak build: s/TAILQ_ISEMPTY/TAILQ_EMPTY/g 2004-04-11 17:15:36 +00:00
Peter Edwards
1630ff08a3 Clean up properly when unloading NFS client module.
This includes a modified form of some code from Thomas Moestl (tmm@)
to properly clean up the UMA zone and the "nfsnodehashtbl" hash
table.

Reviewed By:	iedowse
PR:		16299
2004-04-11 13:30:20 +00:00
Warner Losh
2fcbca0d85 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 05:00:01 +00:00
Robert Watson
ecd189d420 Spell 2 as SHUT_RDWR when used as an argument to soshutdown(). 2004-04-04 19:24:08 +00:00
Peter Edwards
5182703cc8 Flush cached access mode after modifying a files attributes for
NFSv3. It's likely that modifying the attributes will affect the
file's accessibility. This version of the patch is one suggested
by Ian Dowse after reviewing my original attempt in the PR

Reviewed By: iedowse
PR: kern/44336
MFC after: 3 days
2004-04-03 17:23:46 +00:00
Alexander Kabaev
186c0bc04b Reset callout if in nfs_timeout and rpcclnt_timeout functions. Timer
are supposed to continue firing as long as there is work to do, not
stop after the first invocation.

This is damage control after a patch that has been committed prematurely.

Tested by:	kris
2004-03-28 05:55:27 +00:00
Jim Rees
f9955a5f53 only do nfs rpc callouts if there is work to do.
Submitted by:	kan
Approved by:	alfred
2004-03-25 21:48:09 +00:00
Pawel Jakub Dawidek
512af1646c Add a comment with an explanation why we don't report EPIPE errors on
nfs sockets.

Requested by:	ru
2004-03-17 21:10:20 +00:00
Pawel Jakub Dawidek
41eda4e2b1 Don't report EPIPE errors on nfs sockets. These can be due to idle tcp
mounts which will be closed by netapp, solaris, etc. if left idle too long.

Obtained from:	NetBSD
2004-03-17 18:10:38 +00:00
Peter Wemm
6df0617286 Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default
clock precision on i386.  This is a NOP change on i386.  But this stops
the mount_nfs units from suddenly changing to units of 1/20 of a second
(vs the normal 1/10 of a second) if HZ is increased.
2004-03-14 06:21:56 +00:00
Brooks Davis
41b7cd3729 Allow kernel with the BOOTP option to boot when DHCP/BOOTP sets the root
path to an absolute path without a host name.  Previously, there was a
nasty POLA violation where a system would PXE boot until you added the
BOOTP option and then it would panic instead.

Reviewed by:	tegge, Dirk-Willem van Gulik <dirkx at webweaving.org>
		(a previous version)
Submitted by:	tegge (getip function)
2004-03-12 20:37:40 +00:00
Poul-Henning Kamp
4d453ef101 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
Poul-Henning Kamp
651b11eaf2 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
Robert Watson
746e5bf09b Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by:	sam
2004-03-01 03:14:23 +00:00
Jim Rees
73c02c410e NFSv4 fixes from Connectathon 2004:
remove unused pid field of file context struct
map nfs4 error codes to errnos
eliminate redundant code from nfs4_request
use zero stateid on setattr that doesn't set file size
use same clientid on all mounts until reboot
invalidate dirty bufs in nfs4_close, to play it safe
open file for writing if truncating and it's not already open

Approved by:	alfred
2004-02-27 19:37:43 +00:00
Colin Percival
11233aabf8 If mountnfs returns an error, it will have already freed nam; no need to
free it again.

Reported by:	"Ted Unangst" <tedu@coverity.com>
Approved by:	rwatson (mentor)
2004-02-22 01:17:47 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
David E. O'Brien
f191a0bcf6 Bump the NFCv3/TCP defaults for rsize and wsize from 8K to 32K to match
Solaris and HP-UX.  This increases read performance for large files across NFS.

PR:		62024 & 26324
Submitted by:	Bjoern Groenvall <bg@sics.se>
2004-01-31 10:40:15 +00:00
Alfred Perlstein
90abe7f294 Use function pointers to remove the depenancy cross dependancy on nfs4
and the nfs3 client.  Also fix some bugs that happen to be causing crashes
in both v3 and v4 introduced by the v4 import.

Submitted by: Jim Rees <rees@umich.edu>
Approved by: re
2003-11-22 02:21:49 +00:00
Alfred Perlstein
42233ecdc1 Move the declaration for "struct nfs4_fctx" out from under #ifdef KERNEL
for fstat(1).
2003-11-15 05:03:15 +00:00
Alfred Perlstein
5ed904e503 unbreak LINT. 2003-11-15 00:26:42 +00:00
Alfred Perlstein
1bf8720450 University of Michigan's Citi NFSv4 kernel client code.
Submitted by: Jim Rees <rees@umich.edu>
2003-11-14 20:54:10 +00:00
Alexander Kabaev
5c957adbf1 1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by:  (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by:    rwatson
2003-11-12 02:54:47 +00:00
Alfred Perlstein
b46b1a899f Stop using shared locks for nfs vop locks.
The reason this was done was to avoid a race to the root when an
NFS server went down.  However a semi-recent change to the way that
the kernel's lookup() routine traverses mount points prevents this.

Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we
aquire a shared lock on the mount point being accessed and then drop
the directory vnode lock before requesting the target lock.

With that in place we no longer need shared locks for NFS to prevent
race to the root lockups.
2003-11-11 00:32:46 +00:00
Sam Leffler
a96756932a Assert GIANT_REQUIRED where sockets are manipulated. This is
preparatory for MPSAFE network commits and ongoing socket
locking work.

Supported by:	FreeBSD Foundation
2003-11-07 22:57:09 +00:00
Alexander Kabaev
ca430f2e92 Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff
2003-11-05 04:30:08 +00:00
Alexander Kabaev
cb9ddc80ae Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.
2003-11-02 04:52:53 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Poul-Henning Kamp
2c18019f14 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
Poul-Henning Kamp
901b6327b1 Initialize bp->b_offset before calling VOP_STRATEGY().
Remove KASSERTS and panics with B_PHYS checks which no longer apply.
2003-10-18 11:14:29 +00:00
Poul-Henning Kamp
e6ad40bb11 We do not get B_PHYS buffers here anymore. /dev/drum is long gone. 2003-10-18 09:33:13 +00:00
Ian Dowse
d94f8c66b8 Since the addition of the VI_DOINGINACT flag some time ago,
VOP_INACTIVE routines need not worry about their vnode getting
recycled if they block. Remove the code from nfs_inactive() that
used vget() to get an extra vnode reference that was held during
the nfs_vinvalbuf() call.
2003-10-05 12:41:35 +00:00
Jeff Roberson
1b01fc6ef7 - Remove an incorrect XXX comment. This code does respect the XLOCK since
it uses vget() which will fail if the identity changes.
2003-10-05 06:47:56 +00:00
Jeff Roberson
b21ad23fff - Check the XLOCK before we inspect the vnode. 2003-10-05 06:46:45 +00:00
Jeff Roberson
4b1a52f639 - We don't need to cache_purge() in nfs_reclaim(), vclean() does it for us. 2003-10-05 06:46:02 +00:00
Jeff Roberson
b2b64a90d2 - Consistently set sopt_dir.
Pointed out by:		pete@isilon.com
2003-10-04 17:41:59 +00:00
Jeff Roberson
bb33b5fabf - Acquire the vnode interlock prior to dropping the mntvnode_mtx.
- Make a note of the lack of XLOCK protection in this code.  We would access
   a vnode while it is changing identities without Giant.
2003-10-04 13:44:51 +00:00
Jeff Roberson
8b5905a47d - Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a
stack trace supplied by phk, I now understand what's going on here.  The
   check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
   partially torn down in vclean().  It is not clear that this would cause
   a problem.  Document this in nfs_bio.c, which is where the other two
   filesystems copied this code from.
2003-10-04 08:51:50 +00:00
Jeff Roberson
ce1fb23146 - Remove interlock protection around VI_XLOCK. The interlock is not
sufficient to guarantee that this race is not hit.  The XLOCK will likely
   have to be redesigned due to the way reference counting and mutexes work
   in FreeBSD.  We currently can not be guaranteed that xlock was not set
   and cleared while we were blocked on the interlock while waiting to check
   for XLOCK.  This would lead us to reference a vnode which was not the
   vnode we requested.
 - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
   this condition is ever hit.  It should not, since we should be retaining
   a reference to the vnode in these cases.  The reference would be sufficient
   to block recycling.
2003-09-19 23:37:49 +00:00
Poul-Henning Kamp
2a3d23acc6 Name the vnode method vectors consistently with the rest of the filesystems.
This improves the output of src/tools/tools/vop_table
2003-09-12 16:44:40 +00:00
Poul-Henning Kamp
0ace036ce5 Remove now unused BOOTP tags related to NFS swap device. 2003-09-05 11:12:55 +00:00
Diomidis Spinellis
cf669e5456 KNF: parentheses around return values.
Suggested by:	bde
Approved by:	schweikh (mentor - blanket)
MFC after:	6 weeks
2003-09-04 11:27:13 +00:00
Diomidis Spinellis
5a5f2134b8 Fix errno return values to better represent failure reasons for
read and open.

Approved by:	schweikh (mentor)
Agreed:		bde
MFC after:	6 weeks
2003-09-02 16:46:31 +00:00
Poul-Henning Kamp
0bddf4c8e9 Remove the magic way of configuring NFS backed swap.
This code dates back to the very first diskless support on FreeBSD,
back when swapon(8) couldn't simply be run on a NFS backed file.

Suggested replacement command sequence on the client:

        dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
        swapon /swapfile
        rm -f /swapfile

For whatever value of 100000 you want.
2003-08-15 12:04:02 +00:00
Bill Fumerola
2766bd022f 0) preallocate per-interface context structures without the ifnet lock held
1) avoid immediately calling bzero() after malloc() by passing M_ZERO
2) do not initialize individual members of the global context to zero
3) remove an unused assignment of ifctx in bootpc_init()

Reviewed by:	tegge
2003-08-07 21:27:17 +00:00
Tim J. Robbins
aae962d56e Fix a problem that occurs when truncating files on NFSv3 mounts: we need
to set np->n_size back to the desired size again after calling
nfs_meta_setsize(), since it could end up in nfs_loadattrcache() getting
called, which would change n_size back to the value it had before the
truncate request was issued. The result of this bug is that the size info
cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks
past the end of the file, stat() returns the wrong size, etc.

PR:		41792
MFC after:	2 weeks
2003-07-29 00:17:29 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Poul-Henning Kamp
e82b33f337 Change idle sleep indentifier to "-" for nfsiod 2003-07-02 08:09:20 +00:00
Alan Cox
aec774abec Lock the vm object when freeing a page. 2003-06-17 05:17:00 +00:00
Poul-Henning Kamp
cefb5754dd Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.
2003-06-15 18:53:00 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
Ian Dowse
df12c16630 When removing a sillyrename file, make sure that the directory vnode
has not been cleaned in the meantime, since this can happen during
a forced unmount. Also add a comment that nfs_removeit() should
really be locking the directory vnode before calling nfs_removerpc().

Reported by:	mbr
Tested by:	mbr
MFC after:	1 week
2003-06-12 15:41:20 +00:00
David E. O'Brien
ab0de15baf Use __FBSDID(). 2003-06-11 05:37:42 +00:00
Robert Watson
13b7350a5b Add the comment I meant to add about not passing in PCATCH to the
tsleep().  Note the XXX.
2003-06-11 03:32:42 +00:00
Jeffrey Hsu
807c988d7a On a socket creation error, don't close the socket. 2003-06-09 03:44:34 +00:00
Poul-Henning Kamp
bf975fe45b Remove unsed variables.
Add explicit breaks to switch

Found by:       FlexeLint
2003-05-31 20:05:25 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Robert Watson
6d7f268ad1 rpc.lockd stability workaround: remove PCATCH from the tsleep() in
nfs_lock.c.  Right now, if we permit a signal to interrupt the sleep,
we will slip the lock and no process on that client, the server, or
any other client will be able to acquire the lock.  This can happen,
for example, if a user hits Ctrl-C or Ctrl-T while a process is
waiting for the lock.  By removing PCATCH, we prevent that from
happening, at the cost of not permitting a user-requested lock abort:
also nasty.  However, a user interface bug might be preferable to a
serious semantic bug, so we go with that for now.

We need to teach the rpc.lockd/kernel protocol how to abort lock
requests, and rpc.lockd how to handle aborted lock requests; patches
for the kernel bit are floating around, but no rpc.lockd bit yet.

Approved by:	re (scottl)
2003-05-30 17:15:56 +00:00
Peter Wemm
62d8fb93d0 Deal with the possibility of negative available space from the file server
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).

Submitted by:	Harold Gutch <logix@foobar.franken.de>
Approved by:	re (scottl)
2003-05-19 22:35:00 +00:00
Robert Watson
7042ac8cd7 This change grabs the vnode lock for NFS client vnodes when calling
VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS
will panic, and (b) it may be possible to corrupt entries in the cached
vnode attributes in the nfsnode, since nfsnode attribute cache data is
also protected by the vnode lock.

Approved by:	re (jhb)
Pointed out by:	VFS_DEBUG_LOCKS
2003-05-15 21:12:08 +00:00
John Baldwin
90af4afacb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
Dag-Erling Smørgrav
87ccef7b77 Instead of recording the Unix time in a process when it starts, record the
uptime.  Where necessary, convert it back to Unix time by adding boottime
to it.  This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
2003-05-01 16:59:23 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
Don Lewis
78b0aaefb5 VOP_FSYNC() expects to be called with the vnode locked, so lock fvp in
nfs_rename() before calling VOP_FSYNC() and unlock fvp immediately after.

Reviewed by:	bde
2003-04-24 20:39:40 +00:00
Peter Wemm
d76f16de39 Fix a bug with df on large (>1TB) nfsv3 file servers on 32 bit client
machines where the 'long' number of blocks in struct statfs wont fit.
Instead of chosing an artificial 512 byte block size, simply scale it up
until we avoid an overflow.  NFSv3 reports the sizes in bytes, and the
blocksize is a figment of nfsclient's imagination.
2003-04-24 20:36:32 +00:00
Don Lewis
8b3182e212 Release the vnode interlock in nfs_flush() before calling nfs_sigintr(),
and grab it again later if necessary.  This prevents a lock order reversal
because nfs_sigintr() calls PROC_LOCK().
2003-04-23 02:58:26 +00:00
Thomas Quinot
da4898b1d7 Revert change 1.201 (removing mapping of VAPPEND to VWRITE).
Instead, use the generic vaccess() operation to determine whether
an operation is permitted. This avoids embedding knowledge on
vnode permission bits such as VAPPEND in the NFS client.

PR:		kern/46515
vaccess() patch submitted by:	"Peter Edwards" <pmedwards@eircom.net>
Approved by:	tjr, roberto (mentor)
2003-03-31 23:26:10 +00:00
Jeff Roberson
4093529dee - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
Robert Watson
847b14bd3c Add O_NONBLOCK to the vn_open_cred() flags for NFS client locking when
opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP.

This improves handling of the case where the /var/run/lock fifo exists
but there is no listener: we immediately return EOPNOTSUPP rather
than blocking until a listener turns up.  This could occur during a
diskless boot before rpc.lockd is loaded, or if the lock file persists
across a reboot following the disabling of rpc.lockd.  This may have
suddenly started to occur due to fifo blocking fixes--previously it
looks like attempts to read on a fifo with no listener would time out
due to insufficient resources.

Reviewed by:	alfred
2003-03-26 19:21:34 +00:00
Alfred Perlstein
cbee8fbe2e req can not be NULL or we'd die.
Sponsored by: RED
2003-03-26 01:46:11 +00:00
Tim J. Robbins
9ba703c024 Map VAPPEND to VWRITE in nfsspec_access() - VAPPEND is never set in the
mode returned by VOP_GETATTR. This fixes incorrect "Permission denied"
errors when trying to append to a file on an NFSv2 mount.
2003-03-21 05:13:23 +00:00
Jeff Roberson
8501ead911 - Add a forgotten BUF_LOCK()
Most sincere apologies to:	jake
2003-03-14 05:13:19 +00:00
Jeff Roberson
619bddc702 - Lock the buf before inspecting its contents. 2003-03-13 07:04:11 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Nate Lawson
99648386d3 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jeff Roberson
48d4ffc119 - The interlock was not being droped in nfs_flush() if the first part of
an if clause was true.  Break the two clauses out into seperate statements
   since they require different actions.

Reported/Tested by:	jake
Spotted by:	jhb
2003-02-26 00:24:19 +00:00
Jeff Roberson
869d735043 - Properly handle the vnode interlock in nfs_fsync.
Reported by:	phk
2003-02-25 08:50:21 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Peter Wemm
169ade77af Get rid of a silly message I added back in Sept 2001 (1.68). 2003-02-18 23:45:01 +00:00
Tim J. Robbins
1bdd8ae409 Lock proc while accessing p_siglist, p_sigmask and p_sigignore
in nfs_sigintr().
2003-02-15 08:25:57 +00:00
Matthew Dillon
146cc84e0d Provide a sysctl to allow defaulting of the connectionless (-c) feature
to mount_nfs.  The sysctl defaults to 1 (paranoid mode).  Setting it to 0
will allow an NFS client to receive replies on a different IP then they
were sent to by default.

Submitted by:	Sean Eric Fagan <sef@kithrup.com>
2003-01-22 19:57:31 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Poul-Henning Kamp
c6e3ae999b Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by:       src/tools/tools/vop_table
2003-01-04 08:47:19 +00:00
Poul-Henning Kamp
862702306b Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
Jens Schweikhardt
d64ada501a Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00
Matthew Dillon
3a3d82ec0a Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:37:50 +00:00
Jeffrey Hsu
956b0b653c SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
Alan Cox
903caa598d Avoid holding the vnode interlock around malloc() or free() to prevent a
lock order reversal.

Reviewed by:	jeff
2002-12-23 06:20:41 +00:00
Jeffrey Hsu
b30a244c34 SMP locking for ifnet list. 2002-12-22 05:35:03 +00:00
Matthew Dillon
a19f30d8c9 do not try to free a mountpoint that we did not allocate.
X-MFC after:	immediately
2002-12-21 20:55:34 +00:00
Alfred Perlstein
818407fe06 reapply 1.26 through 1.28.
Approved by: re
2002-11-20 15:21:06 +00:00
Alfred Perlstein
7affe44ee3 forgot about 5.x freeze, backout 1.26 through 1.28 pending re@ appoval. 2002-11-20 10:53:06 +00:00
Alfred Perlstein
0b2724b10f remove useless casts, unused macros and cleanup a line wrap. 2002-11-20 10:13:04 +00:00
Alfred Perlstein
9822015014 comment and untwist error return logic 2002-11-20 10:06:51 +00:00
Alfred Perlstein
32cb464571 Remove an outdated comment complaining about exporting struct ucred
to userspace, I fixed it a while ago.
2002-11-20 10:00:04 +00:00
Poul-Henning Kamp
6999d2ef6d Don't examine an un-initialized variable.
Spotted by:	FlexeLint.
2002-10-20 21:52:05 +00:00
Poul-Henning Kamp
0c183c5a56 Remove extern declarations of stuff which is static in nfs_node.c
Move related macro to nfs_node.c

Spotted by:	FlexeLint
2002-10-20 21:40:55 +00:00
Kirk McKusick
a5b65058d5 Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by:	DARPA & NAI Labs.
2002-10-14 03:20:36 +00:00
Mike Barcroft
2b7f24d210 Change iov_base's type from char *' to the standard void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
2002-10-11 14:58:34 +00:00
Scott Long
316ec49abd Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
Juli Mallett
1d9c56964d Back our kernel support for reliable signal queues.
Requested by:	rwatson, phk, and many others
2002-10-01 17:15:53 +00:00
Juli Mallett
f4430f22b8 Lock access to the signal queue, and related structures, with PROC_LOCK.
Submitted by:	jhb
2002-09-30 21:15:33 +00:00
Juli Mallett
70d4d0c0f5 Convert use of p_siglist and old SIG*() macros to use <sys/ksiginfo.h>
prototyped functions to get a sigset_t, and further to check for any
queued signals, rather than an empty signal set, to go with the move
to signal queues rather than signal sets.
2002-09-30 20:48:29 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Robert Watson
f0fb902771 Remove an errant debugging printf that got left in during my last
commit.

Pointed out by:	guido
2002-09-27 00:25:54 +00:00
Robert Watson
203639c449 Apparently pxeboot passes in a mygateway of non-zero sin length
from DHCP in the event that no gateway is returned from DHCP, breaking
the assumption that we skip the routing insertion of the gateway
if the sin length is zero.  Check also for s_addr of 0 to avoid the
"Oh no, adding my default route failed" panic, making it possible
to pxeboot machines on segments without default routes.  Arguably
this could be a bug in pxeboot, or in the TUNABLE code, but this
makes my boxes boot.
2002-09-26 19:56:43 +00:00
Jeff Roberson
8926aed697 - Lock access to the buf lists.
- Use vrefcnt() where appropriate.
 - Add some locking asserts.
2002-09-25 02:38:43 +00:00
Jake Burkholder
abc370fa85 Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI.  Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicing later on.  Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init().  Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.
2002-09-22 00:59:02 +00:00
Poul-Henning Kamp
7ed60de837 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
Nate Lawson
06be2aaa83 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
Poul-Henning Kamp
7e6fb406ff Now that we have a cached mount credential in struct mount, use it istead
of a private cached copy.
2002-09-08 15:11:18 +00:00
Bruce Evans
6af7f1e511 Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't
a prerequisite.
2002-09-05 14:04:34 +00:00
Maxim Sobolev
62f7648682 Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by:	-hackers, -net
2002-08-18 07:05:00 +00:00
Alfred Perlstein
f898f7c5b2 Remove a case of exposing 'struct ucred' to userspace. Use a struct xucred
for LOCKD_MSG instead.

Requested by: rwatson
2002-08-15 21:52:22 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Poul-Henning Kamp
9bf1a75697 Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems.  This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by:	DARPA and NAI Labs.
2002-08-13 10:05:50 +00:00
Robert Watson
c08b677fb5 Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent
enforcement of MAC policy on the read or write operations:

- In ext2fs, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), directory modifications in
  rename(), directory write operations in mkdir(), symlink write
  operations in symlink().

- In the NFS client locking code, perform vn_rdwr() on the NFS locking
  socket without enforcing MAC, since the write is done on behalf of
  the kernel NFS implementation rather than the user process.

- In UFS, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), and symlink write operations
  in symlink().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-12 16:43:04 +00:00
Jeff Roberson
be12d7a61d - Add a missing VI_UNLOCK to an error case in nfs_flush. 2002-08-05 08:54:29 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Alan Cox
e0c9fdb50e o Lock page queue accesses in nfs_getpages(). 2002-07-21 20:01:32 +00:00
Matthew Dillon
8e0619c6b0 Fix a bug nfs_write() related to ^C'ing during a file write on an
interruptable mount.  We were returning from inside the loop without
releasing the rslock.

Submitted by:	Mike Junk <junk@isilon.com>
MFC after:	3 days
2002-07-16 19:43:59 +00:00
John Baldwin
14d199ad29 If we get a receive error in nfs_receive() and then get an error trying to
obtain the send lock, we would bogusly try to unlock the send lock before
returning resulting in a panic.  Instead, only unlock the send lock if
nfs_sndlock() succeeds and nfs_reconnect() fails.

MFC after:	3 days
Sponsored by:	The Weather Channel
2002-07-16 15:12:07 +00:00
Alfred Perlstein
09ce4f7aaf Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
2002-07-15 19:40:23 +00:00
Matthew Dillon
3d8f797ac1 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
Matthew Dillon
d331c5d43f Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
John Baldwin
56e9ce41a5 In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK().
nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread
argument to nfs_getcacheblk().  In nfs_getcacheblk() we dereference the
thread pointer to get a process pointer to pass to nfs_sigintr().  This
obviously results in a panic. :)

Rather than change nfs_getcacheblk() to check if the thread pointer is
NULL when calling nfs_sigintr() like other callers do, change
nfs_sigintr() to take a thread as the last argument instead of a
process so none of the callers have to care if the thread is NULL or not.
2002-06-28 21:53:08 +00:00