Commit Graph

197 Commits

Author SHA1 Message Date
rmacklem
6a6a18bf5c PR# 165923 reported intermittent write failures for dirty
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.

Tested by:	Joel Ray Holveck (joelh at juniper.net)
PR:		kern/165923
Reviewed by:	kib
MFC after:	2 weeks
2012-05-12 12:02:51 +00:00
trasz
baac623cd9 Remove unused thread argument from vtruncbuf().
Reviewed by:	kib
2012-04-23 13:21:28 +00:00
rmacklem
2d142012e7 Fix the NFS clients so that they use copyin() instead of bcopy(),
when doing direct I/O. This direct I/O code is not enabled by default.

Submitted by:	kib (earlier version)
Reviewed by:	kib
MFC after:	1 week
2012-03-01 03:53:07 +00:00
kib
80ae8fe82c Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
rmacklem
67ad565252 A problem with respect to data read through the buffer cache for both
NFS clients was reported to freebsd-fs@ under the subject "NFS
corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when
a TCP mounted root fs was changed to using UDP. I believe that this
problem was caused by the change in mnt_stat.f_iosize that occurred
because rsize was decreased to the maximum supported by UDP. This
patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize,
since the latter is set to f_iosize when the vnode is allocated, but
does not change for a given vnode when f_iosize changes.

Reported by:	pjd
Reviewed by:	kib
MFC after:	2 weeks
2012-01-27 02:46:12 +00:00
kib
d326d5565d Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
jhb
0a3de76e74 Merge 220876, 220877, and 221537 from the new NFS client to the old:
Allow the NFS client to use a max file size larger than 1TB for v3 mounts.
It now allows files up to OFF_MAX subject to whatever limit the server
advertises.

Reviewed by:	rmacklem
Approved by:	re (kib)
MFC after:	1 week
2011-08-09 15:29:58 +00:00
kib
ad5bd06523 In the VOP_PUTPAGES() implementations, change the default error from
VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().

VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.

The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.

Reported and tested by:	gavin
Reviewed by:	alc, rmacklem
MFC after:	1 week
2011-06-01 21:00:28 +00:00
rmacklem
0e8f2a36e9 Move sys/nfsclient/nfs_kdtrace.h to sys/nfs/nfs_kdtrace.h so
it can be used by the new NFS client as well as the old one.
2011-05-06 20:02:19 +00:00
kib
9aa04c0151 Do not synchronously start the nfsiod threads at all. The r212506
fixed the issues with file descriptor locks, but the same problems are
present for vnode lock/user map lock.

If the nfs_asyncio() cannot find the free nfsiod, schedule task to
create new nfsiod and return error. This causes fall back to the
synchronous i/o for nfs_strategy(), or does not start read at all in
the case of readahead. The caller that holds vnode and potentially
user map lock does not wait for kproc_create() to finish, preventing
the LORs.

The change effectively reverts r203072, because we never hand off the
request to newly created nfsiod thread anymore.

Reviewed by:	jhb
Tested by:	jhb, pluknet
MFC after:	3 weeks
2010-10-18 19:06:46 +00:00
kib
9100139edc In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by:	Mikolaj Golub <to.my.trociny gmail com>
MFC after:	2 weeks
2010-06-13 05:24:27 +00:00
alc
3c8033e013 Push down the page queues lock into vm_page_activate(). 2010-05-07 15:49:43 +00:00
alc
fecc56fac1 Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
alc
5c7ca3ee73 Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
trasz
402e3baade Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by:	kib
2010-05-05 16:44:25 +00:00
kib
44c384aeff Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by:	alc
2010-05-03 20:31:13 +00:00
rmacklem
3c42ac5cd5 Fix a race that can occur when nfs nfsiod threads are being created.
Without this patch it was possible for a different thread that calls
nfs_asyncio() to snitch a newly created nfsiod thread that was
intended for another caller of nfs_asyncio(), because the nfs_iod_mtx
mutex was unlocked while the new nfsiod thread was created. This patch
labels the newly created nfsiod, so that it is not taken by another
caller of nfs_asyncio(). This is believed to fix the problem reported
on the freebsd-stable email list under the subject:
FreeBSD NFS client/Linux NFS server issue.

Tested by:	to DOT my DOT trociny AT gmail DOT com
Reviewed by:	jhb
MFC after:	2 weeks
2010-01-27 15:22:20 +00:00
kib
50c96d8f04 Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:54:29 +00:00
dfr
836bf4cc58 Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC.
Approved by: re
2009-06-30 19:10:17 +00:00
dfr
5d248bb05f Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
2009-06-30 19:03:27 +00:00
alc
f0cb2be073 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
rmacklem
3d458aebb2 Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in
nfs_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the
out of date comment for it.

Submitted by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-10 21:03:57 +00:00
alc
5e539a33da nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean().  Nothing uses it any longer.
2009-05-31 20:18:02 +00:00
alc
87880e35a7 Make *getpages()s' assertion on the state of each page's dirty bits
stricter.
2009-05-28 18:11:09 +00:00
rwatson
ccb17e335a Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE.  Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with:	rmacklem
2009-05-22 12:35:12 +00:00
alc
1af8842f56 Eliminate unnecessary clearing of the page's dirty mask from various
getpages functions.

Eliminate a stale comment.
2009-05-15 04:33:35 +00:00
alc
cb76946a7f Eliminate gratuitous clearing of the page's dirty mask. 2009-05-12 05:49:02 +00:00
alc
63529fb14b Eliminate stale comments.
Eliminate a case of unnecessary page queues locking.
2009-05-10 17:05:43 +00:00
rwatson
f0f3719742 Add DTrace probes to the NFS access and attribute caches. Access cache
events are:

  nfsclient:accesscache:flush:done
  nfsclient:accesscache:get:hit
  nfsclient:accesscache:get:miss
  nfsclient:accesscache:load:done

They pass the vnode, uid, and requested or loaded access mode (if any);
the load event may also report a load error if the RPC fails.

The attribute cache events are:

  nfsclient:attrcache:flush:done
  nfsclient:attrcache:get:hit
  nfsclient:attrcache:get:miss
  nfsclient:attrcache:load:done

They pass the vnode, optionally the vattr if one is present (hit or load),
and in the case of a load event, also a possible RPC error.

MFC after:	1 month
Sponsored by:	Google, Inc.
2009-03-24 17:14:34 +00:00
attilio
b8bf37e585 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
attilio
dbf35e279f Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
attilio
4274e0aa54 namei() can call underlying nfs_readlink() passing a struct uio pointer
owned by a NULL owner. This will lead consequent VOP_ISLOCKED() present
into nfs_upgrade_vnlock() to panic as it only acquire curthread now.
Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() in order to not use
more the struct thread pointer passed as argument (as it is really nomore
required there as vn_lock() and VOP_UNLOCK doesn't get the lock more).
Using curthread, in place, doesn't get ambiguity as LK_EXCLOTHER should
be handled as a "not locked" request by both functions.

Reported by: kris
Tested by: kris
Reviewed by: ups
2008-02-09 20:13:19 +00:00
mohans
422753af04 Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle
timeout occurring at exactly the same time. If this happens, the nfsiod
exits although there may be a queued async IO request for it.

Found by : Kris Kennaway
Approved by: re
2007-09-25 21:08:49 +00:00
jhb
788b235895 Fix up NFS client write error handling. Errors are split into
recoverable and unrecoverable. For the former, we redirty the
buffer and hang onto it for future retries. For the latter (eg.
ESTALE), we discard the buffer and return the error back to the
user on the next syscall. This fixes a number of vfs panics and
fixes having a large number of dirty buffers (that cannot be
written out and reclaimed) from hanging around. Thanks to ups@
for discussions on this issue.

Reported by:	kris, Kai, others
Approved by:	re (kensmith)
2007-07-03 18:30:55 +00:00
attilio
9bd4fdf7ce Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
jhb
961de35e83 Various fixes to the NFS Directio support.
- Fix for a bug where a close would not wait for all (directio)
  dirty buffers to drain. The nfsnode was not marked NMODIFIED
  when there were directio dirtied buffers pending, causing this.
- No reason to vhold/vrele the vp when enqueueing DirectIO requests
  for the nfsiods. The vnode can't really go way since the close
  has to wait for these requests to drain.

MFC after:	1 week
Submitted by:	mohans
2007-04-25 20:34:55 +00:00
alc
b98eae58a6 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
ups
d193a40233 Call vm_object_page_clean() with the object lock held.
Submitted by:	kensmith@
Reviewed by:	mohans@
MFC after:	6 days
2006-05-25 17:16:11 +00:00
ups
4eb5a7d9ee Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
mohans
60ef615733 Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
2006-05-19 00:04:24 +00:00
mohans
8a6c02da7d Keep track of the number of in-progress async direct IO writes in the nfsnode.
Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and
nfs_putpage().
2006-04-06 01:20:30 +00:00
ps
6364b280f8 - Always return success from NFS strategy. nfs_doio(), in the
event of an error, does the right thing, in terms of setting
  the error flags in the buf header. That fixes a crash from
  bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer
  to be re-dirtied. ETIMEDOUT can occur on soft mounts, when
  the number of retries are exceeded, and we don't want data loss
  in that case.

Submitted by:	Mohan Srinivasan
2005-11-21 19:23:46 +00:00
ps
3c7af06c7a Remove the NFS client rslock. The rslock was used to serialize
writers that want to extend the file. It was also used to serialize
readers that might want to read the last block of the file (with a
writer extending the file).  Now that we support vnode locking for
NFS, the rslock is unnecessary. Writers grab the exclusive vnode
lock before writing and readers grab the shared (or in some cases
the exclusive) lock.

Submitted by:	Mohan Srinivasan
2005-07-21 22:46:56 +00:00
green
b4b5044eed Ifdef out the incomplete non-blocking IO implementation for NFS
pending discussion of how implementation would proceed.  Applications
like -lc_r expect select(3) to match the EAGAIN-status of IO
functions.

Approved by:	re
2005-06-16 15:43:17 +00:00
green
ff904ffb64 Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
das
89bc04ad2d Don't brelse(bp) if bp is null. Also, eliminate some redundancy
and dead code.

Found by:	Coverity Prevent analysis tool
2005-03-18 21:23:32 +00:00
jeff
5bd51ec6e6 - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:14:56 +00:00
phk
8dba90be16 Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments
passed for it.
2005-01-24 12:31:06 +00:00