2756 Commits

Author SHA1 Message Date
Rick Macklem
0149d177fb Revert r230516, since it doesn't really fix the problem. 2012-01-26 00:07:34 +00:00
Konstantin Belousov
d5210589b7 Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps.  Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by:  jhb (previous version)
MFC after:    2 weeks
2012-01-25 20:48:20 +00:00
John Baldwin
0b17c7bea5 Add a timeout on positive name cache entries in the NFS client. That is,
we will only trust a positive name cache entry for a specified amount of
time before falling back to a LOOKUP RPC, even if the ctime for the file
handle matches the cached copy in the name cache entry.  The timeout is
configured via a new 'nametimeo' mount option and defaults to 60 seconds.
It may be set to zero to disable positive name caching entirely.

Reviewed by:	rmacklem
MFC after:	1 week
2012-01-25 20:05:58 +00:00
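
A minimal sketch of the timeout check described in 0b17c7bea5, assuming a simplified cache entry. The struct nc_entry and nc_entry_usable() names are hypothetical and are not the kernel's own; only the 'nametimeo' option, its zero-disables behaviour, and the ctime comparison come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a positive name cache entry. */
struct nc_entry {
	time_t cached_ctime;	/* ctime recorded when the entry was added */
	time_t added_at;	/* time at which the entry was added */
};

/*
 * Return true if the entry may be trusted without a LOOKUP RPC.
 * nametimeo is in seconds; zero disables positive name caching.
 */
static bool
nc_entry_usable(const struct nc_entry *e, time_t file_ctime, time_t now,
    unsigned int nametimeo)
{
	assert(e != NULL);
	if (nametimeo == 0)
		return (false);		/* caching disabled */
	if (now - e->added_at >= (time_t)nametimeo)
		return (false);		/* entry too old, even if ctime matches */
	return (e->cached_ctime == file_ctime);
}
```

With the default of 60 seconds, an entry added at time 1000 is usable at 1030 if the ctime still matches, and is rejected from 1060 onwards regardless of the ctime.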
Rick Macklem
6403723880 If a mount -u is done to either NFS client that switches it
from TCP to UDP and the rsize/wsize/readdirsize is greater
than NFS_MAXDGRAMDATA, it is possible for a thread doing an
I/O RPC to get stuck repeatedly doing retries. This happens
because the RPC will use an rsize/wsize/readdirsize that won't
work for UDP and, as such, it will keep failing indefinitely.
This patch returns an error for this case, to avoid the problem.
A discussion on freebsd-fs@ seemed to indicate that returning
an error was preferable to silently ignoring the "udp"/"mntudp"
option.
This problem was discovered while investigating a problem reported
by pjd@ via email.

MFC after:	2 weeks
2012-01-25 00:22:53 +00:00
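
The check described in 6403723880 might look roughly like the following sketch. The names struct mount_sizes, check_udp_sizes() and SKETCH_MAXDGRAMDATA are hypothetical, the limit value is an assumption, and the commit message says only that an error is returned, so the choice of EINVAL here is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; the real limit is NFS_MAXDGRAMDATA. */
#define SKETCH_MAXDGRAMDATA	16384

/* Hypothetical container for the three I/O sizes of a mount. */
struct mount_sizes {
	int rsize;
	int wsize;
	int readdirsize;
};

/*
 * Reject a switch to UDP when any I/O size exceeds what a datagram
 * can carry, instead of letting RPCs retry forever.
 */
static int
check_udp_sizes(bool use_udp, const struct mount_sizes *ms)
{
	assert(ms != NULL);
	if (!use_udp)
		return (0);
	if (ms->rsize > SKETCH_MAXDGRAMDATA ||
	    ms->wsize > SKETCH_MAXDGRAMDATA ||
	    ms->readdirsize > SKETCH_MAXDGRAMDATA)
		return (EINVAL);	/* assumed error code */
	return (0);
}
```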
John Baldwin
5aefb4cbbf Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client.  The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries.  However, if there are multiple entries for a single vnode,
they all share a single timestamp.  To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry.  The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode.  Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache.  The latter is subject to races with other
lookups updating the attribute cache concurrently.  Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
  a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
  cache does not store name cache entries for ".", so we cannot get a
  useful timestamp.  It didn't really make much sense to recheck the
  attributes on the directory to validate the namecache hit for "."
  anyway.
- ABI compat shims for the name cache routines are present in this commit
  so that it is safe to MFC.

MFC after:	2 weeks
2012-01-20 20:02:01 +00:00
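
The difference between a per-vnode timestamp and a per-entry timestamp, as described in 5aefb4cbbf, can be illustrated with a small model. All names here (struct sketch_node, struct sketch_name, hit_valid_shared(), hit_valid_per_entry()) are hypothetical and far simpler than the real name cache; this is a sketch of the idea, not the kernel code.

```c
#include <stdbool.h>

/* Old scheme: one timestamp per node, shared by every name. */
struct sketch_node {
	long shared_ctime;
};

/* New scheme: each name cache entry carries its own timestamp. */
struct sketch_name {
	const char *name;
	struct sketch_node *np;
	long entry_ctime;	/* ctime seen when this entry was added */
};

/* Validate a hit against the node-wide timestamp (old behaviour). */
static bool
hit_valid_shared(const struct sketch_name *n, long current_ctime)
{
	return (n->np->shared_ctime == current_ctime);
}

/* Validate a hit against the entry's own timestamp (new behaviour). */
static bool
hit_valid_per_entry(const struct sketch_name *n, long current_ctime)
{
	return (n->entry_ctime == current_ctime);
}
```

If a second lookup refreshes the shared timestamp, an older entry for the same node passes the shared check even though it was added under an earlier ctime; the per-entry check rejects it.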
Rick Macklem
23b3566364 Martin Cracauer reported a problem to freebsd-current@ under the
subject "Data corruption over NFS in -current". During investigation
of this, I came across an ugly bogusity in the new NFS client where
it replaced the cr_uid with the one used for the mount. This was
done so that "system operations" like the NFSv4 Renew would be
performed as the user that did the mount. However, if any other
thread shares the credential with the one doing this operation,
it could do an RPC (or just about anything else) as the wrong cr_uid.
This patch fixes the above, by using the mount credentials instead of
the one provided as an argument for this case. It appears
to have fixed Martin's problem.
This patch is needed for NFSv4 mounts and NFSv3 mounts against
some non-FreeBSD servers that do not put post operation attributes
in the NFSv3 Statfs RPC reply.

Tested by:	Martin Cracauer (cracauer at cons.org)
Reviewed by:	jhb
MFC after:	2 weeks
2012-01-20 00:58:51 +00:00
Eygene Ryabinkin
15c75a0d9f Subject: NULLFS: properly destroy node hash
Use hashdestroy() instead of naive free().

Approved by:	kib
MFC after:	2 weeks
2012-01-18 11:23:46 +00:00
Kevin Lo
e0d3195bd6 Return EOPNOTSUPP since we only support update mounts for NFS export.
Spotted by:	trociny
2012-01-17 01:25:53 +00:00
Kirk McKusick
cc672d3599 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
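
The problem fixed by cc672d3599 is easy to reproduce in isolation. In this sketch the flag SKETCH_MNT_HIGH and both helper functions are hypothetical; they show only how a 32-bit intermediate variable drops flags in the top 32 bits while a uint64_t preserves them.

```c
#include <stdint.h>

/* Hypothetical mount flag that lives in the upper 32 bits. */
#define SKETCH_MNT_HIGH	(UINT64_C(1) << 40)

/* Passing flags through a 32-bit intermediate loses the high bits. */
static uint64_t
roundtrip_u32(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;	/* truncation happens here */

	return (tmp);
}

/* Passing flags through a uint64_t intermediate keeps every bit. */
static uint64_t
roundtrip_u64(uint64_t flags)
{
	uint64_t tmp = flags;

	return (tmp);
}
```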
Kevin Lo
57eb5548c9 Add nfs export support to tmpfs(5)
Reviewed by:	kib
2012-01-16 10:25:22 +00:00
Alan Cox
0b05cac3d2 When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by:	kib
MFC after:	3 weeks
2012-01-16 00:26:49 +00:00
Mikolaj Golub
fe7f89b71a Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted process stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Alan Cox
93431cb74c Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error.  There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spams your console if data corruption is
imminent.)

Reviewed by:	kib
MFC after:	3 weeks
2012-01-14 23:04:27 +00:00
Rick Macklem
5b79362b47 Tai Horgan reported via email that there were two places in
the new NFSv4 server where the code follows the wrong list.
Fortunately, for these fairly rare cases, the lc_stateid[]
lists are normally empty. This patch fixes the code to
follow the correct list.

Reported by:	tai.horgan at isilon.com
Discussed with:	zack
MFC after:	2 weeks
2012-01-14 04:04:58 +00:00
Rick Macklem
a16cd9c05e jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s"
would go negative after using the "-z" option to zero out the stats.
This patch fixes that by not zeroing out the srvcache_size field
for "-z", since it is the size of the cache and not a counter.

MFC after:	2 weeks
2012-01-11 02:46:42 +00:00
Alan Cox
2971897d51 Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs.  Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size.  Specifically, they did not
handle partial page truncation of data stored on swap.  As a result, stale
data might later be returned to an application.

Interestingly, a data inconsistency was less likely to occur under tmpfs
than POSIX shared memory objects.  The reason being that a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency.  If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory.  More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page.  This page would remain
pinned until the file was destroyed or the page was read or written.  With
this change, the page is now added to the inactive queue.

Discussed with:	jhb
Reviewed by:	kib (an earlier version)
MFC after:	3 weeks
2012-01-08 20:09:26 +00:00
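
The partial page case described in 2971897d51 comes down to zeroing the tail of the page that contains the new end of file. The following is a userland sketch under an assumed 4096-byte page size; zero_partial_page() and SKETCH_PAGE_SIZE are hypothetical names, and the real fix operates on VM pages and swap rather than a plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	4096	/* assumed page size */

/*
 * Zero the part of the last page that lies beyond new_size.
 * 'page' points at the page containing byte offset new_size.
 * Returns the number of bytes zeroed.
 */
static size_t
zero_partial_page(unsigned char *page, size_t new_size)
{
	size_t base = new_size % SKETCH_PAGE_SIZE;

	assert(page != NULL);
	if (base == 0)
		return (0);	/* new size is page aligned: nothing to zero */
	memset(page + base, 0, SKETCH_PAGE_SIZE - base);
	return (SKETCH_PAGE_SIZE - base);
}
```

Without this step, bytes past the new size remain in the page and can reappear if the file is later extended, which is the stale data the commit message refers to.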
Rick Macklem
f725864490 opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after:	2 weeks
2012-01-08 01:54:46 +00:00
Jaakko Heinonen
d467c9472a r222004 changed sbuf_finish() to not clear the buffer error status. As a
consequence sbuf_len() will return -1 for buffers which had the error
status set prior to sbuf_finish() call. This causes a problem in
pfs_read() which purposely uses a fixed size sbuf to discard bytes which
are not needed to fulfill the read request.

Work around the problem by using the full buffer length when
sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size
is always full.

PR:		kern/163076
Approved by:	des
MFC after:	2 weeks
2012-01-06 10:12:59 +00:00
Jaakko Heinonen
9cb24e3c98 Check the return value of sbuf_finish() in pfs_readlink() and return
ENAMETOOLONG if the buffer overflowed.

Approved by:	des
MFC after:	2 weeks
2012-01-06 09:17:34 +00:00
Dimitry Andric
f39adedd5b In sys/fs/nullfs/null_subr.c, in a KASSERT, output the correct vnode
pointer 'lowervp' instead of 'vp', which is uninitialized at that point.

Reviewed by:	kib
MFC after:	1 week
2012-01-05 17:06:04 +00:00
Konstantin Belousov
dd0f9532f3 Do the vput() for the lowervp in the null_nodeget() for error case too.
Several callers of null_nodeget() did the cleanup themselves, but several
missed it, the most prominent being null_bypass(). Remove the cleanup from
the callers; null_nodeget() now frees lowervp itself.

Reported and tested by:	pho
MFC after:	1 week
2012-01-03 21:09:07 +00:00
Konstantin Belousov
48a1e3f624 Document the state of the lowervp vnode for null_nodeget().
Tested by:	pho
MFC after:	1 week
2012-01-03 21:03:20 +00:00
Pedro F. Giffuni
5eda6329b2 Minor cleanups to ntfs code
bzero -> memset
rename variables to avoid shadowing.

PR:		142401
Obtained from:	NetBSD
Approved by:	jhb (mentor)
2012-01-03 19:09:01 +00:00
Alan Cox
04f883d798 Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and
tmpfs_nocacheread().  It is both unnecessary and a pessimization.  It
results in either the page being zeroed twice or zeroed first and then
overwritten by an I/O operation.

MFC after:	3 weeks
2012-01-03 03:29:01 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use them.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Ed Schouten
8f8d30274a Migrate ufs and ext2fs from skpc() to memcchr().
While there, remove a useless check from the code. memcchr() always
returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff
can never be equal to zero. Also, because memcchr() returns a
pointer instead of the number of bytes until the end, conversion
to an offset is far easier.
2012-01-01 20:47:33 +00:00
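
The pointer-to-offset conversion mentioned in 8f8d30274a can be sketched in userland as follows. sketch_memcchr() is a stand-in written for this example, assumed to behave like the kernel's memcchr() (return the first byte that differs from c, or NULL), and first_not_full() is a hypothetical helper.

```c
#include <stddef.h>

/* Return a pointer to the first byte in buf that is not c, or NULL. */
static const void *
sketch_memcchr(const void *buf, int c, size_t n)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return (p + i);
	return (NULL);
}

/*
 * Index of the first bitmap byte that is not 0xff, that is, the first
 * byte with at least one free slot, or -1 if every byte is full.
 */
static ptrdiff_t
first_not_full(const unsigned char *map, size_t len)
{
	const unsigned char *hit = sketch_memcchr(map, 0xff, len);

	return (hit == NULL ? -1 : hit - map);
}
```

Because the returned byte is known to differ from 0xff, a follow-up test of map[i] ^ 0xff against zero is redundant, which matches the check the commit removes.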
Kevin Lo
824be4a073 Discard local array based on return values.
Pointed out by:	uqs
Found with:	Coverity Prevent(tm)
CID:	10089
2011-12-24 15:49:52 +00:00
Rick Macklem
f855a3c570 During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1 month of operation.

Tested by:	glebius
Reviewed by:	jhb
MFC after:	2 weeks
2011-12-23 02:04:35 +00:00
Kevin Lo
e2ee19e346 Discarding local array based on return values 2011-12-22 06:31:29 +00:00
Rick Macklem
713f46ac47 jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3 seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1 second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.

Reported by:	jwd
Tested by:	jwd
Reviewed by:	jhb
MFC after:	2 weeks
2011-12-21 02:45:51 +00:00
Pedro F. Giffuni
5ed5554f0a Style cleanups by jh@.
Fix a comment from the previous commit.
Use M_ZERO instead of bzero() in ext2_vfsops.c
Add include guards from PR.

PR:		162564
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-16 15:47:43 +00:00
Rick Macklem
22ea9f58f0 Patch the new NFS server in a manner analogous to r228520 for the
old NFS server, so that it correctly handles a count == 0 argument
for Commit.

PR:		kern/118126
MFC after:	2 weeks
2011-12-16 00:58:41 +00:00
Pedro F. Giffuni
5b63c1252b Bring in reallocblk to ext2fs.
The feature has been standard for a while in UFS as a means to reduce
fragmentation, therefore maintaining consistent performance with
filesystem aging. This is also very similar to what ext4 calls
"delayed allocation".

In his 2010 GSoC, Zheng Liu ported and benchmarked the missing
FANCY_REALLOC code to find more consistent performance improvements than
with the preallocation approach.

PR:		159233
Author:		Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com>
Sponsored by:	Google Inc.
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-15 20:31:18 +00:00
Pedro F. Giffuni
c14d4ad1c6 Merge ext2_readwrite.c into ext2_vnops.c as done in UFS in r101729.
This removes the obfuscations mentioned in ext2_readwrite and
places the clustering function in a location similar to other
UFS-based implementations.

No performance or functional changes are expected from
this move.

PR:		kern/159232
Suggested by:	bde
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-14 22:04:14 +00:00
John Baldwin
e517e6f12c Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f().  The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR:		kern/151758
Submitted by:	Andrey Shidakov  andrey shidakov ru
Submitted by:	Ruben van Staveren  ruben verweg com
Reviewed by:	kib
MFC after:	2 weeks
2011-12-09 17:49:34 +00:00
Konstantin Belousov
d8e8af3166 Initialize fifoinfo fi_wgen field on open. The only important thing
is the difference between fi_wgen and f_seqcount, so the change is purely
cosmetic, but it makes the code easier to understand.

Submitted by:	gianni
MFC after:	2 weeks
2011-12-04 19:25:49 +00:00
Rick Macklem
34f2e649d0 This patch adds a sysctl to the NFSv4 server which optionally disables the
check for a UTF-8 compliant file name. Enabling this sysctl results in
an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled
by default. However, enabling this sysctl results in NFSv3 compatible
behaviour and fixes the problem reported by "dan at sunsaturn.com"
to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat".

Tested by:	dan at sunsaturn.com
Reviewed by:	zack
MFC after:	2 weeks
2011-12-04 16:33:04 +00:00
Rick Macklem
7a2e4d803c Post r223774, the NFSv4 client no longer has multiple instances
of the same lock_owner4 string. As such, the handling of cleanup
of lock_owners could be simplified. This simplification permitted
the client to do a ReleaseLockOwner operation when the process that
the lock_owner4 string represents, has exited. This permits the
server to release any storage related to the lock_owner4 string
before the associated open is closed. Without this change, it
is possible to exhaust a server's storage when a long running
process opens a file and then many child processes do locking
on the file, because the open doesn't get closed. A similar patch
was applied to the Linux NFSv4 client recently so that it wouldn't
exhaust a server's storage.

Reviewed by:	zack
MFC after:	2 weeks
2011-12-03 02:27:26 +00:00
John Baldwin
574862c8ba Enhance the sequential access heuristic used to perform readahead in the
NFS server and reuse it for writes as well to allow writes to the backing
store to be clustered.
- Use a prime number for the size of the heuristic table (1017 is not
  prime).
- Move the logic to locate a heuristic entry from the table and compute
  the sequential count out of VOP_READ() and into a separate routine.
- Use the logic from sequential_heuristic() in vfs_vnops.c to update the
  seqcount when a sequential access is performed rather than just
  increasing seqcount by 1.  This lets the clustering count ramp up
  faster.
- Allow for some reordering of RPCs and if it is detected leave the current
  seqcount as-is rather than dropping back to a seqcount of 1.  Also,
  when out of order access is encountered, cut seqcount in half rather than
  dropping it all the way back to 1 to further aid with reordering.
- Fix the new NFS server to properly update the next offset after a
  successful VOP_READ() so that the readahead actually works.

Some of these changes came from an earlier patch by Bjorn Gronwall that was
forwarded to me by bde@.

Discussed with:	bde, rmacklem, fs@
Submitted by:	Bjorn Gronwall (1, 4)
MFC after:	2 weeks
2011-12-01 18:46:28 +00:00
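
The seqcount behaviour described in 574862c8ba (ramp up faster on sequential access, leave the count alone on apparent reordering, halve it on a real out-of-order access) can be sketched as below. The constants, the struct seq_entry layout, the size of the reordering window and the handling of next_off are all assumptions made for illustration; the actual server code may differ in each of these.

```c
#include <stdint.h>

#define SKETCH_SEQMAX	127		/* assumed cap on the count */
#define SKETCH_BLOCK	16384u		/* assumed bytes per count unit */
#define SKETCH_WINDOW	(4 * 32768u)	/* assumed reordering tolerance */

/* Hypothetical per-file heuristic entry. */
struct seq_entry {
	uint64_t next_off;	/* offset expected for the next access */
	int seqcount;		/* current sequential count */
};

/* Update the entry for an access of len bytes at off; return seqcount. */
static int
seq_update(struct seq_entry *e, uint64_t off, uint32_t len)
{
	uint64_t end = off + len;
	uint64_t dist = off > e->next_off ?
	    off - e->next_off : e->next_off - off;

	if (off == e->next_off) {
		/* Sequential: grow by the number of blocks transferred. */
		e->seqcount += (int)(((uint64_t)len + SKETCH_BLOCK - 1) /
		    SKETCH_BLOCK);
		if (e->seqcount > SKETCH_SEQMAX)
			e->seqcount = SKETCH_SEQMAX;
		e->next_off = end;
	} else if (dist <= SKETCH_WINDOW) {
		/* Probably reordered RPCs: leave the count as-is. */
		if (end > e->next_off)
			e->next_off = end;
	} else {
		/* Out of order: halve rather than reset to 1. */
		e->seqcount /= 2;
		if (e->seqcount < 1)
			e->seqcount = 1;
		e->next_off = end;
	}
	return (e->seqcount);
}
```

Halving instead of resetting means a single stray request costs a long sequential stream only part of its accumulated count.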
Konstantin Belousov
dc874f9881 Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
Kevin Lo
bdcdb55387 Add unicode support to ntfs
Obtained from:	imura
2011-11-27 15:43:49 +00:00
Mikolaj Golub
beb7471b16 In procfs_doproccmdline() if arguments are not cached read them from
the process stack.

Suggested by:	kib
Reviewed by:	kib
Tested by:	pho
MFC after:	2 weeks
2011-11-22 20:43:03 +00:00
Ivan Voras
6e92aee4e2 Avoid panics from recursive rename operations. Not a perfect patch but
good enough for now.

PR:		kern/159418
Submitted by:	Gleb Kurtsou
Reviewed by:	kib
MFC after:	1 month
2011-11-22 16:18:12 +00:00
Konstantin Belousov
54cf919857 Put all the messages from msdosfs under the MSDOSFS_DEBUG ifdef.
They are confusing to the user, and not informative for general consumption.

MFC after:	1 week
2011-11-22 13:30:36 +00:00
Rick Macklem
6854d64811 This patch enables the new/default NFS server's use of shared
vnode locking for read, readdir, readlink, getattr and access.
It is hoped that this will improve server performance for these
operations, since they will no longer be serialized for a given
file/vnode.
2011-11-22 00:35:30 +00:00
Xin LI
296a25a245 Improve the way to calculate available pages in tmpfs:
- Don't deduct wired pages from total usable counts because it does not
  make any sense.  To make things worse, on systems where swap size is
  smaller than physical memory and that use a lot of wired pages (e.g. ZFS),
  tmpfs can suddenly have free space of 0 because of this;
- Count cached pages as available; [1]
- Don't count inactive pages as available, technically we could but that
  might be too aggressive; [1]

[1] Suggested by kib@

MFC after:	1 week
2011-11-21 20:26:22 +00:00
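
The accounting rules listed in 296a25a245 can be sketched as a single sum. The struct sketch_vm_counts and sketch_tmpfs_avail() names are hypothetical, and including free swap pages in the total is an assumption inferred from the commit's mention of swap size; the commit message itself states only that cached pages are counted and that wired and inactive pages are left out of the calculation.

```c
#include <stdint.h>

/* Hypothetical snapshot of page counters. */
struct sketch_vm_counts {
	uint64_t free_pages;
	uint64_t cached_pages;
	uint64_t inactive_pages;
	uint64_t wired_pages;
	uint64_t swap_free_pages;
};

/*
 * Pages considered available to tmpfs: free and cached pages plus
 * (assumed) free swap.  Wired pages are not deducted and inactive
 * pages are not counted.
 */
static uint64_t
sketch_tmpfs_avail(const struct sketch_vm_counts *c)
{
	return (c->free_pages + c->cached_pages + c->swap_free_pages);
}
```

Because wired pages never enter the sum, a large wired count can no longer push the result to zero.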
Rick Macklem
f9340edfc0 Clean up some cruft in the NFSv4 client left over from the
OpenBSD port, so that it is more readable. No logic change
is made by this commit.

MFC after:	2 weeks
2011-11-21 16:06:23 +00:00
Rick Macklem
034235528f Add two arguments to the nfsrpc_rellockown() function in the NFSv4
client. This does not change the client's behaviour, but prepares
the code so that nfsrpc_rellockown() can be called elsewhere in a
future commit.

MFC after:	2 weeks
2011-11-20 16:46:50 +00:00
Rick Macklem
d57a9d5f52 Since the nfscl_cleanup() function isn't used by the FreeBSD NFSv4 client,
delete the code and fix up the related comments. This should not have
any functional effect on the client.

MFC after:	2 weeks
2011-11-20 01:18:47 +00:00