Commit Graph

2812 Commits

Author SHA1 Message Date
Rick Macklem
8c9c322347 r228827 fixed a problem where copying of NFSv4 open credentials into
a credential structure would corrupt it. This happened when the
p argument was != NULL. However, I now realize that the copying of
open credentials should only happen for p == NULL, since that indicates
that it is a read-ahead or write-behind. This patch fixes this.
After this commit, r228827 could be reverted, but I think the code is
clearer and safer with the patch, so I am going to leave it in.
Without this patch, it was possible that a NFSv4 VOP_SETATTR() could have
changed the credentials of the caller. This would have happened if
the process doing the VOP_SETATTR() did not have the file open, but
some other process running as a different uid had the file open for writing
at the same time.

MFC after:	5 days
2012-02-07 16:32:43 +00:00
John Baldwin
bf40d24a3f Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().
2012-02-06 17:00:28 +00:00
Konstantin Belousov
c480f781ea Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Rick Macklem
87b633678b When a "mount -u" switches an NFS mount point from TCP to UDP,
any thread doing an I/O RPC with a transfer size greater than
NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC.
After a discussion on freebsd-fs@, I decided to add a warning
message for this case, as suggested by Jeremy Chadwick.

Suggested by:	freebsd at jdc.parodius.com (Jeremy Chadwick)
MFC after:	2 weeks
2012-01-31 03:58:26 +00:00
Rick Macklem
7f763fc39c A problem with respect to data read through the buffer cache for both
NFS clients was reported to freebsd-fs@ under the subject "NFS
corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when
a TCP mounted root fs was changed to using UDP. I believe that this
problem was caused by the change in mnt_stat.f_iosize that occurred
because rsize was decreased to the maximum supported by UDP. This
patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize,
since the latter is set to f_iosize when the vnode is allocated, but
does not change for a given vnode when f_iosize changes.

Reported by:	pjd
Reviewed by:	kib
MFC after:	2 weeks
2012-01-27 02:46:12 +00:00
Rick Macklem
0149d177fb Revert r230516, since it doesn't really fix the problem. 2012-01-26 00:07:34 +00:00
Konstantin Belousov
d5210589b7 Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps.  Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by:  jhb (previous version)
MFC after:    2 weeks
2012-01-25 20:48:20 +00:00
John Baldwin
0b17c7bea5 Add a timeout on positive name cache entries in the NFS client. That is,
we will only trust a positive name cache entry for a specified amount of
time before falling back to a LOOKUP RPC, even if the ctime for the file
handle matches the cached copy in the name cache entry.  The timeout is
configured via a new 'nametimeo' mount option and defaults to 60 seconds.
It may be set to zero to disable positive name caching entirely.

Reviewed by:	rmacklem
MFC after:	1 week
2012-01-25 20:05:58 +00:00
Rick Macklem
6403723880 If a mount -u is done to either NFS client that switches it
from TCP to UDP and the rsize/wsize/readdirsize is greater
than NFS_MAXDGRAMDATA, it is possible for a thread doing an
I/O RPC to get stuck repeatedly doing retries. This happens
because the RPC will use a resize/wsize/readdirsize that won't
work for UDP and, as such, it will keep failing indefinitely.
This patch returns an error for this case, to avoid the problem.
A discussion on freebsd-fs@ seemed to indicate that returning
an error was preferable to silently ignoring the "udp"/"mntudp"
option.
This problem was discovered while investigating a problem reported
by pjd@ via email.

MFC after:	2 weeks
2012-01-25 00:22:53 +00:00
John Baldwin
5aefb4cbbf Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client.  The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries.  However, if there are multiple entries for a single vnode,
they all share a single timestamp.  To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry.  The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode.  Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache.  The latter is subject to races with other
lookups updating the attribute cache concurrently.  Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
  a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
  cache does not store name cache entries for ".", so we cannot get a
  useful timestamp.  It didn't really make much sense to recheck the
  attributes on the the directory to validate the namecache hit for "."
  anyway.
- ABI compat shims for the name cache routines are present in this commit
  so that it is safe to MFC.

MFC after:	2 weeks
2012-01-20 20:02:01 +00:00
Rick Macklem
23b3566364 Martin Cracauer reported a problem to freebsd-current@ under the
subject "Data corruption over NFS in -current". During investigation
of this, I came across an ugly bogusity in the new NFS client where
it replaced the cr_uid with the one used for the mount. This was
done so that "system operations" like the NFSv4 Renew would be
performed as the user that did the mount. However, if any other
thread shares the credential with the one doing this operation,
it could do an RPC (or just about anything else) as the wrong cr_uid.
This patch fixes the above, by using the mount credentials instead of
the one provided as an argument for this case. It appears
to have fixed Martin's problem.
This patch is needed for NFSv4 mounts and NFSv3 mounts against
some non-FreeBSD servers that do not put post operation attributes
in the NFSv3 Statfs RPC reply.

Tested by:	Martin Cracauer (cracauer at cons.org)
Reviewed by:	jhb
MFC after:	2 weeks
2012-01-20 00:58:51 +00:00
Eygene Ryabinkin
15c75a0d9f Subject: NULLFS: properly destroy node hash
Use hashdestroy() instead of naive free().

Approved by:	kib
MFC after:	2 weeks
2012-01-18 11:23:46 +00:00
Kevin Lo
e0d3195bd6 Return EOPNOTSUPP since we only support update mounts for NFS export.
Spotted by:	trociny
2012-01-17 01:25:53 +00:00
Kirk McKusick
cc672d3599 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
Kevin Lo
57eb5548c9 Add nfs export support to tmpfs(5)
Reviewed by:	kib
2012-01-16 10:25:22 +00:00
Alan Cox
0b05cac3d2 When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by:	kib
MFC after:	3 weeks
2012-01-16 00:26:49 +00:00
Mikolaj Golub
fe7f89b71a Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Alan Cox
93431cb74c Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error.  There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spam your console if data corruption is
imminent.)

Reviewed by:	kib
MFC after:	3 weeks
2012-01-14 23:04:27 +00:00
Rick Macklem
5b79362b47 Tai Horgan reported via email that there were two places in
the new NFSv4 server where the code follows the wrong list.
Fortunately, for these fairly rare cases, the lc_stateid[]
lists are normally empty. This patch fixes the code to
follow the correct list.

Reported by:	tai.horgan at isilon.com
Discussed with:	zack
MFC after:	2 weeks
2012-01-14 04:04:58 +00:00
Rick Macklem
a16cd9c05e jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s"
would go negative after using the "-z" option to zero out the stats.
This patch fixes that by not zeroing out the srvcache_size field
for "-z", since it is the size of the cache and not a counter.

MFC after:	2 weeks
2012-01-11 02:46:42 +00:00
Alan Cox
2971897d51 Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs.  Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size.  Specifically, they did not
handle partial page truncation of data stored on swap.  As a result, stale
data might later be returned to an application.

Interestingly, a data inconsistency was less likely to occur under tmpfs
than POSIX shared memory objects.  The reason being that a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency.  If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory.  More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page.  This page would remain
pinned until the file was destroyed or the page was read or written.  With
this change, the page is now added to the inactive queue.

Discussed with:	jhb
Reviewed by:	kib (an earlier version)
MFC after:	3 weeks
2012-01-08 20:09:26 +00:00
Rick Macklem
f725864490 opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after:	2 weeks
2012-01-08 01:54:46 +00:00
Jaakko Heinonen
d467c9472a r222004 changed sbuf_finish() to not clear the buffer error status. As a
consequence sbuf_len() will return -1 for buffers which had the error
status set prior to sbuf_finish() call. This causes a problem in
pfs_read() which purposely uses a fixed size sbuf to discard bytes which
are not needed to fulfill the read request.

Work around the problem by using the full buffer length when
sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size
is always full.

PR:		kern/163076
Approved by:	des
MFC after:	2 weeks
2012-01-06 10:12:59 +00:00
Jaakko Heinonen
9cb24e3c98 Check the return value of sbuf_finish() in pfs_readlink() and return
ENAMETOOLONG if the buffer overflowed.

Approved by:	des
MFC after:	2 weeks
2012-01-06 09:17:34 +00:00
Dimitry Andric
f39adedd5b In sys/fs/nullfs/null_subr.c, in a KASSERT, output the correct vnode
pointer 'lowervp' instead of 'vp', which is uninitialized at that point.

Reviewed by:	kib
MFC after:	1 week
2012-01-05 17:06:04 +00:00
Konstantin Belousov
dd0f9532f3 Do the vput() for the lowervp in the null_nodeget() for error case too.
Several callers of null_nodeget() did the cleanup itself, but several
missed it, most prominent being null_bypass(). Remove the cleanup from
the callers, now null_nodeget() handles lowervp free itself.

Reported and tested by:	pho
MFC after:	1 week
2012-01-03 21:09:07 +00:00
Konstantin Belousov
48a1e3f624 Document the state of the lowervp vnode for null_nodeget().
Tested by:	pho
MFC after:	1 week
2012-01-03 21:03:20 +00:00
Pedro F. Giffuni
5eda6329b2 Minor cleanups to ntfs code
bzero -> memset
rename variables to avoid shadowing.

PR:		142401
Obtained from:	NetBSD
Approved by	jhb (mentor)
2012-01-03 19:09:01 +00:00
Alan Cox
04f883d798 Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and
tmpfs_nocacheread().  It is both unnecessary and a pessimization.  It
results in either the page being zeroed twice or zeroed first and then
overwritten by an I/O operation.

MFC after:	3 weeks
2012-01-03 03:29:01 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Ed Schouten
8f8d30274a Migrate ufs and ext2fs from skpc() to memcchr().
While there, remove a useless check from the code. memcchr() always
returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff
can never be equal to zero. Also, the fact that memcchr() returns a
pointer instead of the number of bytes until the end, makes conversion
to an offset far more easy.
2012-01-01 20:47:33 +00:00
Kevin Lo
824be4a073 Discard local array based on return values.
Pointed out by:	uqs
Found with:	Coverity Prevent(tm)
CID:	10089
2011-12-24 15:49:52 +00:00
Rick Macklem
f855a3c570 During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1month of operation.

Tested by:	glebius
Reviewed by:	jhb
MFC after:	2 weeks
2011-12-23 02:04:35 +00:00
Kevin Lo
e2ee19e346 Discarding local array based on return values 2011-12-22 06:31:29 +00:00
Rick Macklem
713f46ac47 jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.

Reported by:	jwd
Tested by:	jwd
Reviewed by:	jhb
MFC after:	2 weeks
2011-12-21 02:45:51 +00:00
Pedro F. Giffuni
5ed5554f0a Style cleanups by jh@.
Fix a comment from the previous commit.
Use M_ZERO instead of bzero() in ext2_vfsops.c
Add include guards from PR.

PR:		162564
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-16 15:47:43 +00:00
Rick Macklem
22ea9f58f0 Patch the new NFS server in a manner analagous to r228520 for the
old NFS server, so that it correctly handles a count == 0 argument
for Commit.

PR:		kern/118126
MFC after:	2 weeks
2011-12-16 00:58:41 +00:00
Pedro F. Giffuni
5b63c1252b Bring in reallocblk to ext2fs.
The feature has been standard for a while in UFS as a means to reduce
fragmentation, therefore maintaining consistent performance with
filesystem aging. This is also very similar to what ext4 calls
"delayed allocation".

In his 2010 GSoC, Zheng Liu ported and benchmarked the missing
FANCY_REALLOC code to find more consistent performance improvements than
with the preallocation approach.

PR:		159233
Author:		Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com>
Sponsored by:	Google Inc.
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-15 20:31:18 +00:00
Pedro F. Giffuni
c14d4ad1c6 Merge ext2_readwrite.c into ext2_vnops.c as done in UFS in r101729.
This removes the obfuscations mentioned in ext2_readwrite and
places the clustering funtion in a location similar to other
UFS-based implementations.

No performance or functional changeses are expected from
this move.

PR:		kern/159232
Suggested by:	bde
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-14 22:04:14 +00:00
John Baldwin
e517e6f12c Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f().  The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR:		kern/151758
Submitted by:	Andrey Shidakov  andrey shidakov ru
Submitted by:	Ruben van Staveren  ruben verweg com
Reviewed by:	kib
MFC after:	2 weeks
2011-12-09 17:49:34 +00:00
Konstantin Belousov
d8e8af3166 Initialize fifoinfo fi_wgen field on open. The only important is the
difference between fi_wgen and f_seqcount, so the change is purely
cosmetic, but it makes the code easier to understand.

Submitted by:	gianni
MFC after:	2 weeks
2011-12-04 19:25:49 +00:00
Rick Macklem
34f2e649d0 This patch adds a sysctl to the NFSv4 server which optionally disables the
check for a UTF-8 compliant file name. Enabling this sysctl results in
an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled
by default. However, enabling this sysctl results in NFSv3 compatible
behaviour and fixes the problem reported by "dan at sunsaturn.com"
to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat".

Tested by:	dan at sunsaturn.com
Reviewed by:	zack
MFC after:	2 weeks
2011-12-04 16:33:04 +00:00
Rick Macklem
7a2e4d803c Post r223774, the NFSv4 client no longer has multiple instances
of the same lock_owner4 string. As such, the handling of cleanup
of lock_owners could be simplified. This simplification permitted
the client to do a ReleaseLockOwner operation when the process that
the lock_owner4 string represents, has exited. This permits the
server to release any storage related to the lock_owner4 string
before the associated open is closed. Without this change, it
is possible to exhaust a server's storage when a long running
process opens a file and then many child processes do locking
on the file, because the open doesn't get closed. A similar patch
was applied to the Linux NFSv4 client recently so that it wouldn't
exhaust a server's storage.

Reviewed by:	zack
MFC after:	2 weeks
2011-12-03 02:27:26 +00:00
John Baldwin
574862c8ba Enhance the sequential access heuristic used to perform readahead in the
NFS server and reuse it for writes as well to allow writes to the backing
store to be clustered.
- Use a prime number for the size of the heuristic table (1017 is not
  prime).
- Move the logic to locate a heuristic entry from the table and compute
  the sequential count out of VOP_READ() and into a separate routine.
- Use the logic from sequential_heuristic() in vfs_vnops.c to update the
  seqcount when a sequential access is performed rather than just
  increasing seqcount by 1.  This lets the clustering count ramp up
  faster.
- Allow for some reordering of RPCs and if it is detected leave the current
  seqcount as-is rather than dropping back to a seqcount of 1.  Also,
  when out of order access is encountered, cut seqcount in half rather than
  dropping it all the way back to 1 to further aid with reordering.
- Fix the new NFS server to properly update the next offset after a
  successful VOP_READ() so that the readahead actually works.

Some of these changes came from an earlier patch by Bjorn Gronwall that was
forwarded to me by bde@.

Discussed with:	bde, rmacklem, fs@
Submitted by:	Bjorn Gronwall (1, 4)
MFC after:	2 weeks
2011-12-01 18:46:28 +00:00
Konstantin Belousov
dc874f9881 Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
Kevin Lo
bdcdb55387 Add unicode support to ntfs
Obtained from:	imura
2011-11-27 15:43:49 +00:00
Mikolaj Golub
beb7471b16 In procfs_doproccmdline() if arguments are not cashed read them from
the process stack.

Suggested by:	kib
Reviewed by:	kib
Tested by:	pho
MFC after:	2 weeks
2011-11-22 20:43:03 +00:00
Ivan Voras
6e92aee4e2 Avoid panics from recursive rename operations. Not a perfect patch but
good enough for now.

PR:		kern/159418
Submitted by:	Gleb Kurtsou
Reviewed by:	kib
MFC after:	1 month
2011-11-22 16:18:12 +00:00
Konstantin Belousov
54cf919857 Put all the messages from msdosfs under the MSDOSFS_DEBUG ifdef.
They are confusing to user, and not informative for general consumption.

MFC after:	1 week
2011-11-22 13:30:36 +00:00
Rick Macklem
6854d64811 This patch enables the new/default NFS server's use of shared
vnode locking for read, readdir, readlink, getattr and access.
It is hoped that this will improve server performance for these
operations, since they will no longer be serialized for a given
file/vnode.
2011-11-22 00:35:30 +00:00
Xin LI
296a25a245 Improve the way to calculate available pages in tmpfs:
- Don't deduct wired pages from total usable counts because it does not
   make any sense.  To make things worse, on systems where swap size is
   smaller than physical memory and use a lot of wired pages (e.g. ZFS),
   tmpfs can suddenly have free space of 0 because of this;
 - Count cached pages as available; [1]
 - Don't count inactive pages as available, technically we could but that
   might be too aggressive; [1]

[1] Suggested by kib@

MFC after:	1 week
2011-11-21 20:26:22 +00:00
Rick Macklem
f9340edfc0 Clean up some cruft in the NFSv4 client left over from the
OpenBSD port, so that it is more readable. No logic change
is made by this commit.

MFC after:	2 weeks
2011-11-21 16:06:23 +00:00
Rick Macklem
034235528f Add two arguments to the nfsrpc_rellockown() function in the NFSv4
client. This does not change the client's behaviour, but prepares
the code so that nfsrpc_rellockown() can be called elsewhere in a
future commit.

MFC after:	2 weeks
2011-11-20 16:46:50 +00:00
Rick Macklem
d57a9d5f52 Since the nfscl_cleanup() function isn't used by the FreeBSD NFSv4 client,
delete the code and fix up the related comments. This should not have
any functional effect on the client.

MFC after:	2 weeks
2011-11-20 01:18:47 +00:00
Rick Macklem
2f27585ef9 Post r223774 the NFSv4 client never uses the linked list with the
head nfsc_defunctlockowner. This patch simply removes the code that
loops through this always empty list, since the code no longer does
anything useful. It should not have any effect on the client's
behaviour.

MFC after:	2 weeks
2011-11-20 00:39:15 +00:00
Konstantin Belousov
f82360acf2 Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject of re approval)
2011-11-19 07:50:49 +00:00
Konstantin Belousov
f82ee01c1c Do not use NULLVPTOLOWERVP() in the null_print(). If diagnostic is compiled
in, and show vnode is used from ddb on the faulty nullfs vnode, we get
panic instead of vnode dump.

MFC after:	1 week
2011-11-19 07:41:37 +00:00
Konstantin Belousov
4d2310dd81 Use the plain panic calls, without additional printing around them.
The debugger and dumping support is adequate.

Tested by:	pho
MFC after:	1 week
2011-11-19 07:40:13 +00:00
Kevin Lo
41f1dccceb Add unicode support to msdosfs and smbfs; original pathes from imura,
bug fixes by Kuan-Chung Chiu <buganini at gmail dot com>.

Tested by me in production for several days at work.
2011-11-18 03:05:20 +00:00
Konstantin Belousov
1fb5311e00 Fix build, use %d for int value formatting. 2011-11-16 18:41:59 +00:00
Peter Holm
50546f8ffe Handle invalid large values for getdirentries(2) data buffer size.
In collaboration with:	kib
Reviewed by:	des
Reported by:	The iknowthis syscall fuzzer.
MFC after:	1 week
2011-11-16 10:11:55 +00:00
Rick Macklem
a5e583eea0 Modify the new NFS client so that nfs_fsync() only calls ncl_flush()
for regular files. Since other file types don't write into the
buffer cache, calling ncl_flush() is almost a no-op. However, it does
clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for
directories.

MFC after:	2 weeks
2011-11-15 23:35:43 +00:00
Peter Holm
3c93d4433f Removed extra PRELE() call.
MFC after:	1 week
2011-11-15 09:23:21 +00:00
Rick Macklem
e42a8d7e24 Move the setting of the default value for nm_wcommitsize to
before the nfs_decode_args() call in the new NFS client, so
that a specfied command line value won't be overwritten.
Also, modify the calculation for small values of desiredvnodes
to avoid an unusually large value or a divide by zero crash.
It seems that the default value for nm_wcommitsize is very
conservative and may need to change at some time.

PR:		kern/159351
Submitted by:	onwahe at gmail.com (earlier version)
Reviewed by:	jhb
MFC after:	2 weeks
2011-11-15 01:39:02 +00:00
John Baldwin
840fb1c02b Finish making 'wcommitsize' an NFS client mount option.
Reviewed by:	rmacklem
MFC after:	1 week
2011-11-14 18:52:07 +00:00
John Baldwin
e43c042fec Sync with the old NFS client: Remove an obsolete comment. 2011-11-14 18:23:50 +00:00
Rick Macklem
670bf6f126 Since NFSv4 byte range locking only works for regular files,
add a sanity check for the vnode type to the NFSv4 client.

MFC after:	2 weeks
2011-11-14 00:10:11 +00:00
Rick Macklem
90379d6116 Move the assignment of default values for some mount options
to before the nfs_decode_args() call in the new NFS client,
so they don't overwrite the value specified on the command line.

MFC after:	2 weeks
2011-11-13 23:09:26 +00:00
Eitan Adler
3b6dc18ef5 - fix duplicate "a a" in some comments
Submitted by:	eadler
Approved by:	simon
MFC after:	3 days
2011-11-13 17:06:33 +00:00
Konstantin Belousov
c8997bf02a Lock the thread lock around block that retrieves td_wmesg. Otherwise,
procfs could see a thread with assigned td_wchan but still NULL td_wmesg.

Reported and tested by:	pho
MFC after:	1 week
2011-11-09 17:15:51 +00:00
Marcel Moolenaar
82543c5928 Don astbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not in our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.
2011-11-07 16:21:50 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Ed Schouten
8f80f103b4 Remove MALLOC_DECLAREs of nonexisting malloc-pools.
After careful grepping, it seems none of these pools can be found in our
source tree. They are not in use, nor are they defined.
2011-11-06 20:16:50 +00:00
Konstantin Belousov
25cc6027cf Fix typo.
MFC after:	3 days
2011-11-05 09:04:13 +00:00
John Baldwin
dccc45e4c0 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
Konstantin Belousov
1fef78c3f0 Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by:	Tavis Ormandy <taviso  cmpxchg8b com> through delphij
MFC after:	3 days
2011-11-03 18:55:18 +00:00
Peter Holm
948fa27d49 Added missing cache purge of from argument for rename().
Reported by:	Anton Yuzhaninov <citrin citrin ru>
In collaboration with:	kib
MFC after:	1 week
2011-11-01 12:33:06 +00:00
Konstantin Belousov
17edcd764d The use of VOP_ISLOCKED() without a check for the return values can cause
false positives. Replace the #ifdef block with the proper
ASSERT_VOP_UNLOCKED() assert.

Tested by:	pho
MFC after:	1 week
2011-10-24 13:56:31 +00:00
Konstantin Belousov
234ab7412e The only possible error return from null_nodeget() is due to insmntque1
failure (the getnewvnode cannot return an error). In this case, the
null_insmntque_dtr() already unlocked the reclaimed vnode, so VOP_UNLOCK()
in the nullfs_mount() after null_nodeget() failure is wrong.

Tested by:	pho
MFC after:	1 week
2011-10-24 13:53:32 +00:00
Konstantin Belousov
ffa43617e8 The covered vnode must be reloced if it was unlocked. Remove VOP_ISLOCKED
test because of this and also because it can lead to false positives.

Tested by:	pho
MFC after:	1 week
2011-10-24 13:48:13 +00:00
Peter Holm
9ce7379778 Only unlock if the lock is exclusive.
Reported by:	Subbsd <subbsd gmail com>
Discussed with:	kib
2011-10-24 10:35:37 +00:00
Dag-Erling Smørgrav
0fc93d0b00 Trace attempts to open a portal device.
Ceterum censeo portalfs esse delendam.
2011-10-18 07:31:49 +00:00
Edward Tomasz Napierala
5c0c5a182f Make unionfs also clear VAPPEND when clearing VWRITE, since VAPPEND
is just a modifier for VWRITE.

Submitted by:	rmacklem
2011-10-10 21:32:08 +00:00
Konstantin Belousov
084e62e91b Export devfs inode number allocator for the kernel consumers.
Reviewed by:	jhb
MFC after:	2 weeks
2011-10-05 16:50:15 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Konstantin Belousov
3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
Rick Macklem
322c8d9e25 Fix the NFS servers so that they can do a Lookup of "..",
which requires that ni_strictrelative be set to 0, post-r224810.

Tested by:	swills (earlier version), geo dot liaskos at gmail.com
Approved by:	re (kib)
2011-09-03 00:28:53 +00:00
Rick Macklem
de67b4966c Fix the NFSv4 server so that it returns NFSERR_SYMLINK when
an attempt to do an Open operation on any type of file other
than VREG is done. A recent discussion on the IETF working group's
mailing list (nfsv4@ietf.org) decided that NFSERR_SYMLINK
should be returned for all non-regular files and not just symlinks,
so that the Linux client would work correctly.
This change does not affect the FreeBSD NFSv4 client and is not
believed to have a negative effect on other NFSv4 clients.

Reviewed by:	zkirsch
Approved by:	re (kib)
MFC after:	2 weeks
2011-08-20 21:26:35 +00:00
Konstantin Belousov
4c023a3365 Do not return success and a string "unknown" when vn_fullpath() was unable
to resolve the path of the text vnode of the process. The behaviour is
very confusing for any consumer of the procfs, in particular, java.

Reported and tested by:	bf
MFC after:	2 weeks
Approved by:	re (bz)
2011-08-16 20:13:17 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Jonathan Anderson
985a88e2a6 Fix a merge conflict.
r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order
to reliably call NFSEXITCODE() before returning. Our Capsicum changes,
based on the old "return (error)" model, did not merge nicely.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-16 14:23:16 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Konstantin Belousov
e047ade947 Do not update mountpoint generation counter to the value which was not
yet acted upon by devfs_populate().

Submitted by:	Kohji Okuno <okuno.kohji jp panasonic com>
Approved by:	re (bz)
MFC after:	1 week
2011-08-09 20:53:33 +00:00
Zack Kirsch
06521fbb49 Fix an NFS server issue where it was not correctly setting the eof flag when a
READ had hit the end of the file. Also, clean up some cruft in the code.

Approved by:    re (kib)
Reviewed by:    rmacklem
MFC after:      2 weeks
2011-08-03 18:50:19 +00:00
Rick Macklem
e2eb210c09 Fix a LOR in the NFS client which could cause a deadlock.
This was reported to the mailing list freebsd-net@freebsd.org
on July 21, 2011 under the subject "LOR with nfsclient sillyrename".
The LOR occurred when nfs_inactive() called vrele(sp->s_dvp)
while holding the vnode lock on the file in s_dvp. This patch
modifies the client so that it performs the vrele(sp->s_dvp)
as a separate task to avoid the LOR. This fix was discussed
with jhb@ and kib@, who both proposed variations of it.

Tested by:	pho, jlott at averesystems.com
Submitted by:	jhb (earlier version)
Reviewed by:	kib
Approved by:	re (kib)
MFC after:	2 weeks
2011-08-02 11:28:42 +00:00
Rick Macklem
6b3dfc6ab0 Fix rename in the new NFS server so that it does not require a
recursive vnode lock on the directory for the case where the
new file name is in the same directory as the old one. The patch
handles this as a special case, recognized by the new directory
having the same file handle as the old one and just VREF()s the old
dir vnode for this case, instead of doing a second VFS_FHTOVP() to get it.
This is required so that the server will work for file systems like
msdosfs, that do not support recursive vnode locking.
This problem was discovered during recent testing by pho@
when exporting an msdosfs file system via the new NFS server.

Tested by:	pho
Reviewed by:	zkirsch
Approved by:	re (kib)
MFC after:	2 weeks
2011-07-31 20:06:11 +00:00
Rick Macklem
d1907de2ba The new NFS client failed to vput() the new vnode if a setattr
failed after the file was created in nfs_create(). This would
probably only happen during a forced dismount. The old NFS client
does have a vput() for this case. Detected by pho during recent
testing, where an open syscall returned with a vnode still locked.

Tested by:	pho
Approved by:	re (kib)
MFC after:	2 weeks
2011-07-30 22:57:38 +00:00