Commit Graph

65 Commits

Author SHA1 Message Date
John Baldwin
574862c8ba Enhance the sequential access heuristic used to perform readahead in the
NFS server and reuse it for writes as well to allow writes to the backing
store to be clustered.
- Use a prime number for the size of the heuristic table (1017 is not
  prime).
- Move the logic to locate a heuristic entry from the table and compute
  the sequential count out of VOP_READ() and into a separate routine.
- Use the logic from sequential_heuristic() in vfs_vnops.c to update the
  seqcount when a sequential access is performed rather than just
  increasing seqcount by 1.  This lets the clustering count ramp up
  faster.
- Allow for some reordering of RPCs and if it is detected leave the current
  seqcount as-is rather than dropping back to a seqcount of 1.  Also,
  when out of order access is encountered, cut seqcount in half rather than
  dropping it all the way back to 1 to further aid with reordering.
- Fix the new NFS server to properly update the next offset after a
  successful VOP_READ() so that the readahead actually works.

Some of these changes came from an earlier patch by Bjorn Gronwall that was
forwarded to me by bde@.

Discussed with:	bde, rmacklem, fs@
Submitted by:	Bjorn Gronwall (1, 4)
MFC after:	2 weeks
2011-12-01 18:46:28 +00:00
Rick Macklem
6854d64811 This patch enables the new/default NFS server's use of shared
vnode locking for read, readdir, readlink, getattr and access.
It is hoped that this will improve server performance for these
operations, since they will no longer be serialized for a given
file/vnode.
2011-11-22 00:35:30 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Rick Macklem
322c8d9e25 Fix the NFS servers so that they can do a Lookup of "..",
which requires that ni_strictrelative be set to 0, post-r224810.

Tested by:	swills (earlier version), geo dot liaskos at gmail.com
Approved by:	re (kib)
2011-09-03 00:28:53 +00:00
Jonathan Anderson
985a88e2a6 Fix a merge conflict.
r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order
to reliably call NFSEXITCODE() before returning. Our Capsicum changes,
based on the old "return (error)" model, did not merge nicely.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-16 14:23:16 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Zack Kirsch
a9285ae5c4 Add DEXITCODE plumbing to NFS.
Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by:   David Kwan <dkwan@isilon.com>
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:51:09 +00:00
Zack Kirsch
68347a92db Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:05:41 +00:00
Zack Kirsch
a998963469 Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:05:36 +00:00
Zack Kirsch
98f234f338 Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:05:31 +00:00
Zack Kirsch
c383087c0c Remove unnecessary thread pointer from VOPLOCK macros and current users.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:05:26 +00:00
Zack Kirsch
40435b74f4 Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:05:17 +00:00
Zack Kirsch
b008a72c86 Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:04:57 +00:00
Rick Macklem
c5c142f652 Modify the new NFS server so that the NFSv3 Pathconf RPC
doesn't return an error when the underlying file system
lacks support for any of the four _PC_xxx values used, by
falling back to default values.

Tested by:	avg
MFC after:	2 weeks
2011-06-04 01:13:09 +00:00
Rick Macklem
694a586a43 Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by:	kib
2011-05-22 01:07:54 +00:00
Rick Macklem
a0c2c3691c Change the new NFS server so that it uses vfs.nfsd naming
for its sysctls instead of vfs.newnfs. This separates the
names from the ones used by the client.
2011-05-08 01:01:27 +00:00
Rick Macklem
78e4b1f838 Change the new NFS server so that it returns 0 when the f_bavail
or f_ffree fields of "struct statfs" are negative, since the
values that go on the wire are unsigned and will appear to be
very large positive values otherwise. This makes the handling
of a negative f_bavail compatible with the old/regular NFS server.

MFC after:	2 weeks
2011-05-06 01:29:14 +00:00
Rick Macklem
a09001a82b Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after:	2 weeks
2011-04-14 23:46:15 +00:00
Rick Macklem
07c0c166e4 Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by:	kib
MFC after:	2 weeks
2011-04-14 21:49:52 +00:00
Rick Macklem
f659876f01 Vrele ni_startdir in the experimental NFS server for the case
of NFSv2 getting an error return from VOP_MKNOD(). Without this
patch, the server file system remains busy after an NFSv2
VOP_MKNOD() fails.

MFC after:	2 weeks
2011-04-11 20:54:30 +00:00
Rick Macklem
806e2e4bb6 Add some cleanup code to the module unload operation for
the experimental NFS server, so that it doesn't leak memory
when unloaded. However, unloading the NFSv4 server is not
recommended, since all NFSv4 state will be lost by the unload
and clients will have to recover the state after a server
reload/restart as if the server crashed/rebooted.

MFC after:	2 weeks
2011-04-10 20:43:07 +00:00
Rick Macklem
8d2f180ea4 Add a VOP_UNLOCK() for the directory, when that is not what
VOP_LOOKUP() returned. This fixes a bug in the experimental
NFS server for the case where VFS_VGET() fails returning EOPNOTSUPP
in the ReaddirPlus RPC, forcing the use of VOP_LOOKUP() instead.

MFC after:	2 weeks
2011-04-09 23:55:27 +00:00
Alexander Leidinger
de5b19526b Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:   Google Summer of Code 2010
Submitted by:   kibab
Reviewed by:    arch@ (parts by rwatson, trasz, jhb)
X-MFC after:    to be determined in last commit with code from this project
2011-02-25 10:11:01 +00:00
Alan Cox
17f3095d1a Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are
incorrectly calling vm_object_page_clean().  They are passing the length of
the range rather than the ending offset of the range.

Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.

Reviewed by:	kib
MFC after:	3 weeks
2011-02-05 21:21:27 +00:00
Rick Macklem
5f73287a6e Modify the experimental NFSv4 server so that it posts a SIGUSR2
signal to the master nfsd daemon whenever the stable restart
file has been modified. This will allow the master nfsd daemon
to maintain an up to date backup copy of the file. This is
enabled via the nfssvc() syscall, so that older nfsd daemons
will not be signaled.

Reviewed by:	jhb
MFC after:	1 week
2011-01-14 23:30:35 +00:00
Zack Kirsch
52776c502b Clean up the experimental NFS server replay cache when the module is unloaded.
Reviewed by:    rmacklem
Approved by:    zml (mentor)
2011-01-12 23:34:09 +00:00
Rick Macklem
f9266eb1f9 Modify readdirplus in the experimental NFS server in a
manner analogous to r216633 for the regular server. This
change busies the file system so that VFS_VGET() is
guaranteed to be using the correct mount point even
during a forced dismount attempt. Since nfsd_fhtovp() is
not called immediately before readdirplus, the patch is
actually a clone of pjd@'s nfs_serv.c.4.patch instead of
the one committed in r216633.

Reviewed by:	kib
MFC after:	10 days
2011-01-09 02:10:54 +00:00
Rick Macklem
8974bc2f3a Since the VFS_LOCK_GIANT() code in the experimental NFS
server is broken and the major file systems are now all
mpsafe, modify the server so that it will only export
mpsafe file systems. This was discussed on freebsd-fs@
and removes a fair bit of crufty code.

MFC after:	12 days
2011-01-06 19:50:11 +00:00
Rick Macklem
47524363da Fix the experimental NFS server to use vfs_busyfs() instead
of vfs_getvfs() so that the mount point is busied for the
VFS_FHTOVP() call. This is analagous to r185432 for the
regular NFS server.

Reviewed by:	kib
MFC after:	12 days
2011-01-05 18:46:05 +00:00
Rick Macklem
90305aa38b Fix the nlm so that it no longer depends on the regular
nfs client and, as such, can be loaded for the experimental
nfs client without the regular client.

Reviewed by:	jhb
MFC after:	2 weeks
2011-01-03 20:37:31 +00:00
Rick Macklem
fa5ecdd3b9 Fix the experimental NFS server so that it doesn't leak
a reference count on the directory when creating device
special files.

MFC after:	2 weeks
2011-01-03 00:40:13 +00:00
Rick Macklem
c9aad40f5f Delete some cruft from the experimental NFS server that was
only used by the OpenBSD port for its pseudo-fs.

MFC after:	2 weeks
2011-01-02 21:34:01 +00:00
Rick Macklem
629fa50e68 Add checks for VI_DOOMED and vn_lock() failures to the
experimental NFS server, to handle the case where an
exported file system is forced dismounted while an RPC
is in progress. Further commits will fix the cases where
a mount point is used when the associated vnode isn't locked.

Reviewed by:	kib
MFC after:	2 weeks
2011-01-02 19:58:39 +00:00
Rick Macklem
bd2fa726e0 Delete the nfsvno_localconflict() function in the experimental
NFS server since it is no longer used and is broken.

MFC after:	2 weeks
2010-12-28 23:50:13 +00:00
Rick Macklem
17891d0082 Modify the experimental NFS server so that it uses LK_SHARED
for RPC operations when it can. Since VFS_FHTOVP() currently
always gets an exclusively locked vnode and is usually called
at the beginning of each RPC, the RPCs for a given vnode will
still be serialized. As such, passing a lock type argument to
VFS_FHTOVP() would be preferable to doing the vn_lock() with
LK_DOWNGRADE after the VFS_FHTOVP() call.

Reviewed by:	kib
MFC after:	2 weeks
2010-12-25 21:56:25 +00:00
Rick Macklem
0cf42b622b Add an argument to nfsvno_getattr() in the experimental
NFS server, so that it can avoid calling VOP_ISLOCKED()
when the vnode is known to be locked. This will allow
LK_SHARED to be used for these cases, which happen to
be all the cases that can use LK_SHARED. This does not
fix any bug, but it reduces the number of calls to
VOP_ISLOCKED() and prepares the code so that it can be
switched to using LK_SHARED in a future patch.

Reviewed by:	kib
MFC after:	2 weeks
2010-12-24 21:31:18 +00:00
Rick Macklem
a852f40b7a Simplify vnode locking in the expeimental NFS server's
readdir functions. In particular, get rid of two bogus
VOP_ISLOCKED() calls. Removing the VOP_ISLOCKED() calls
is the only actual bug fixed by this patch.

Reviewed by:	kib
MFC after:	2 weeks
2010-12-24 20:24:07 +00:00
Rick Macklem
63e1cb4308 Since VOP_READDIR() for ZFS does not return monotonically
increasing directory offset cookies, disable the UFS related
loop that skips over directory entries at the beginning of
the block for the experimental NFS server. This loop is
required for UFS since it always returns directory entries
starting at the beginning of the block that
the requested directory offset is in. In discussion with pjd@
and mckusick@ it seems that this behaviour of UFS should maybe
change, with this fix being an interim patch until then.
This patch only fixes the experimental server, since pjd@ is
working on a patch for the regular server.

Discussed with:	pjd, mckusick
MFC after:	5 days
2010-12-24 18:46:44 +00:00
Rick Macklem
377c50f67a Modify the experimental NFSv4 server's file handle hash function
to use the generic hash32_buf() function. Although adding the
bytes seemed sufficient for UFS and ZFS, since most of the bytes
are the same for file handles on the same volume, this might not
be sufficient for other file systems. Use of a generic function
also seems preferable to one specific to NFSv4.

Suggested by:	gleb.kurtsou at gmail.com
MFC after:	10 days
2010-10-23 22:28:29 +00:00
Rick Macklem
91027b4ef0 Modify the file handle hash function in the experimental NFS
server so that it will work better for non-UFS file systems.
The new function simply sums the bytes of the fh_fid field
of fhandle_t.

MFC after:	10 days
2010-10-22 21:38:56 +00:00
Rick Macklem
8a1b5ade5f Modify the experimental NFS server in a manner analagous to
r214049 for the regular NFS server, so that it will not do
a VOP_LOOKUP() of ".." when at the root of a file system
when performing a ReaddirPlus RPC.

MFC after:	10 days
2010-10-21 18:49:12 +00:00
Rick Macklem
c7aafc24c4 Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK()
unlock operations correctly. It was passing in F_SETLK instead of
F_UNLCK as the operation for the unlock case. This only affected
operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled.

MFC after:	1 week
2010-09-19 01:05:19 +00:00
Rick Macklem
a8437c97f1 Add MODULE_DEPEND() macros to the experimental NFS client and
server so that the modules will load when kernels are built with
none of the NFS* configuration options specified. I believe this
resolves the problems reported by PR kern/144458 and the email on
freebsd-stable@ posted by Dmitry Pryanishnikov on June 13.

Tested by:	kib
PR:		kern/144458
Reviewed by:	kib
MFC after:	1 week
2010-06-15 00:25:04 +00:00
Rick Macklem
54bde1faa5 Harden the experimental NFS server a little, by adding extra checks
in the readdir functions for non-positive byte count arguments.
For the negative case, set it to the maximum allowable, since it
was actually a large positive value (unsigned) on the wire.
Also, fix up the readdir function comment a bit.

Suggested by:	dillon AT apollo.backplane.com
MFC after:	2 weeks
2010-04-04 23:19:11 +00:00
Rick Macklem
15b28cb82d For the experimental NFS server, add a call to free the lookup
path buffer for one case where it was missing when doing mkdir.
This could have conceivably resulted in a leak of a buffer, but
a leak was never observed during testing, so I suspect it would
have occurred rarely, if ever, in practice.

MFC after:	2 weeks
2010-04-02 02:19:28 +00:00
Rick Macklem
7482701cd4 Patch the experimental NFS server in a manner analagous to r205661
for the regular NFS server, to ensure that ESTALE is
returned to the client for all errors returned by VFS_FHTOVP().

MFC after:	2 weeks
2010-03-26 01:35:19 +00:00
Robert Watson
2684bef615 Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x:
- so_pcb is now guaranteed to be non-NULL and valid if a valid socket
  reference is held.

- Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a
  tcpcb, as it might be a tcptw or NULL otherwise.

- tp can never be NULL by the end of the function, so only check
  TCPS_ESTABLISHED before extracting tcpcb fields.

The NFS server arguably incorporates too many assumptions about TCP
internals, but fixing that is left for nother day.

MFC after:		1 week
Reviewed by:		bz
Reviewed and tested by:	rmacklem
Sponsored by:		Juniper Networks
2010-03-11 11:33:04 +00:00
Rick Macklem
8da45f2c6e Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.

Reviewed by:	trasz
MFC after:	2 weeks
2009-12-25 20:44:19 +00:00
Rick Macklem
38e3ea69d4 Modify the experimental nfs server so that it falls back to
using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the
ReaddirPlus RPC. This patch is based upon one by pjd@ for the
regular nfs server which has not yet been committed. It is needed
when a ZFS volume is exported and ReaddirPlus (which almost
always happens for NFSv4) is performed by a client. The patch
also simplifies vnode lock handling somewhat.

MFC after:	2 weeks
2009-11-23 16:08:15 +00:00
Rick Macklem
086f6e0cc7 Patch the experimental NFS server is a manner analagous to
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.

Reviewed by:	pjd
MFC after:	2 weeks
2009-11-20 21:21:13 +00:00