Commit Graph

436 Commits

Author SHA1 Message Date
Doug Rabson
a097f2cc06 We need to pass a structure with enough space for an NFSv2 filehandle to
nfs_srvmtofh_xx otherwise bad things happen when an NFSv2 client tries to
make a request.
2008-12-10 14:49:54 +00:00
Alexander Kabaev
7d464bbf38 Change nfsserver slightly so that it does not trip over the timestamp
validation code on ZFS.

Problem: when opening file with O_CREAT|O_EXCL NFS has to jump through
extra hoops to ensure O_EXCL semantics. Namely, client supplies of 8
bytes (NFSX_V3CREATEVERF) bytes of verification data to uniquely
identify this create request. Server then creates a new file with access
mode 0, copies received 8 bytes into va_atime member of struct vattr and
attempt to set the atime on file using VOP_SETATTR. If that succeeds, it
fetches file attributes with VOP_GETATTR and verifies that atime
timestamps match.  If timestamps do not match, NFS server concludes it
has probbaly lost the race to another process creating the file with the
same name and bails with EEXIST.

This scheme works OK when exported FS is FFS, but if underlying
filesystem is ZFS _and_ server is running 64bit kernel, it breaks down
due to sanity checking in zfs_setattr function, which refuses to accept
any timestamps which have tv_sec that cannot be represented as 32bit
int. Since struct timespec fields are 64 bit integers on 64bit platforms
and server just copies NFSX_V3CREATEVERF bytes info va_atime, all eight
bytes supplied by client end up in va_atime.tv_sec, forcing it out of
valid 32bit range.

The solution this change implements is simple: it treats
NFSX_V3CREATEVERF as two 32bit integers and unpacks them separately into
va_atime.tv_sec and va_atime.tv_nsec respectively, thus guaranteeing
that tv_sec remains in 32 bit range and ZFS remains happy.

Reviewed by: kib
2008-12-03 17:54:09 +00:00
Konstantin Belousov
6179164448 In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer
to the fs, but before a vnode on the fs is locked, unmount may free fs
structures, causing access to destroyed data and freed memory.

Introduce a vfs_busymp() function that looks up and busies found
fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the
implementation of the handle syscalls.

Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in
sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular,
sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl
handler, that prevents Giant-locked unmount code to interfere with it.

Noted by:	tegge
Reviewed by:	dfr
Tested by:	pho
MFC after:	1 month
2008-11-29 13:34:59 +00:00
Doug Rabson
335317d291 Switch the default rpc implementation for NFS back to the new code. I believe
I have fixed the reported problems - if you still have trouble with it, please
contact me with as much detail as possible so that I can track down any other
issues as quickly as possible.
2008-11-14 11:27:53 +00:00
Doug Rabson
36b83ac1b0 Use the remote address for access control, not the local address. This fixes
the nfsd problems that some people have with the new code.

Add support for the vfs.nfsrv.nfs_privport sysctl which denies access unless
the client is using a port number less than 1024. Not really sure if this is
particularly useful since it doesn't add any real security.
2008-11-13 14:36:52 +00:00
Doug Rabson
0eec5c87bf Temporarily switch NFS back to the old RPC code while I try to diagnose and
fix the problems a few people have noticed with the new code. People who want
to continue testing the new code or who need RPCSEC_GSS support should use
the new option NFS_NEWRPC to select it.
2008-11-13 11:35:18 +00:00
Doug Rabson
4ee072f815 Turn (NFSERR_AUTHERR|code) status values into svcerr_auth(rqst, code) replies
instead of returning a success with a bogus NFS error code.
2008-11-12 09:38:18 +00:00
Doug Rabson
5f8afa0579 Allow v3 GETATTR requests even when weakly authenticated. Change the error
return for for weakly authenticated requests from REJECTEDCRED to WEAKAUTH
for consistency with Solaris.
2008-11-12 09:36:35 +00:00
Doug Rabson
f8dcc5a8f7 Range-check NFSv2 procedure numbers before converting to NFSv3.
Submitted by:	csjp
2008-11-07 10:43:01 +00:00
Doug Rabson
c3c1513719 Don't depend on krpc.ko in the NFS_LEGACYRPC case. 2008-11-06 11:43:49 +00:00
Dag-Erling Smørgrav
19aa71e559 Unbreak NFS.
Pointy hat to:	dfr
2008-11-06 10:53:35 +00:00
Doug Rabson
4822a31d1c If mountd doesn't specify a secflavor list for the mount, assume that -sec=sys
is what was wanted.
2008-11-05 16:25:26 +00:00
Doug Rabson
a720ad3450 Include <sys/eventhandler.h>. 2008-11-04 16:43:02 +00:00
Doug Rabson
a9148abd9d Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager.  I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by:	Isilon Systems
MFC after:	1 month
2008-11-03 10:38:00 +00:00
Tom Rhodes
8b4acb0cc0 Document a few sysctls in the NFS client and server code.
Minor style(9) where applicable.

Approved by:	alfred (slightly older version)
2008-11-02 17:00:23 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Robert Watson
212ab0cfb3 Rename three MAC entry points from _proc_ to _cred_ to reflect the fact
that they operate directly on credentials: mac_proc_create_swapper(),
mac_proc_create_init(), and mac_proc_associate_nfsd().  Update policies.

Obtained from:	TrustedBSD Project
2008-10-28 11:33:06 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Robert Watson
63a1c6ba59 Turn XXX's for unlocked writes of NFS server statistics to simple notes,
as we consider it a feature to exchange performance for consistency.

MFC after:	3 days
2008-10-12 20:06:59 +00:00
Attilio Rao
cecd8edba5 Remove the suser(9) interface from the kernel. It has been replaced from
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.

This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.

This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.

Tested by:      Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by:    rwatson
2008-09-17 15:49:44 +00:00
Attilio Rao
3daba5a642 Decontext-alize the nfsserver module.
Now, only some few places still require thread passing (mostly the ones which
access to VOP_* functions) and will be fixed once the primitive also will be.

Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-09-16 21:57:39 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Robert Watson
2a61d63038 Remove spls from NFS server setup call; expand receive socket buffer
locking to cover full setup of socket upcalls; remove XXX about
locking.

MFC after:	3 weeks
2008-06-30 20:43:06 +00:00
Konstantin Belousov
7763d855c4 Change the fix in the rev. 1.179 to use nfsrv_lockedpair_nd().
Tested by:	pho
MFC after:	3 days
2008-05-28 16:23:17 +00:00
Konstantin Belousov
0ebda62134 Initialize vfslocked prior to calling nfsm_srvmtofh where it was forgotten.
Reported by:	Andrew Edwards <aedwards sandvine com>
Tested by:	pho
MFC after:	3 days
2008-05-28 16:21:32 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Jeff Roberson
698b1a6643 - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
Doug Rabson
0d8563d31e Fix a regression from the last revision - don't edit the ns_rec list while
not holding the lock.
2008-03-19 12:33:25 +00:00
Doug Rabson
999396482a Don't call nfs_realign while holding locks.
Reviewed by: kib
2008-03-18 18:42:59 +00:00
Konstantin Belousov
bcd654920e Fix the Giant leak in the nfsrv_remove().
Reported by:	pluknet <pluknet gmail com>
MFC after:	1 week
2008-03-04 11:05:03 +00:00
Remko Lodder
af3e1b9f22 Use nfsrv_destroycache() only once, else it crashes the server.
PR:		kern/118152
Submitted by:	Bjoern Groenvall <bg at sics dot se>
Approved by:	imp (mentor, a while ago already), jhb
MFC After:	3 days
2008-01-18 17:03:36 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Robert Watson
a49d769e88 Garbage collect now-unused nfsrv_setcred() -- it's not only unused, but
also a purveyor of unfortunate (and now unsupported) direct frobbing of
struct ucred.

MFC after:	3 days
2007-11-04 19:20:33 +00:00
Robert Watson
eb2cd5e1df Rename mac_associate_nfsd_label() to mac_proc_associate_nfsd(), and move
from mac_vfs.c to mac_process.c to join other functions that setup up
process labels for specific purposes.  Unlike the two proc create calls,
this call is intended to run after creation when a process registers as
the NFS daemon, so remains an _associate_ call..

Obtained from:	TrustedBSD Project
2007-10-25 12:34:14 +00:00
John Baldwin
813947b737 Add a -z flag to nfsstat which zeros the NFS statistics after displaying
them.

MFC after:	1 week
Requested by:	ps
Submitted by:	ps (6 years ago)
2007-10-18 16:38:07 +00:00
Mohan Srinivasan
58d14dae6d Set the NFS server sockbuf high watermarks to the system defaults
(up form 32KB). The low highwatermark setting caused UDP fullsock
request drops, throttling thruput greatly.
Reported by: Kris Kennaway
Approved by: re@ (Ken Smith)
2007-10-12 03:56:27 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Robert Watson
33d2bb9ca3 First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
Robert Watson
c2259ba44f Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by:	kris
2007-06-13 22:42:43 +00:00
Matt Jacob
ad37a275f6 Init timespec to zero fo quiesce warnings. 2007-06-10 04:42:20 +00:00
Robert Watson
c14d15ae3e Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting.  These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies.  These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from:	TrustedBSD Project
2007-04-22 15:31:22 +00:00
Robert Watson
dc4725135d Attempt to rationalize NFS privileges:
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.

- Use PRIV_NFS_DAEMON in the NFS server.

- In the NFS client, move the privilege check from nfslockdans(), which
  occurs every time a write is performed on /dev/nfslock, and instead do it
  in nfslock_open() just once.  This allows us to avoid checking the saved
  uid for root, and just use the effective on open.  Use PRIV_NFS_LOCKD.
2007-04-21 18:11:19 +00:00
Robert Watson
a0bda9d077 In nfsrv_rcv(), don't reacquire the nfs server lock until after
nfs_realign() has been called, as it may sleep waiting on memory
allocation.

Reported by:	simon
2007-04-15 15:50:50 +00:00
John Baldwin
ddda35b8f6 - Split out the part of SYSCALL_MODULE_HELPER() that builds a 'struct
sysent' for a new system call into a new MAKE_SYSENT() macro.
- Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in
  the NFS server and use syscall_register() and syscall_deregister() to
  manage the nfssvc system call entry instead of manually frobbing the
  sysent[] array.
2007-04-02 13:53:26 +00:00
John Baldwin
cad603e388 Initialize vfslocked to 0 before nfsm_srvmtofh() so that the variable is
not used uninitialized in 'nfsmout' if nfsm_srvmtofh() gets an internal
error.

CID:		1766
Found by:	Coverity Prevent (tm)
2007-03-26 15:14:58 +00:00
Jeff Roberson
37374fc852 - Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs.
Only ops which used namei still remained.
 - Implement a scheme for reducing the overhead of tracking which vops
   require giant by constantly reducing the number of recursive giant
   acquires to one, leaving us with only one vfslocked variable.
 - Remove all NFSD lock acquisition and release from the individual nfs
   ops.  Careful examination has shown that they are not required.  This
   greatly simplifies the code.

Sponsored by:	Isilon Systems, Inc.
Discussed with:	rwatson
Tested by:	kkenn
Approved by:	re
2007-03-17 18:18:08 +00:00
Wojciech A. Koszek
59f65a4ba6 Change these descriptions of memory types used in malloc(9), as their
current, rather long strings make output from vmstat -m look unpleasant.

Approved by:	cognet (mentor)
2007-03-05 00:21:40 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00