From 90d2dfab1914ea51ea02c0a4b1b69429adfa0c6f Mon Sep 17 00:00:00 2001
From: Rick Macklem
Date: Tue, 12 Jun 2018 19:36:32 +0000
Subject: [PATCH] Merge the pNFS server code from projects/pnfs-planb-server
 into head.

This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write operations from all the other
NFSv4.1 Metadata operations. It is hoped that this separation allows a
pNFS service to be configured that exceeds the limits of a single NFS
server for storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so
that the data storage file for an MDS file will be mirrored on two or more
of the DSs. When this is used, failure of a DS will not stop the pNFS
service and a failed DS can be recovered once repaired while the pNFS
service continues to operate. Although two way mirroring would be the
norm, it is possible to set a mirroring level of up to four or the number
of DSs, whichever is less.
The Metadata server will always be a single point of failure, just as a
single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree directly on the MDS
server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attribute stores the information that the
               MDS needs to find the data storage file(s) on DS(s) for
               this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
               ModifyTime and Change attributes for the file, so that the
               MDS doesn't need to acquire the attributes from the DS for
               every Getattr operation.

For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories. The name of this file is the file handle of the file on
the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger. It can be increased when the "nfsd" daemon is not
running on the MDS, once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allow it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
             is. The critical bits of information returned by the FreeBSD
             server are the IP address of the DS and, for the Flexible
             File layout, that NFSv4.1 is to be used and that it is
             "tightly coupled".
             There is a "deviceid" which identifies the DeviceInfo.
Layout     - This is per file and can be recalled by the server when it
             is no longer valid.
For the FreeBSD server, there is support for two types of layout, called
File and Flexible File layout. Both allow the client to do I/O on the DS
via NFSv4.1 I/O operations. The Flexible File layout is a more recent
variant that allows specification of mirrors, where the client is expected
to do writes to all mirrors to maintain them in a consistent state. The
Flexible File layout also allows the client to report I/O errors for a DS
back to the MDS.
The Flexible File layout supports two variants referred to as "tightly
coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly
coupled" variant where the client uses the same credentials to do I/O on
the DS as it would on the MDS. For the "loosely coupled" variant, the
layout specifies a synthetic user/group that the client uses to do I/O on
the DS. The FreeBSD server does not do striping and always returns layouts
for the entire file. The critical information in a layout is Read vs
Read/Write and DeviceID(s) that identify which DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are
done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.
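
As a rough illustration of two rules from the overview above (a sketch
only, not code taken from this patch): the first function picks a layout
kind the way the text describes (no layout without DSs, Flexible File when
vfs.nfsd.default_flexfile is non-zero or mirroring is enabled, File layout
otherwise), and the second builds a data storage file name as the
hexadecimal form of a file handle. The names choose_layout, ds_file_name
and enum layout_kind are hypothetical, the enum values are not the
on-the-wire protocol values, and the use of lower-case hex digits is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout kinds; NOT the on-the-wire protocol values. */
enum layout_kind { LAYOUT_NONE, LAYOUT_FILE, LAYOUT_FLEXFILE };

/*
 * Pick the layout kind for a pNFS aware client (hypothetical helper).
 * ds_count models the number of configured DSs, default_flexfile models
 * the vfs.nfsd.default_flexfile sysctl and mirror_level the mirror level.
 */
static enum layout_kind
choose_layout(int ds_count, int default_flexfile, int mirror_level)
{

	if (ds_count == 0)
		return (LAYOUT_NONE);		/* Not a pNFS service. */
	if (default_flexfile != 0 || mirror_level > 1)
		return (LAYOUT_FLEXFILE);	/* Mirroring needs Flexible File. */
	return (LAYOUT_FILE);
}

/*
 * Write the hexadecimal form of a file handle into buf (hypothetical
 * helper). Needs 2 * fhlen + 1 bytes; returns 0 on success, -1 if buf
 * is too small. Lower-case digits are an assumption.
 */
static int
ds_file_name(const unsigned char *fh, size_t fhlen, char *buf, size_t buflen)
{
	static const char hexdigits[] = "0123456789abcdef";
	size_t i;

	assert(fh != NULL && buf != NULL);
	if (buflen < 2 * fhlen + 1)
		return (-1);
	for (i = 0; i < fhlen; i++) {
		buf[2 * i] = hexdigits[fh[i] >> 4];
		buf[2 * i + 1] = hexdigits[fh[i] & 0x0f];
	}
	buf[2 * fhlen] = '\0';
	return (0);
}
```

The buffer size rule in ds_file_name matches the shape of the
dsf_filename[PNFS_FILENAME_LEN + 1] field added by this patch, where
PNFS_FILENAME_LEN is 2 * sizeof(fhandle_t): two hex digits per file handle
byte plus a terminating NUL.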
Tested by: james.rose@framestore.com
Relnotes: yes
---
 sys/fs/nfs/nfs.h                  |   34 +-
 sys/fs/nfs/nfs_commonacl.c        |   30 -
 sys/fs/nfs/nfs_commonport.c       |    6 +
 sys/fs/nfs/nfs_commonsubs.c       |  147 +-
 sys/fs/nfs/nfs_var.h              |   66 +-
 sys/fs/nfs/nfsport.h              |   17 +
 sys/fs/nfs/nfsproto.h             |   34 +-
 sys/fs/nfs/nfsrvstate.h           |  104 +-
 sys/fs/nfsclient/nfs_clport.c     |   13 +-
 sys/fs/nfsclient/nfs_clrpcops.c   |    2 +-
 sys/fs/nfsclient/nfs_clstate.c    |    2 +-
 sys/fs/nfsclient/nfs_clvfsops.c   |    8 +
 sys/fs/nfsserver/nfs_nfsdkrpc.c   |   56 +-
 sys/fs/nfsserver/nfs_nfsdport.c   | 2309 ++++++++++++++++++++++++++++-
 sys/fs/nfsserver/nfs_nfsdserv.c   |  747 ++++++++--
 sys/fs/nfsserver/nfs_nfsdsocket.c |   24 +-
 sys/fs/nfsserver/nfs_nfsdstate.c  | 2076 +++++++++++++++++++++++++-
 sys/fs/nfsserver/nfs_nfsdsubs.c   |   14 +-
 sys/nfs/nfs_nfssvc.c              |    2 +-
 sys/nfs/nfssvc.h                  |    1 +
 20 files changed, 5473 insertions(+), 219 deletions(-)

diff --git a/sys/fs/nfs/nfs.h b/sys/fs/nfs/nfs.h
index fc2409881b84..a814f31b59f6 100644
--- a/sys/fs/nfs/nfs.h
+++ b/sys/fs/nfs/nfs.h
@@ -98,6 +98,7 @@
 #define NFSSESSIONHASHSIZE 20 /* Size of server session hash table */
 #endif
 #define NFSSTATEHASHSIZE 10 /* Size of server stateid hash table */
+#define NFSLAYOUTHIGHWATER 1000000 /* Upper limit for # of layouts */
 #ifndef NFSCLDELEGHIGHWATER
 #define NFSCLDELEGHIGHWATER 10000 /* limit for client delegations */
 #endif
@@ -171,11 +172,20 @@ struct nfsd_addsock_args {
 
 /*
  * nfsd argument for new krpc.
+ * (New version supports pNFS, indicated by NFSSVC_NEWSTRUCT flag.)
*/ struct nfsd_nfsd_args { const char *principal; /* GSS-API service principal name */ int minthreads; /* minimum service thread count */ int maxthreads; /* maximum service thread count */ + int version; /* Allow multiple variants */ + char *addr; /* pNFS DS addresses */ + int addrlen; /* Length of addrs */ + char *dnshost; /* DNS names for DS addresses */ + int dnshostlen; /* Length of DNS names */ + char *dspath; /* DS Mount path on MDS */ + int dspathlen; /* Length of DS Mount path on MDS */ + int mirrorcnt; /* Number of mirrors to create on DSs */ }; /* @@ -186,6 +196,23 @@ struct nfsd_nfsd_args { #define NFSDEV_MAXMIRRORS 4 #define NFSDEV_MAXVERS 4 +struct nfsd_pnfsd_args { + int op; /* Which pNFSd op to perform. */ + char *mdspath; /* Path of MDS file. */ + char *dspath; /* Path of recovered DS mounted on dir. */ + char *curdspath; /* Path of current DS mounted on dir. */ +}; + +#define PNFSDOP_DELDSSERVER 1 +#define PNFSDOP_COPYMR 2 + +/* Old version. */ +struct nfsd_nfsd_oargs { + const char *principal; /* GSS-API service principal name */ + int minthreads; /* minimum service thread count */ + int maxthreads; /* maximum service thread count */ +}; + /* * Arguments for use by the callback daemon. */ @@ -593,8 +620,8 @@ struct nfsrv_descript { NFSSOCKADDR_T nd_nam2; /* return socket addr */ caddr_t nd_dpos; /* Current dissect pos */ caddr_t nd_bpos; /* Current build pos */ + u_int64_t nd_flag; /* nd_flag */ u_int16_t nd_procnum; /* RPC # */ - u_int32_t nd_flag; /* nd_flag */ u_int32_t nd_repstat; /* Reply status */ int *nd_errp; /* Pointer to ret status */ u_int32_t nd_retxid; /* Reply xid */ @@ -613,6 +640,8 @@ struct nfsrv_descript { uint32_t nd_slotid; /* Slotid for this RPC */ SVCXPRT *nd_xprt; /* Server RPC handle */ uint32_t *nd_sequence; /* Sequence Op. 
ptr */ + nfsv4stateid_t nd_curstateid; /* Current StateID */ + nfsv4stateid_t nd_savedcurstateid; /* Saved Current StateID */ }; #define nd_princlen nd_gssnamelen @@ -649,6 +678,9 @@ struct nfsrv_descript { #define ND_CACHETHIS 0x08000000 #define ND_LASTOP 0x10000000 #define ND_LOOPBADSESS 0x20000000 +#define ND_DSSERVER 0x40000000 +#define ND_CURSTATEID 0x80000000 +#define ND_SAVEDCURSTATEID 0x100000000 /* * ND_GSS should be the "or" of all GSS type authentications. diff --git a/sys/fs/nfs/nfs_commonacl.c b/sys/fs/nfs/nfs_commonacl.c index 1ed3d1db9abe..3e8cfe2071a3 100644 --- a/sys/fs/nfs/nfs_commonacl.c +++ b/sys/fs/nfs/nfs_commonacl.c @@ -449,36 +449,6 @@ nfsrv_buildacl(struct nfsrv_descript *nd, NFSACL_T *aclp, enum vtype type, return (retlen); } -/* - * Set an NFSv4 acl. - */ -APPLESTATIC int -nfsrv_setacl(vnode_t vp, NFSACL_T *aclp, struct ucred *cred, - NFSPROC_T *p) -{ - int error; - - if (nfsrv_useacl == 0 || nfs_supportsnfsv4acls(vp) == 0) { - error = NFSERR_ATTRNOTSUPP; - goto out; - } - /* - * With NFSv4 ACLs, chmod(2) may need to add additional entries. - * Make sure it has enough room for that - splitting every entry - * into two and appending "canonical six" entries at the end. - * Cribbed out of kern/vfs_acl.c - Rick M. - */ - if (aclp->acl_cnt > (ACL_MAX_ENTRIES - 6) / 2) { - error = NFSERR_ATTRNOTSUPP; - goto out; - } - error = VOP_SETACL(vp, ACL_TYPE_NFS4, aclp, cred, p); - -out: - NFSEXITCODE(error); - return (error); -} - /* * Compare two NFSv4 acls. * Return 0 if they are the same, 1 if not the same. 
diff --git a/sys/fs/nfs/nfs_commonport.c b/sys/fs/nfs/nfs_commonport.c index c2ea3331c28c..6686269765b8 100644 --- a/sys/fs/nfs/nfs_commonport.c +++ b/sys/fs/nfs/nfs_commonport.c @@ -69,6 +69,9 @@ int nfscl_debuglevel = 0; char nfsv4_callbackaddr[INET6_ADDRSTRLEN]; struct callout newnfsd_callout; int nfsrv_lughashsize = 100; +struct mtx nfsrv_dslock_mtx; +struct nfsdevicehead nfsrv_devidhead; +volatile int nfsrv_devidcnt = 0; void (*nfsd_call_servertimer)(void) = NULL; void (*ncl_call_invalcaches)(struct vnode *) = NULL; @@ -768,6 +771,8 @@ nfscommon_modevent(module_t mod, int type, void *data) mtx_init(&nfs_req_mutex, "nfs_req_mutex", NULL, MTX_DEF); mtx_init(&nfsrv_nfsuserdsock.nr_mtx, "nfsuserd", NULL, MTX_DEF); + mtx_init(&nfsrv_dslock_mtx, "nfs4ds", NULL, MTX_DEF); + TAILQ_INIT(&nfsrv_devidhead); callout_init(&newnfsd_callout, 1); newnfs_init(); nfsd_call_nfscommon = nfssvc_nfscommon; @@ -794,6 +799,7 @@ nfscommon_modevent(module_t mod, int type, void *data) mtx_destroy(&nfs_slock_mutex); mtx_destroy(&nfs_req_mutex); mtx_destroy(&nfsrv_nfsuserdsock.nr_mtx); + mtx_destroy(&nfsrv_dslock_mtx); loaded = 0; break; default: diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c index 74a9586ae919..29b11f9fb56e 100644 --- a/sys/fs/nfs/nfs_commonsubs.c +++ b/sys/fs/nfs/nfs_commonsubs.c @@ -70,15 +70,24 @@ gid_t nfsrv_defaultgid = GID_NOGROUP; int nfsrv_lease = NFSRV_LEASE; int ncl_mbuf_mlen = MLEN; int nfsd_enable_stringtouid = 0; +int nfsrv_doflexfile = 0; static int nfs_enable_uidtostring = 0; NFSNAMEIDMUTEX; NFSSOCKMUTEX; extern int nfsrv_lughashsize; +extern struct mtx nfsrv_dslock_mtx; +extern volatile int nfsrv_devidcnt; +extern int nfscl_debuglevel; +extern struct nfsdevicehead nfsrv_devidhead; SYSCTL_DECL(_vfs_nfs); SYSCTL_INT(_vfs_nfs, OID_AUTO, enable_uidtostring, CTLFLAG_RW, &nfs_enable_uidtostring, 0, "Make nfs always send numeric owner_names"); +int nfsrv_maxpnfsmirror = 1; +SYSCTL_INT(_vfs_nfs, OID_AUTO, pnfsmirror, CTLFLAG_RD, + 
&nfsrv_maxpnfsmirror, 0, "Mirror level for pNFS service"); + /* * This array of structures indicates, for V4: * retfh - which of 3 types of calling args are used @@ -487,7 +496,7 @@ nfsm_fhtom(struct nfsrv_descript *nd, u_int8_t *fhp, int size, int set_true) { u_int32_t *tl; u_int8_t *cp; - int fullsiz, bytesize = 0; + int fullsiz, rem, bytesize = 0; if (size == 0) size = NFSX_MYFH; @@ -504,6 +513,7 @@ nfsm_fhtom(struct nfsrv_descript *nd, u_int8_t *fhp, int size, int set_true) case ND_NFSV3: case ND_NFSV4: fullsiz = NFSM_RNDUP(size); + rem = fullsiz - size; if (set_true) { bytesize = 2 * NFSX_UNSIGNED + fullsiz; NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED); @@ -1768,6 +1778,40 @@ nfsv4_loadattr(struct nfsrv_descript *nd, vnode_t vp, } attrsum += cnt; break; + case NFSATTRBIT_FSLAYOUTTYPE: + case NFSATTRBIT_LAYOUTTYPE: + NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); + attrsum += NFSX_UNSIGNED; + i = fxdr_unsigned(int, *tl); + if (i > 0) { + NFSM_DISSECT(tl, u_int32_t *, i * + NFSX_UNSIGNED); + attrsum += i * NFSX_UNSIGNED; + j = fxdr_unsigned(int, *tl); + if (i == 1 && compare && !(*retcmpp) && + (((nfsrv_doflexfile != 0 || + nfsrv_maxpnfsmirror > 1) && + j != NFSLAYOUT_FLEXFILE) || + (nfsrv_doflexfile == 0 && + j != NFSLAYOUT_NFSV4_1_FILES))) + *retcmpp = NFSERR_NOTSAME; + } + if (nfsrv_devidcnt == 0) { + if (compare && !(*retcmpp) && i > 0) + *retcmpp = NFSERR_NOTSAME; + } else { + if (compare && !(*retcmpp) && i != 1) + *retcmpp = NFSERR_NOTSAME; + } + break; + case NFSATTRBIT_LAYOUTALIGNMENT: + case NFSATTRBIT_LAYOUTBLKSIZE: + NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); + attrsum += NFSX_UNSIGNED; + i = fxdr_unsigned(int, *tl); + if (compare && !(*retcmpp) && i != NFS_SRVMAXIO) + *retcmpp = NFSERR_NOTSAME; + break; default: printf("EEK! 
nfsv4_loadattr unknown attr=%d\n", bitpos); @@ -2024,7 +2068,8 @@ APPLESTATIC int nfsv4_fillattr(struct nfsrv_descript *nd, struct mount *mp, vnode_t vp, NFSACL_T *saclp, struct vattr *vap, fhandle_t *fhp, int rderror, nfsattrbit_t *attrbitp, struct ucred *cred, NFSPROC_T *p, int isdgram, - int reterr, int supports_nfsv4acls, int at_root, uint64_t mounted_on_fileno) + int reterr, int supports_nfsv4acls, int at_root, uint64_t mounted_on_fileno, + struct statfs *pnfssf) { int bitpos, retnum = 0; u_int32_t *tl; @@ -2426,25 +2471,45 @@ nfsv4_fillattr(struct nfsrv_descript *nd, struct mount *mp, vnode_t vp, break; case NFSATTRBIT_SPACEAVAIL: NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER); - if (priv_check_cred(cred, PRIV_VFS_BLOCKRESERVE, 0)) - uquad = (u_int64_t)fs->f_bfree; + if (priv_check_cred(cred, PRIV_VFS_BLOCKRESERVE, 0)) { + if (pnfssf != NULL) + uquad = (u_int64_t)pnfssf->f_bfree; + else + uquad = (u_int64_t)fs->f_bfree; + } else { + if (pnfssf != NULL) + uquad = (u_int64_t)pnfssf->f_bavail; + else + uquad = (u_int64_t)fs->f_bavail; + } + if (pnfssf != NULL) + uquad *= pnfssf->f_bsize; else - uquad = (u_int64_t)fs->f_bavail; - uquad *= fs->f_bsize; + uquad *= fs->f_bsize; txdr_hyper(uquad, tl); retnum += NFSX_HYPER; break; case NFSATTRBIT_SPACEFREE: NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER); - uquad = (u_int64_t)fs->f_bfree; - uquad *= fs->f_bsize; + if (pnfssf != NULL) { + uquad = (u_int64_t)pnfssf->f_bfree; + uquad *= pnfssf->f_bsize; + } else { + uquad = (u_int64_t)fs->f_bfree; + uquad *= fs->f_bsize; + } txdr_hyper(uquad, tl); retnum += NFSX_HYPER; break; case NFSATTRBIT_SPACETOTAL: NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER); - uquad = (u_int64_t)fs->f_blocks; - uquad *= fs->f_bsize; + if (pnfssf != NULL) { + uquad = (u_int64_t)pnfssf->f_blocks; + uquad *= pnfssf->f_bsize; + } else { + uquad = (u_int64_t)fs->f_blocks; + uquad *= fs->f_bsize; + } txdr_hyper(uquad, tl); retnum += NFSX_HYPER; break; @@ -2514,6 +2579,33 @@ nfsv4_fillattr(struct nfsrv_descript *nd, struct 
mount *mp, vnode_t vp, NFSCLRBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEACCESSSET); retnum += nfsrv_putattrbit(nd, &attrbits); break; + case NFSATTRBIT_FSLAYOUTTYPE: + case NFSATTRBIT_LAYOUTTYPE: + if (nfsrv_devidcnt == 0) + siz = 1; + else + siz = 2; + if (siz == 2) { + NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED); + *tl++ = txdr_unsigned(1); /* One entry. */ + if (nfsrv_doflexfile != 0 || + nfsrv_maxpnfsmirror > 1) + *tl = txdr_unsigned(NFSLAYOUT_FLEXFILE); + else + *tl = txdr_unsigned( + NFSLAYOUT_NFSV4_1_FILES); + } else { + NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED); + *tl = 0; + } + retnum += siz * NFSX_UNSIGNED; + break; + case NFSATTRBIT_LAYOUTALIGNMENT: + case NFSATTRBIT_LAYOUTBLKSIZE: + NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED); + *tl = txdr_unsigned(NFS_SRVMAXIO); + retnum += NFSX_UNSIGNED; + break; default: printf("EEK! Bad V4 attribute bitpos=%d\n", bitpos); } @@ -4240,3 +4332,38 @@ nfsv4_freeslot(struct nfsclsession *sep, int slot) mtx_unlock(&sep->nfsess_mtx); } +/* + * Search for a matching pnfsd mirror device structure, base on the nmp arg. + * Return one if found, NULL otherwise. + */ +struct nfsdevice * +nfsv4_findmirror(struct nfsmount *nmp) +{ + struct nfsdevice *ds, *fndds; + int fndmirror; + + mtx_assert(NFSDDSMUTEXPTR, MA_OWNED); + /* + * Search the DS server list for a match with nmp. + * Remove the DS entry if found and there is a mirror. 
+ */ + fndds = NULL; + fndmirror = 0; + if (nfsrv_devidcnt == 0) + return (fndds); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (ds->nfsdev_nmp == nmp) { + NFSCL_DEBUG(4, "fnd main ds\n"); + fndds = ds; + } else if (ds->nfsdev_nmp != NULL) + fndmirror = 1; + if (fndds != NULL && fndmirror != 0) + break; + } + if (fndmirror == 0) { + NFSCL_DEBUG(4, "no mirror for DS\n"); + return (NULL); + } + return (fndds); +} + diff --git a/sys/fs/nfs/nfs_var.h b/sys/fs/nfs/nfs_var.h index 2c597aaf3000..90f6e0d95f5b 100644 --- a/sys/fs/nfs/nfs_var.h +++ b/sys/fs/nfs/nfs_var.h @@ -63,6 +63,7 @@ union nethostaddr; struct nfsstate; struct nfslock; struct nfsclient; +struct nfslayout; struct nfsdsession; struct nfslockconflict; struct nfsd_idargs; @@ -82,6 +83,9 @@ struct nfsv4lock; struct nfsvattr; struct nfs_vattr; struct NFSSVCARGS; +struct nfsdevice; +struct pnfsdsfile; +struct pnfsdsattr; #ifdef __FreeBSD__ NFS_ACCESS_ARGS; NFS_OPEN_ARGS; @@ -112,9 +116,9 @@ int nfsrv_openctrl(struct nfsrv_descript *, vnode_t, int nfsrv_opencheck(nfsquad_t, nfsv4stateid_t *, struct nfsstate *, vnode_t, struct nfsrv_descript *, NFSPROC_T *, int); int nfsrv_openupdate(vnode_t, struct nfsstate *, nfsquad_t, - nfsv4stateid_t *, struct nfsrv_descript *, NFSPROC_T *); + nfsv4stateid_t *, struct nfsrv_descript *, NFSPROC_T *, int *); int nfsrv_delegupdate(struct nfsrv_descript *, nfsquad_t, nfsv4stateid_t *, - vnode_t, int, struct ucred *, NFSPROC_T *); + vnode_t, int, struct ucred *, NFSPROC_T *, int *); int nfsrv_releaselckown(struct nfsstate *, nfsquad_t, NFSPROC_T *); void nfsrv_zapclient(struct nfsclient *, NFSPROC_T *); int nfssvc_idname(struct nfsd_idargs *); @@ -131,7 +135,7 @@ int nfsrv_checksetattr(vnode_t, struct nfsrv_descript *, nfsv4stateid_t *, struct nfsvattr *, nfsattrbit_t *, struct nfsexstuff *, NFSPROC_T *); int nfsrv_checkgetattr(struct nfsrv_descript *, vnode_t, - struct nfsvattr *, nfsattrbit_t *, struct ucred *, NFSPROC_T *); + struct nfsvattr *, nfsattrbit_t *, 
NFSPROC_T *); int nfsrv_nfsuserdport(struct sockaddr *, u_short, NFSPROC_T *); void nfsrv_nfsuserddelport(void); void nfsrv_throwawayallstate(NFSPROC_T *); @@ -140,6 +144,30 @@ int nfsrv_checksequence(struct nfsrv_descript *, uint32_t, uint32_t *, int nfsrv_checkreclaimcomplete(struct nfsrv_descript *); void nfsrv_cache_session(uint8_t *, uint32_t, int, struct mbuf **); void nfsrv_freeallbackchannel_xprts(void); +int nfsrv_layoutcommit(struct nfsrv_descript *, vnode_t, int, int, uint64_t, + uint64_t, uint64_t, int, struct timespec *, int, nfsv4stateid_t *, + int, char *, int *, uint64_t *, struct ucred *, NFSPROC_T *); +int nfsrv_layoutget(struct nfsrv_descript *, vnode_t, struct nfsexstuff *, + int, int *, uint64_t *, uint64_t *, uint64_t, nfsv4stateid_t *, int, int *, + int *, char *, struct ucred *, NFSPROC_T *); +void nfsrv_flexmirrordel(char *, NFSPROC_T *); +void nfsrv_recalloldlayout(NFSPROC_T *); +int nfsrv_layoutreturn(struct nfsrv_descript *, vnode_t, int, int, uint64_t, + uint64_t, int, int, nfsv4stateid_t *, int, uint32_t *, int *, + struct ucred *, NFSPROC_T *); +int nfsrv_getdevinfo(char *, int, uint32_t *, uint32_t *, int *, char **); +void nfsrv_freeonedevid(struct nfsdevice *); +void nfsrv_freealllayoutsanddevids(void); +void nfsrv_freefilelayouts(fhandle_t *); +int nfsrv_deldsserver(char *, NFSPROC_T *); +struct nfsdevice *nfsrv_deldsnmp(struct nfsmount *, NFSPROC_T *); +int nfsrv_createdevids(struct nfsd_nfsd_args *, NFSPROC_T *); +int nfsrv_checkdsattr(struct nfsrv_descript *, vnode_t, NFSPROC_T *); +int nfsrv_copymr(vnode_t, vnode_t, vnode_t, struct nfsdevice *, + struct pnfsdsfile *, struct pnfsdsfile *, int, struct ucred *, NFSPROC_T *); +int nfsrv_mdscopymr(char *, char *, char *, char *, int *, char *, NFSPROC_T *, + struct vnode **, struct vnode **, struct pnfsdsfile **, struct nfsdevice **, + struct nfsdevice **); /* nfs_nfsdserv.c */ int nfsrvd_access(struct nfsrv_descript *, int, @@ -240,6 +268,14 @@ int nfsrvd_destroysession(struct 
nfsrv_descript *, int, vnode_t, NFSPROC_T *, struct nfsexstuff *); int nfsrvd_freestateid(struct nfsrv_descript *, int, vnode_t, NFSPROC_T *, struct nfsexstuff *); +int nfsrvd_layoutget(struct nfsrv_descript *, int, + vnode_t, NFSPROC_T *, struct nfsexstuff *); +int nfsrvd_getdevinfo(struct nfsrv_descript *, int, + vnode_t, NFSPROC_T *, struct nfsexstuff *); +int nfsrvd_layoutcommit(struct nfsrv_descript *, int, + vnode_t, NFSPROC_T *, struct nfsexstuff *); +int nfsrvd_layoutreturn(struct nfsrv_descript *, int, + vnode_t, NFSPROC_T *, struct nfsexstuff *); int nfsrvd_teststateid(struct nfsrv_descript *, int, vnode_t, NFSPROC_T *, struct nfsexstuff *); int nfsrvd_notsupp(struct nfsrv_descript *, int, @@ -306,6 +342,7 @@ int nfsv4_sequencelookup(struct nfsmount *, struct nfsclsession *, int *, int *, uint32_t *, uint8_t *); void nfsv4_freeslot(struct nfsclsession *, int); struct ucred *nfsrv_getgrpscred(struct ucred *); +struct nfsdevice *nfsv4_findmirror(struct nfsmount *); /* nfs_clcomsubs.c */ void nfsm_uiombuf(struct nfsrv_descript *, struct uio *, int); @@ -339,7 +376,7 @@ void nfsrv_wcc(struct nfsrv_descript *, int, struct nfsvattr *, int, struct nfsvattr *); int nfsv4_fillattr(struct nfsrv_descript *, struct mount *, vnode_t, NFSACL_T *, struct vattr *, fhandle_t *, int, nfsattrbit_t *, - struct ucred *, NFSPROC_T *, int, int, int, int, uint64_t); + struct ucred *, NFSPROC_T *, int, int, int, int, uint64_t, struct statfs *); void nfsrv_fillattr(struct nfsrv_descript *, struct nfsvattr *); void nfsrv_adj(mbuf_t, int, int); void nfsrv_postopattr(struct nfsrv_descript *, int, struct nfsvattr *); @@ -387,8 +424,6 @@ int nfsrv_dissectace(struct nfsrv_descript *, struct acl_entry *, int *, int *, NFSPROC_T *); int nfsrv_buildacl(struct nfsrv_descript *, NFSACL_T *, enum vtype, NFSPROC_T *); -int nfsrv_setacl(vnode_t, NFSACL_T *, struct ucred *, - NFSPROC_T *); int nfsrv_compareacl(NFSACL_T *, NFSACL_T *); /* nfs_clrpcops.c */ @@ -603,8 +638,8 @@ int 
ncl_flush(vnode_t, int, NFSPROC_T *, int, int); void ncl_invalcaches(vnode_t); /* nfs_nfsdport.c */ -int nfsvno_getattr(vnode_t, struct nfsvattr *, struct ucred *, - NFSPROC_T *, int); +int nfsvno_getattr(vnode_t, struct nfsvattr *, struct nfsrv_descript *, + NFSPROC_T *, int, nfsattrbit_t *); int nfsvno_setattr(vnode_t, struct nfsvattr *, struct ucred *, NFSPROC_T *, struct nfsexstuff *); int nfsvno_getfh(vnode_t, fhandle_t *, NFSPROC_T *); @@ -618,7 +653,7 @@ int nfsvno_readlink(vnode_t, struct ucred *, NFSPROC_T *, mbuf_t *, mbuf_t *, int *); int nfsvno_read(vnode_t, off_t, int, struct ucred *, NFSPROC_T *, mbuf_t *, mbuf_t *); -int nfsvno_write(vnode_t, off_t, int, int, int, mbuf_t, +int nfsvno_write(vnode_t, off_t, int, int, int *, mbuf_t, char *, struct ucred *, NFSPROC_T *); int nfsvno_createsub(struct nfsrv_descript *, struct nameidata *, vnode_t *, struct nfsvattr *, int *, int32_t *, NFSDEV_T, NFSPROC_T *, @@ -647,7 +682,7 @@ void nfsvno_open(struct nfsrv_descript *, struct nameidata *, nfsquad_t, nfsv4stateid_t *, struct nfsstate *, int *, struct nfsvattr *, int32_t *, int, NFSACL_T *, nfsattrbit_t *, struct ucred *, NFSPROC_T *, struct nfsexstuff *, vnode_t *); -int nfsvno_updfilerev(vnode_t, struct nfsvattr *, struct ucred *, +int nfsvno_updfilerev(vnode_t, struct nfsvattr *, struct nfsrv_descript *, NFSPROC_T *); int nfsvno_fillattr(struct nfsrv_descript *, struct mount *, vnode_t, struct nfsvattr *, fhandle_t *, int, nfsattrbit_t *, @@ -667,6 +702,17 @@ int nfsvno_testexp(struct nfsrv_descript *, struct nfsexstuff *); uint32_t nfsrv_hashfh(fhandle_t *); uint32_t nfsrv_hashsessionid(uint8_t *); void nfsrv_backupstable(void); +int nfsrv_dsgetdevandfh(struct vnode *, NFSPROC_T *, int *, fhandle_t *, + char *); +int nfsrv_dsgetsockmnt(struct vnode *, int, char *, int *, int *, + NFSPROC_T *, struct vnode **, fhandle_t *, char *, char *, + struct vnode **, struct nfsmount **, struct nfsmount *, int *, int *); +int nfsrv_dscreate(struct vnode *, struct 
vattr *, struct vattr *, + fhandle_t *, struct pnfsdsfile *, struct pnfsdsattr *, char *, + struct ucred *, NFSPROC_T *, struct vnode **); +int nfsrv_updatemdsattr(struct vnode *, struct nfsvattr *, NFSPROC_T *); +void nfsrv_killrpcs(struct nfsmount *); +int nfsrv_setacl(struct vnode *, NFSACL_T *, struct ucred *, NFSPROC_T *); /* nfs_commonkrpc.c */ int newnfs_nmcancelreqs(struct nfsmount *); diff --git a/sys/fs/nfs/nfsport.h b/sys/fs/nfs/nfsport.h index c89ad016f19d..9a9a2ba2cc9b 100644 --- a/sys/fs/nfs/nfsport.h +++ b/sys/fs/nfs/nfsport.h @@ -701,10 +701,18 @@ void nfsrvd_rcv(struct socket *, void *, int); #define NFSSESSIONMUTEXPTR(s) (&((s)->mtx)) #define NFSLOCKSESSION(s) mtx_lock(&((s)->mtx)) #define NFSUNLOCKSESSION(s) mtx_unlock(&((s)->mtx)) +#define NFSLAYOUTMUTEXPTR(l) (&((l)->mtx)) #define NFSLOCKLAYOUT(l) mtx_lock(&((l)->mtx)) #define NFSUNLOCKLAYOUT(l) mtx_unlock(&((l)->mtx)) +#define NFSDDSMUTEXPTR (&nfsrv_dslock_mtx) #define NFSDDSLOCK() mtx_lock(&nfsrv_dslock_mtx) #define NFSDDSUNLOCK() mtx_unlock(&nfsrv_dslock_mtx) +#define NFSDDONTLISTMUTEXPTR (&nfsrv_dontlistlock_mtx) +#define NFSDDONTLISTLOCK() mtx_lock(&nfsrv_dontlistlock_mtx) +#define NFSDDONTLISTUNLOCK() mtx_unlock(&nfsrv_dontlistlock_mtx) +#define NFSDRECALLMUTEXPTR (&nfsrv_recalllock_mtx) +#define NFSDRECALLLOCK() mtx_lock(&nfsrv_recalllock_mtx) +#define NFSDRECALLUNLOCK() mtx_unlock(&nfsrv_recalllock_mtx) /* * Use these macros to initialize/free a mutex. @@ -1037,6 +1045,15 @@ struct nfsreq { */ extern const char nfs_vnode_tag[]; +/* + * Check for the errors that indicate a DS should be disabled. + * ENXIO indicates that the krpc cannot do an RPC on the DS. + * EIO is returned by the RPC as an indication of I/O problems on the + * server. + * Are there other fatal errors? 
+ */ +#define nfsds_failerr(e) ((e) == ENXIO || (e) == EIO) + #endif /* _KERNEL */ #endif /* _NFS_NFSPORT_H */ diff --git a/sys/fs/nfs/nfsproto.h b/sys/fs/nfs/nfsproto.h index 4c4ade6f4c5e..e9f8ec83614f 100644 --- a/sys/fs/nfs/nfsproto.h +++ b/sys/fs/nfs/nfsproto.h @@ -260,6 +260,12 @@ #define NFSX_V4SETTIME (NFSX_UNSIGNED + NFSX_V4TIME) #define NFSX_V4SESSIONID 16 #define NFSX_V4DEVICEID 16 +#define NFSX_V4PNFSFH (sizeof(fhandle_t) + 1) +#define NFSX_V4FILELAYOUT (4 * NFSX_UNSIGNED + NFSX_V4DEVICEID + \ + NFSX_HYPER + NFSM_RNDUP(NFSX_V4PNFSFH)) +#define NFSX_V4FLEXLAYOUT(m) (NFSX_HYPER + 3 * NFSX_UNSIGNED + \ + ((m) * (NFSX_V4DEVICEID + NFSX_STATEID + NFSM_RNDUP(NFSX_V4PNFSFH) + \ + 8 * NFSX_UNSIGNED))) /* sizes common to multiple NFS versions */ #define NFSX_FHMAX (NFSX_V4FHMAX) @@ -272,6 +278,11 @@ /* variants for multiple versions */ #define NFSX_STATFS(v3) ((v3) ? NFSX_V3STATFS : NFSX_V2STATFS) +/* + * Beware. NFSPROC_NULL and friends are defined in + * as well and the numbers are different. + */ +#ifndef NFSPROC_NULL /* nfs rpc procedure numbers (before version mapping) */ #define NFSPROC_NULL 0 #define NFSPROC_GETATTR 1 @@ -295,6 +306,7 @@ #define NFSPROC_FSINFO 19 #define NFSPROC_PATHCONF 20 #define NFSPROC_COMMIT 21 +#endif /* NFSPROC_NULL */ /* * The lower numbers -> 21 are used by NFSv2 and v3. These define higher @@ -652,6 +664,7 @@ /* Flags for File Layout. */ #define NFSFLAYUTIL_DENSE 0x1 #define NFSFLAYUTIL_COMMIT_THRU_MDS 0x2 +#define NFSFLAYUTIL_STRIPE_MASK 0xffffffc0 /* Flags for Flex File Layout. */ #define NFSFLEXFLAG_NO_LAYOUTCOMMIT 0x00000001 @@ -668,6 +681,7 @@ #define NFSCDFS4_BACK 0x2 #define NFSCDFS4_BOTH 0x3 +#if defined(_KERNEL) || defined(KERNEL) /* Conversion macros */ #define vtonfsv2_mode(t,m) \ txdr_unsigned(((t) == VFIFO) ? MAKEIMODE(VCHR, (m)) : \ @@ -819,6 +833,7 @@ struct nfsv3_sattr { u_int32_t sa_mtimetype; nfstime3 sa_mtime; }; +#endif /* _KERNEL */ /* * The attribute bits used for V4. 
@@ -1046,7 +1061,8 @@ struct nfsv3_sattr { NFSATTRBM_MOUNTEDONFILEID | \ NFSATTRBM_QUOTAHARD | \ NFSATTRBM_QUOTASOFT | \ - NFSATTRBM_QUOTAUSED) + NFSATTRBM_QUOTAUSED | \ + NFSATTRBM_FSLAYOUTTYPE) #ifdef QUOTA @@ -1062,7 +1078,11 @@ struct nfsv3_sattr { #define NFSATTRBIT_SUPP1 NFSATTRBIT_S1 #endif -#define NFSATTRBIT_SUPP2 NFSATTRBM_SUPPATTREXCLCREAT +#define NFSATTRBIT_SUPP2 \ + (NFSATTRBM_LAYOUTTYPE | \ + NFSATTRBM_LAYOUTBLKSIZE | \ + NFSATTRBM_LAYOUTALIGNMENT | \ + NFSATTRBM_SUPPATTREXCLCREAT) /* * NFSATTRBIT_SUPPSETONLY is the OR of NFSATTRBIT_TIMEACCESSSET and @@ -1379,4 +1399,14 @@ struct nfsv4stateid { }; typedef struct nfsv4stateid nfsv4stateid_t; +/* Notify bits and notify bitmap size. */ +#define NFSV4NOTIFY_CHANGE 1 +#define NFSV4NOTIFY_DELETE 2 +#define NFSV4_NOTIFYBITMAP 1 /* # of 32bit values needed for bits */ + +/* Layoutreturn kinds. */ +#define NFSV4LAYOUTRET_FILE 1 +#define NFSV4LAYOUTRET_FSID 2 +#define NFSV4LAYOUTRET_ALL 3 + #endif /* _NFS_NFSPROTO_H_ */ diff --git a/sys/fs/nfs/nfsrvstate.h b/sys/fs/nfs/nfsrvstate.h index 53cfb185c206..769f7986ea28 100644 --- a/sys/fs/nfs/nfsrvstate.h +++ b/sys/fs/nfs/nfsrvstate.h @@ -31,6 +31,7 @@ #ifndef _NFS_NFSRVSTATE_H_ #define _NFS_NFSRVSTATE_H_ +#if defined(_KERNEL) || defined(KERNEL) /* * Definitions for NFS V4 server state handling. */ @@ -46,6 +47,10 @@ LIST_HEAD(nfslockhead, nfslock); LIST_HEAD(nfslockhashhead, nfslockfile); LIST_HEAD(nfssessionhead, nfsdsession); LIST_HEAD(nfssessionhashhead, nfsdsession); +TAILQ_HEAD(nfslayouthead, nfslayout); +SLIST_HEAD(nfsdsdirhead, nfsdsdir); +TAILQ_HEAD(nfsdevicehead, nfsdevice); +LIST_HEAD(nfsdontlisthead, nfsdontlist); /* * List head for nfsusrgrp. 
@@ -74,6 +79,13 @@ struct nfssessionhash { #define NFSSESSIONHASH(f) \ (&nfssessionhash[nfsrv_hashsessionid(f) % nfsrv_sessionhashsize]) +struct nfslayouthash { + struct mtx mtx; + struct nfslayouthead list; +}; +#define NFSLAYOUTHASH(f) \ + (&nfslayouthash[nfsrv_hashfh(f) % nfsrv_layouthashsize]) + /* * Client server structure for V4. It is doubly linked into two lists. * The first is a hash table based on the clientid and the second is a @@ -111,6 +123,31 @@ struct nfsclient { #define CLOPS_RENEW 0x0002 #define CLOPS_RENEWOP 0x0004 +/* + * Structure for NFSv4.1 Layouts. + * Malloc'd to correct size for the lay_xdr. + */ +struct nfslayout { + TAILQ_ENTRY(nfslayout) lay_list; + nfsv4stateid_t lay_stateid; + nfsquad_t lay_clientid; + fhandle_t lay_fh; + fsid_t lay_fsid; + uint32_t lay_layoutlen; + uint16_t lay_mirrorcnt; + uint16_t lay_trycnt; + uint16_t lay_type; + uint16_t lay_flags; + uint32_t lay_xdr[0]; +}; + +/* Flags for lay_flags. */ +#define NFSLAY_READ 0x0001 +#define NFSLAY_RW 0x0002 +#define NFSLAY_RECALL 0x0004 +#define NFSLAY_RETURNED 0x0008 +#define NFSLAY_CALLB 0x0010 + /* * Structure for an NFSv4.1 session. * Locking rules for this structure. @@ -290,9 +327,72 @@ struct nfsf_rec { u_int32_t numboots; /* Number of boottimes */ }; -#if defined(_KERNEL) || defined(KERNEL) void nfsrv_cleanclient(struct nfsclient *, NFSPROC_T *); void nfsrv_freedeleglist(struct nfsstatehead *); -#endif + +/* + * This structure is used to create the list of device info entries for + * a GetDeviceInfo operation and stores the DS server info. + * The nfsdev_addrandhost field has the fully qualified host domain name + * followed by the network address in XDR. + * It is allocated with nfsrv_dsdirsize nfsdev_dsdir[] entries. 
+ */ +struct nfsdevice { + TAILQ_ENTRY(nfsdevice) nfsdev_list; + vnode_t nfsdev_dvp; + struct nfsmount *nfsdev_nmp; + char nfsdev_deviceid[NFSX_V4DEVICEID]; + uint16_t nfsdev_hostnamelen; + uint16_t nfsdev_fileaddrlen; + uint16_t nfsdev_flexaddrlen; + char *nfsdev_fileaddr; + char *nfsdev_flexaddr; + char *nfsdev_host; + uint32_t nfsdev_nextdir; + vnode_t nfsdev_dsdir[0]; +}; + +/* + * This structure holds the va_size, va_filerev, va_atime and va_mtime for the + * DS file and is stored in the metadata file's extended attribute pnfsd.dsattr. + */ +struct pnfsdsattr { + uint64_t dsa_filerev; + uint64_t dsa_size; + struct timespec dsa_atime; + struct timespec dsa_mtime; +}; + +/* + * This structure is a list element for a list the pNFS server uses to + * mark that the recovery of a mirror file is in progress. + */ +struct nfsdontlist { + LIST_ENTRY(nfsdontlist) nfsmr_list; + uint32_t nfsmr_flags; + fhandle_t nfsmr_fh; +}; + +/* nfsmr_flags bits. */ +#define NFSMR_DONTLAYOUT 0x00000001 + +#endif /* defined(_KERNEL) || defined(KERNEL) */ + +/* + * This structure holds the information about the DS file and is stored + * in the metadata file's extended attribute called pnfsd.dsfile. 
+ */ +#define PNFS_FILENAME_LEN (2 * sizeof(fhandle_t)) +struct pnfsdsfile { + fhandle_t dsf_fh; + uint32_t dsf_dir; + union { + struct sockaddr_in sin; + struct sockaddr_in6 sin6; + } dsf_nam; + char dsf_filename[PNFS_FILENAME_LEN + 1]; +}; +#define dsf_sin dsf_nam.sin +#define dsf_sin6 dsf_nam.sin6 #endif /* _NFS_NFSRVSTATE_H_ */ diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c index 1855fbcba5dc..7ced32393e4b 100644 --- a/sys/fs/nfsclient/nfs_clport.c +++ b/sys/fs/nfsclient/nfs_clport.c @@ -86,6 +86,7 @@ extern int nfs_numnfscbd; extern int nfscl_inited; struct mtx ncl_iod_mutex; NFSDLOCKMUTEX; +extern struct mtx nfsrv_dslock_mtx; extern void (*ncl_call_invalcaches)(struct vnode *); @@ -930,7 +931,7 @@ nfscl_fillsattr(struct nfsrv_descript *nd, struct vattr *vap, if (vap->va_mtime.tv_sec != VNOVAL) NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEMODIFYSET); (void) nfsv4_fillattr(nd, vp->v_mount, vp, NULL, vap, NULL, 0, - &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0); + &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0, NULL); break; } } @@ -1383,6 +1384,13 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *uap) 0 && strcmp(mp->mnt_stat.f_fstypename, "nfs") == 0 && mp->mnt_data != NULL) { nmp = VFSTONFS(mp); + NFSDDSLOCK(); + if (nfsv4_findmirror(nmp) != NULL) { + NFSDDSUNLOCK(); + error = ENXIO; + nmp = NULL; + break; + } mtx_lock(&nmp->nm_mtx); if ((nmp->nm_privflag & NFSMNTP_FORCEDISM) == 0) { @@ -1394,6 +1402,7 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *uap) mtx_unlock(&nmp->nm_mtx); nmp = NULL; } + NFSDDSUNLOCK(); break; } } @@ -1418,7 +1427,7 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *uap) nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; wakeup(nmp); mtx_unlock(&nmp->nm_mtx); - } else + } else if (error == 0) error = EINVAL; } free(buf, M_TEMP); diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c index d1420e564130..6b57e836801e 100644 --- a/sys/fs/nfsclient/nfs_clrpcops.c +++ 
b/sys/fs/nfsclient/nfs_clrpcops.c @@ -4620,7 +4620,7 @@ nfsrpc_setaclrpc(vnode_t vp, struct ucred *cred, NFSPROC_T *p, NFSZERO_ATTRBIT(&attrbits); NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_ACL); (void) nfsv4_fillattr(nd, vnode_mount(vp), vp, aclp, NULL, NULL, 0, - &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0); + &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0, NULL); error = nfscl_request(nd, vp, p, cred, stuff); if (error) return (error); diff --git a/sys/fs/nfsclient/nfs_clstate.c b/sys/fs/nfsclient/nfs_clstate.c index 702f81a0465b..7aa0708d01b1 100644 --- a/sys/fs/nfsclient/nfs_clstate.c +++ b/sys/fs/nfsclient/nfs_clstate.c @@ -3373,7 +3373,7 @@ nfscl_docb(struct nfsrv_descript *nd, NFSPROC_T *p) if (!error) (void) nfsv4_fillattr(nd, NULL, NULL, NULL, &va, NULL, 0, &rattrbits, NULL, p, 0, 0, 0, 0, - (uint64_t)0); + (uint64_t)0, NULL); break; case NFSV4OP_CBRECALL: NFSCL_DEBUG(4, "cbrecall\n"); diff --git a/sys/fs/nfsclient/nfs_clvfsops.c b/sys/fs/nfsclient/nfs_clvfsops.c index 7845149dd307..96f93c060aba 100644 --- a/sys/fs/nfsclient/nfs_clvfsops.c +++ b/sys/fs/nfsclient/nfs_clvfsops.c @@ -86,6 +86,7 @@ extern enum nfsiod_state ncl_iodwant[NFS_MAXASYNCDAEMON]; extern struct nfsmount *ncl_iodmount[NFS_MAXASYNCDAEMON]; extern struct mtx ncl_iod_mutex; NFSCLSTATEMUTEX; +extern struct mtx nfsrv_dslock_mtx; MALLOC_DEFINE(M_NEWNFSREQ, "newnfsclient_req", "NFS request header"); MALLOC_DEFINE(M_NEWNFSMNT, "newnfsmnt", "NFS mount struct"); @@ -1672,6 +1673,7 @@ nfs_unmount(struct mount *mp, int mntflags) if (mntflags & MNT_FORCE) flags |= FORCECLOSE; nmp = VFSTONFS(mp); + error = 0; /* * Goes something like this.. * - Call vflush() to clear out vnodes for this filesystem @@ -1680,6 +1682,12 @@ nfs_unmount(struct mount *mp, int mntflags) */ /* In the forced case, cancel any outstanding requests. 
*/ if (mntflags & MNT_FORCE) { + NFSDDSLOCK(); + if (nfsv4_findmirror(nmp) != NULL) + error = ENXIO; + NFSDDSUNLOCK(); + if (error) + goto out; error = newnfs_nmcancelreqs(nmp); if (error) goto out; diff --git a/sys/fs/nfsserver/nfs_nfsdkrpc.c b/sys/fs/nfsserver/nfs_nfsdkrpc.c index 5b77592af10e..4f5d5a498ff9 100644 --- a/sys/fs/nfsserver/nfs_nfsdkrpc.c +++ b/sys/fs/nfsserver/nfs_nfsdkrpc.c @@ -105,6 +105,7 @@ static int nfs_proc(struct nfsrv_descript *, u_int32_t, SVCXPRT *xprt, extern u_long sb_max_adj; extern int newnfs_numnfsd; extern struct proc *nfsd_master_proc; +extern time_t nfsdev_time; /* * NFS server system calls @@ -495,6 +496,7 @@ nfsrvd_nfsd(struct thread *td, struct nfsd_nfsd_args *args) */ NFSD_LOCK(); if (newnfs_numnfsd == 0) { + nfsdev_time = time_second; p = td->td_proc; PROC_LOCK(p); p->p_flag2 |= P2_AST_SU; @@ -502,31 +504,36 @@ nfsrvd_nfsd(struct thread *td, struct nfsd_nfsd_args *args) newnfs_numnfsd++; NFSD_UNLOCK(); - - /* An empty string implies AUTH_SYS only. */ - if (principal[0] != '\0') { - ret2 = rpc_gss_set_svc_name_call(principal, - "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER2); - ret3 = rpc_gss_set_svc_name_call(principal, - "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER3); - ret4 = rpc_gss_set_svc_name_call(principal, - "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER4); - - if (!ret2 || !ret3 || !ret4) - printf("nfsd: can't register svc name\n"); + error = nfsrv_createdevids(args, td); + if (error == 0) { + /* An empty string implies AUTH_SYS only. 
*/ + if (principal[0] != '\0') { + ret2 = rpc_gss_set_svc_name_call(principal, + "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, + NFS_VER2); + ret3 = rpc_gss_set_svc_name_call(principal, + "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, + NFS_VER3); + ret4 = rpc_gss_set_svc_name_call(principal, + "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, + NFS_VER4); + + if (!ret2 || !ret3 || !ret4) + printf( + "nfsd: can't register svc name\n"); + } + + nfsrvd_pool->sp_minthreads = args->minthreads; + nfsrvd_pool->sp_maxthreads = args->maxthreads; + + svc_run(nfsrvd_pool); + + if (principal[0] != '\0') { + rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER2); + rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER3); + rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER4); + } } - - nfsrvd_pool->sp_minthreads = args->minthreads; - nfsrvd_pool->sp_maxthreads = args->maxthreads; - - svc_run(nfsrvd_pool); - - if (principal[0] != '\0') { - rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER2); - rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER3); - rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER4); - } - NFSD_LOCK(); newnfs_numnfsd--; nfsrvd_init(1); @@ -555,6 +562,7 @@ nfsrvd_init(int terminating) if (terminating) { nfsd_master_proc = NULL; NFSD_UNLOCK(); + nfsrv_freealllayoutsanddevids(); nfsrv_freeallbackchannel_xprts(); svcpool_close(nfsrvd_pool); NFSD_LOCK(); diff --git a/sys/fs/nfsserver/nfs_nfsdport.c b/sys/fs/nfsserver/nfs_nfsdport.c index a38276e677cb..cf4d4b9218f4 100644 --- a/sys/fs/nfsserver/nfs_nfsdport.c +++ b/sys/fs/nfsserver/nfs_nfsdport.c @@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$"); #include +#include /* * Functions that perform the vfs operations required by the routines in @@ -65,12 +66,23 @@ extern struct nfslockhashhead *nfslockhash; extern struct nfssessionhash *nfssessionhash; extern int nfsrv_sessionhashsize; extern struct nfsstatsv1 nfsstatsv1; +extern struct nfslayouthash *nfslayouthash; +extern int nfsrv_layouthashsize; +extern struct mtx nfsrv_dslock_mtx; +extern int nfs_pnfsiothreads; +extern struct 
nfsdontlisthead nfsrv_dontlisthead; +extern volatile int nfsrv_dontlistlen; +extern volatile int nfsrv_devidcnt; +extern int nfsrv_maxpnfsmirror; struct vfsoptlist nfsv4root_opt, nfsv4root_newopt; NFSDLOCKMUTEX; +NFSSTATESPINLOCK; struct nfsrchash_bucket nfsrchash_table[NFSRVCACHE_HASHSIZE]; struct nfsrchash_bucket nfsrcahash_table[NFSRVCACHE_HASHSIZE]; struct mtx nfsrc_udpmtx; struct mtx nfs_v4root_mutex; +struct mtx nfsrv_dontlistlock_mtx; +struct mtx nfsrv_recalllock_mtx; struct nfsrvfh nfs_rootfh, nfs_pubfh; int nfs_pubfhset = 0, nfs_rootfhset = 0; struct proc *nfsd_master_proc = NULL; @@ -79,6 +91,7 @@ static pid_t nfsd_master_pid = (pid_t)-1; static char nfsd_master_comm[MAXCOMLEN + 1]; static struct timeval nfsd_master_start; static uint32_t nfsv4_sysid = 0; +static fhandle_t zerofh; static int nfssvc_srvcall(struct thread *, struct nfssvc_args *, struct ucred *); @@ -89,6 +102,40 @@ static int nfs_commit_miss; extern int nfsrv_issuedelegs; extern int nfsrv_dolocallocks; extern int nfsd_enable_stringtouid; +extern struct nfsdevicehead nfsrv_devidhead; + +static void nfsrv_pnfscreate(struct vnode *, struct vattr *, struct ucred *, + NFSPROC_T *); +static void nfsrv_pnfsremovesetup(struct vnode *, NFSPROC_T *, struct vnode **, + int *, char *, fhandle_t *); +static void nfsrv_pnfsremove(struct vnode **, int, char *, fhandle_t *, + NFSPROC_T *); +static int nfsrv_proxyds(struct nfsrv_descript *, struct vnode *, off_t, int, + struct ucred *, struct thread *, int, struct mbuf **, char *, + struct mbuf **, struct nfsvattr *, struct acl *); +static int nfsrv_setextattr(struct vnode *, struct nfsvattr *, NFSPROC_T *); +static int nfsrv_readdsrpc(fhandle_t *, off_t, int, struct ucred *, + NFSPROC_T *, struct nfsmount *, struct mbuf **, struct mbuf **); +static int nfsrv_writedsrpc(fhandle_t *, off_t, int, struct ucred *, + NFSPROC_T *, struct vnode *, struct nfsmount **, int, struct mbuf **, + char *, int *); +static int nfsrv_setacldsrpc(fhandle_t *, struct ucred *, 
NFSPROC_T *, + struct vnode *, struct nfsmount **, int, struct acl *, int *); +static int nfsrv_setattrdsrpc(fhandle_t *, struct ucred *, NFSPROC_T *, + struct vnode *, struct nfsmount **, int, struct nfsvattr *, int *); +static int nfsrv_getattrdsrpc(fhandle_t *, struct ucred *, NFSPROC_T *, + struct vnode *, struct nfsmount *, struct nfsvattr *); +static int nfsrv_putfhname(fhandle_t *, char *); +static int nfsrv_pnfslookupds(struct vnode *, struct vnode *, + struct pnfsdsfile *, struct vnode **, NFSPROC_T *); +static void nfsrv_pnfssetfh(struct vnode *, struct pnfsdsfile *, + struct vnode *, NFSPROC_T *); +static int nfsrv_dsremove(struct vnode *, char *, struct ucred *, NFSPROC_T *); +static int nfsrv_dssetacl(struct vnode *, struct acl *, struct ucred *, + NFSPROC_T *); +static int nfsrv_pnfsstatfs(struct statfs *); + +int nfs_pnfsio(task_fn_t *, void *); SYSCTL_NODE(_vfs, OID_AUTO, nfsd, CTLFLAG_RW, 0, "NFS server"); SYSCTL_INT(_vfs_nfsd, OID_AUTO, mirrormnt, CTLFLAG_RW, @@ -105,6 +152,35 @@ SYSCTL_INT(_vfs_nfsd, OID_AUTO, debuglevel, CTLFLAG_RW, &nfsd_debuglevel, 0, "Debug level for NFS server"); SYSCTL_INT(_vfs_nfsd, OID_AUTO, enable_stringtouid, CTLFLAG_RW, &nfsd_enable_stringtouid, 0, "Enable nfsd to accept numeric owner_names"); +static int nfsrv_pnfsgetdsattr = 1; +SYSCTL_INT(_vfs_nfsd, OID_AUTO, pnfsgetdsattr, CTLFLAG_RW, + &nfsrv_pnfsgetdsattr, 0, "When set getattr gets DS attributes via RPC"); + +/* + * nfsrv_dsdirsize can only be increased and only when the nfsd threads are + * not running. + * The dsN subdirectories for the increased values must have been created + * on all DS servers before this increase is done. 
+ */ +u_int nfsrv_dsdirsize = 20; +static int +sysctl_dsdirsize(SYSCTL_HANDLER_ARGS) +{ + int error, newdsdirsize; + + newdsdirsize = nfsrv_dsdirsize; + error = sysctl_handle_int(oidp, &newdsdirsize, 0, req); + if (error != 0 || req->newptr == NULL) + return (error); + if (newdsdirsize <= nfsrv_dsdirsize || newdsdirsize > 10000 || + newnfs_numnfsd != 0) + return (EINVAL); + nfsrv_dsdirsize = newdsdirsize; + return (0); +} +SYSCTL_PROC(_vfs_nfsd, OID_AUTO, dsdirsize, CTLTYPE_UINT | CTLFLAG_RW, 0, + sizeof(nfsrv_dsdirsize), sysctl_dsdirsize, "IU", + "Number of dsN subdirs on the DS servers"); #define MAX_REORDERED_RPC 16 #define NUM_HEURISTIC 1031 @@ -181,10 +257,12 @@ nfsrv_sequential_heuristic(struct uio *uio, struct vnode *vp) * Get attributes into nfsvattr structure. */ int -nfsvno_getattr(struct vnode *vp, struct nfsvattr *nvap, struct ucred *cred, - struct thread *p, int vpislocked) +nfsvno_getattr(struct vnode *vp, struct nfsvattr *nvap, + struct nfsrv_descript *nd, struct thread *p, int vpislocked, + nfsattrbit_t *attrbitp) { - int error, lockedit = 0; + int error, gotattr, lockedit = 0; + struct nfsvattr na; if (vpislocked == 0) { /* @@ -197,10 +275,47 @@ nfsvno_getattr(struct vnode *vp, struct nfsvattr *nvap, struct ucred *cred, NFSVOPLOCK(vp, LK_SHARED | LK_RETRY); } } - error = VOP_GETATTR(vp, &nvap->na_vattr, cred); + + /* + * Acquire the Change, Size and TimeModify attributes, as required. + * This needs to be done for regular files if: + * - non-NFSv4 RPCs or + * - when attrbitp == NULL or + * - an NFSv4 RPC with any of the above attributes in attrbitp. + * A return of 0 for nfsrv_proxyds() indicates that it has acquired + * these attributes. nfsrv_proxyds() will return an error if the + * server is not a pNFS one. 
+ */ + gotattr = 0; + if (vp->v_type == VREG && nfsrv_devidcnt > 0 && (attrbitp == NULL || + (nd->nd_flag & ND_NFSV4) == 0 || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_CHANGE) || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_SIZE) || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_TIMEACCESS) || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_TIMEMODIFY))) { + error = nfsrv_proxyds(nd, vp, 0, 0, nd->nd_cred, p, + NFSPROC_GETATTR, NULL, NULL, NULL, &na, NULL); + if (error == 0) + gotattr = 1; + } + + error = VOP_GETATTR(vp, &nvap->na_vattr, nd->nd_cred); if (lockedit != 0) NFSVOPUNLOCK(vp, 0); + /* + * If we got the Change, Size and Modify Time from the DS, + * replace them. + */ + if (gotattr != 0) { + nvap->na_atime = na.na_atime; + nvap->na_mtime = na.na_mtime; + nvap->na_filerev = na.na_filerev; + nvap->na_size = na.na_size; + } + NFSD_DEBUG(4, "nfsvno_getattr: gotattr=%d err=%d chg=%ju\n", gotattr, + error, (uintmax_t)na.na_filerev); + NFSEXITCODE(error); return (error); } @@ -330,6 +445,18 @@ nfsvno_setattr(struct vnode *vp, struct nfsvattr *nvap, struct ucred *cred, int error; error = VOP_SETATTR(vp, &nvap->na_vattr, cred); + if (error == 0 && (nvap->na_vattr.va_uid != (uid_t)VNOVAL || + nvap->na_vattr.va_gid != (gid_t)VNOVAL || + nvap->na_vattr.va_size != VNOVAL || + nvap->na_vattr.va_mode != (mode_t)VNOVAL || + nvap->na_vattr.va_atime.tv_sec != VNOVAL || + nvap->na_vattr.va_mtime.tv_sec != VNOVAL)) { + /* For a pNFS server, set the attributes on the DS file. */ + error = nfsrv_proxyds(NULL, vp, 0, 0, cred, p, NFSPROC_SETATTR, + NULL, NULL, NULL, nvap, NULL); + if (error == ENOENT) + error = 0; + } NFSEXITCODE(error); return (error); } @@ -640,6 +767,15 @@ nfsvno_read(struct vnode *vp, off_t off, int cnt, struct ucred *cred, struct uio io, *uiop = &io; struct nfsheur *nh; + /* + * Attempt to read from a DS file. A return of ENOENT implies + * there is no DS file to read. 
+ */ + error = nfsrv_proxyds(NULL, vp, off, cnt, cred, p, NFSPROC_READDS, mpp, + NULL, mpendp, NULL, NULL); + if (error != ENOENT) + return (error); + len = left = NFSM_RNDUP(cnt); m3 = NULL; /* @@ -717,7 +853,7 @@ nfsvno_read(struct vnode *vp, off_t off, int cnt, struct ucred *cred, * Write vnode op from an mbuf list. */ int -nfsvno_write(struct vnode *vp, off_t off, int retlen, int cnt, int stable, +nfsvno_write(struct vnode *vp, off_t off, int retlen, int cnt, int *stable, struct mbuf *mp, char *cp, struct ucred *cred, struct thread *p) { struct iovec *ivp; @@ -727,6 +863,17 @@ nfsvno_write(struct vnode *vp, off_t off, int retlen, int cnt, int stable, struct uio io, *uiop = &io; struct nfsheur *nh; + /* + * Attempt to write to a DS file. A return of ENOENT implies + * there is no DS file to write. + */ + error = nfsrv_proxyds(NULL, vp, off, retlen, cred, p, NFSPROC_WRITEDS, + &mp, cp, NULL, NULL, NULL); + if (error != ENOENT) { + *stable = NFSWRITE_FILESYNC; + return (error); + } + ivp = malloc(cnt * sizeof (struct iovec), M_TEMP, M_WAITOK); uiop->uio_iov = iv = ivp; @@ -750,7 +897,7 @@ nfsvno_write(struct vnode *vp, off_t off, int retlen, int cnt, int stable, } } - if (stable == NFSWRITE_UNSTABLE) + if (*stable == NFSWRITE_UNSTABLE) ioflags = IO_NODELOCKED; else ioflags = (IO_SYNC | IO_NODELOCKED); @@ -789,6 +936,16 @@ nfsvno_createsub(struct nfsrv_descript *nd, struct nameidata *ndp, vrele(ndp->ni_startdir); error = VOP_CREATE(ndp->ni_dvp, &ndp->ni_vp, &ndp->ni_cnd, &nvap->na_vattr); + /* For a pNFS server, create the data file on a DS. */ + if (error == 0 && nvap->na_type == VREG) { + /* + * Create a data file on a DS for a pNFS server. + * This function just returns if not + * running a pNFS DS or the creation fails. 
+ */ + nfsrv_pnfscreate(ndp->ni_vp, &nvap->na_vattr, + nd->nd_cred, p); + } vput(ndp->ni_dvp); nfsvno_relpathbuf(ndp); if (!error) { @@ -1055,16 +1212,23 @@ int nfsvno_removesub(struct nameidata *ndp, int is_v4, struct ucred *cred, struct thread *p, struct nfsexstuff *exp) { - struct vnode *vp; - int error = 0; + struct vnode *vp, *dsdvp[NFSDEV_MAXMIRRORS]; + int error = 0, mirrorcnt; + char fname[PNFS_FILENAME_LEN + 1]; + fhandle_t fh; vp = ndp->ni_vp; + dsdvp[0] = NULL; if (vp->v_type == VDIR) error = NFSERR_ISDIR; else if (is_v4) error = nfsrv_checkremove(vp, 1, p); + if (error == 0) + nfsrv_pnfsremovesetup(vp, p, dsdvp, &mirrorcnt, fname, &fh); if (!error) error = VOP_REMOVE(ndp->ni_dvp, vp, &ndp->ni_cnd); + if (error == 0 && dsdvp[0] != NULL) + nfsrv_pnfsremove(dsdvp, mirrorcnt, fname, &fh, p); if (ndp->ni_dvp == vp) vrele(ndp->ni_dvp); else @@ -1124,9 +1288,12 @@ int nfsvno_rename(struct nameidata *fromndp, struct nameidata *tondp, u_int32_t ndstat, u_int32_t ndflag, struct ucred *cred, struct thread *p) { - struct vnode *fvp, *tvp, *tdvp; - int error = 0; + struct vnode *fvp, *tvp, *tdvp, *dsdvp[NFSDEV_MAXMIRRORS]; + int error = 0, mirrorcnt; + char fname[PNFS_FILENAME_LEN + 1]; + fhandle_t fh; + dsdvp[0] = NULL; fvp = fromndp->ni_vp; if (ndstat) { vrele(fromndp->ni_dvp); @@ -1201,6 +1368,11 @@ nfsvno_rename(struct nameidata *fromndp, struct nameidata *tondp, */ nfsd_recalldelegation(fvp, p); } + if (error == 0 && tvp != NULL) { + nfsrv_pnfsremovesetup(tvp, p, dsdvp, &mirrorcnt, fname, &fh); + NFSD_DEBUG(4, "nfsvno_rename: pnfsremovesetup" + " dsdvp=%p\n", dsdvp[0]); + } out: if (!error) { error = VOP_RENAME(fromndp->ni_dvp, fromndp->ni_vp, @@ -1218,6 +1390,17 @@ nfsvno_rename(struct nameidata *fromndp, struct nameidata *tondp, if (error == -1) error = 0; } + + /* + * If dsdvp[0] != NULL, it was set up by nfsrv_pnfsremovesetup() and + * if the rename succeeded, the DS file for the tvp needs to be + * removed. 
+ */ + if (error == 0 && dsdvp[0] != NULL) { + nfsrv_pnfsremove(dsdvp, mirrorcnt, fname, &fh, p); + NFSD_DEBUG(4, "nfsvno_rename: pnfsremove\n"); + } + vrele(tondp->ni_startdir); nfsvno_relpathbuf(tondp); out1: @@ -1379,10 +1562,27 @@ nfsvno_fsync(struct vnode *vp, u_int64_t off, int cnt, struct ucred *cred, int nfsvno_statfs(struct vnode *vp, struct statfs *sf) { + struct statfs *tsf; int error; + tsf = NULL; + if (nfsrv_devidcnt > 0) { + /* For a pNFS service, get the DS numbers. */ + tsf = malloc(sizeof(*tsf), M_TEMP, M_WAITOK | M_ZERO); + error = nfsrv_pnfsstatfs(tsf); + if (error != 0) { + free(tsf, M_TEMP); + tsf = NULL; + } + } error = VFS_STATFS(vp->v_mount, sf); if (error == 0) { + if (tsf != NULL) { + sf->f_blocks = tsf->f_blocks; + sf->f_bavail = tsf->f_bavail; + sf->f_bfree = tsf->f_bfree; + sf->f_bsize = tsf->f_bsize; + } /* * Since NFS handles these values as unsigned on the * wire, there is no way to represent negative values, @@ -1395,6 +1595,7 @@ nfsvno_statfs(struct vnode *vp, struct statfs *sf) if (sf->f_ffree < 0) sf->f_ffree = 0; } + free(tsf, M_TEMP); NFSEXITCODE(error); return (error); } @@ -1422,6 +1623,16 @@ nfsvno_open(struct nfsrv_descript *nd, struct nameidata *ndp, vrele(ndp->ni_startdir); nd->nd_repstat = VOP_CREATE(ndp->ni_dvp, &ndp->ni_vp, &ndp->ni_cnd, &nvap->na_vattr); + /* For a pNFS server, create the data file on a DS. */ + if (nd->nd_repstat == 0) { + /* + * Create a data file on a DS for a pNFS server. + * This function just returns if not + * running a pNFS DS or the creation fails. 
+ */ + nfsrv_pnfscreate(ndp->ni_vp, &nvap->na_vattr, + cred, p); + } vput(ndp->ni_dvp); nfsvno_relpathbuf(ndp); if (!nd->nd_repstat) { @@ -1505,7 +1716,7 @@ nfsvno_open(struct nfsrv_descript *nd, struct nameidata *ndp, */ int nfsvno_updfilerev(struct vnode *vp, struct nfsvattr *nvap, - struct ucred *cred, struct thread *p) + struct nfsrv_descript *nd, struct thread *p) { struct vattr va; @@ -1516,8 +1727,8 @@ nfsvno_updfilerev(struct vnode *vp, struct nfsvattr *nvap, if ((vp->v_iflag & VI_DOOMED) != 0) return (ESTALE); } - (void) VOP_SETATTR(vp, &va, cred); - (void) nfsvno_getattr(vp, nvap, cred, p, 1); + (void) VOP_SETATTR(vp, &va, nd->nd_cred); + (void) nfsvno_getattr(vp, nvap, nd, p, 1, NULL); return (0); } @@ -1530,11 +1741,25 @@ nfsvno_fillattr(struct nfsrv_descript *nd, struct mount *mp, struct vnode *vp, struct ucred *cred, struct thread *p, int isdgram, int reterr, int supports_nfsv4acls, int at_root, uint64_t mounted_on_fileno) { + struct statfs *sf; int error; + sf = NULL; + if (nfsrv_devidcnt > 0 && + (NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_SPACEAVAIL) || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_SPACEFREE) || + NFSISSET_ATTRBIT(attrbitp, NFSATTRBIT_SPACETOTAL))) { + sf = malloc(sizeof(*sf), M_TEMP, M_WAITOK | M_ZERO); + error = nfsrv_pnfsstatfs(sf); + if (error != 0) { + free(sf, M_TEMP); + sf = NULL; + } + } error = nfsv4_fillattr(nd, mp, vp, NULL, &nvap->na_vattr, fhp, rderror, attrbitp, cred, p, isdgram, reterr, supports_nfsv4acls, at_root, - mounted_on_fileno); + mounted_on_fileno, sf); + free(sf, M_TEMP); NFSEXITCODE2(0, nd); return (error); } @@ -1601,8 +1826,8 @@ nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram, siz = ((cnt + DIRBLKSIZ - 1) & ~(DIRBLKSIZ - 1)); fullsiz = siz; if (nd->nd_flag & ND_NFSV3) { - nd->nd_repstat = getret = nfsvno_getattr(vp, &at, nd->nd_cred, - p, 1); + nd->nd_repstat = getret = nfsvno_getattr(vp, &at, nd, p, 1, + NULL); #if 0 /* * va_filerev is not sufficient as a cookie verifier, @@ -1660,7 +1885,7 @@ 
nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram, if (!cookies && !nd->nd_repstat) nd->nd_repstat = NFSERR_PERM; if (nd->nd_flag & ND_NFSV3) { - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); if (!nd->nd_repstat) nd->nd_repstat = getret; } @@ -1875,7 +2100,7 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram, NFSZERO_ATTRBIT(&attrbits); } fullsiz = siz; - nd->nd_repstat = getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + nd->nd_repstat = getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); if (!nd->nd_repstat) { if (off && verf != at.na_filerev) { /* @@ -1935,7 +2160,7 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram, if (io.uio_resid) siz -= io.uio_resid; - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); if (!cookies && !nd->nd_repstat) nd->nd_repstat = NFSERR_PERM; @@ -2175,8 +2400,8 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram, NFSNONZERO_ATTRBIT(&attrbits))) { r = nfsvno_getfh(nvp, &nfh, p); if (!r) - r = nfsvno_getattr(nvp, nvap, - nd->nd_cred, p, 1); + r = nfsvno_getattr(nvp, nvap, nd, p, + 1, &attrbits); if (r == 0 && is_zfs == 1 && nfsrv_enable_crossmntpt != 0 && (nd->nd_flag & ND_NFSV4) != 0 && @@ -3084,8 +3309,15 @@ nfssvc_nfsd(struct thread *td, struct nfssvc_args *uap) struct file *fp; struct nfsd_addsock_args sockarg; struct nfsd_nfsd_args nfsdarg; + struct nfsd_nfsd_oargs onfsdarg; + struct nfsd_pnfsd_args pnfsdarg; + struct vnode *vp, *nvp, *curdvp; + struct pnfsdsfile *pf; + struct nfsdevice *ds, *fds; cap_rights_t rights; - int error; + int buflen, error, ret; + char *buf, *cp, *cp2, *cp3; + char fname[PNFS_FILENAME_LEN + 1]; if (uap->flag & NFSSVC_NFSDADDSOCK) { error = copyin(uap->argp, (caddr_t)&sockarg, sizeof (sockarg)); @@ -3112,11 +3344,141 @@ nfssvc_nfsd(struct thread *td, struct nfssvc_args *uap) error = EINVAL; goto out; } - error = copyin(uap->argp, (caddr_t)&nfsdarg, - sizeof 
(nfsdarg)); + if ((uap->flag & NFSSVC_NEWSTRUCT) == 0) { + error = copyin(uap->argp, &onfsdarg, sizeof(onfsdarg)); + if (error == 0) { + nfsdarg.principal = onfsdarg.principal; + nfsdarg.minthreads = onfsdarg.minthreads; + nfsdarg.maxthreads = onfsdarg.maxthreads; + nfsdarg.version = 1; + nfsdarg.addr = NULL; + nfsdarg.addrlen = 0; + nfsdarg.dnshost = NULL; + nfsdarg.dnshostlen = 0; + nfsdarg.mirrorcnt = 1; + } + } else + error = copyin(uap->argp, &nfsdarg, sizeof(nfsdarg)); if (error) goto out; + if (nfsdarg.addrlen > 0 && nfsdarg.addrlen < 10000 && + nfsdarg.dnshostlen > 0 && nfsdarg.dnshostlen < 10000 && + nfsdarg.dspathlen > 0 && nfsdarg.dspathlen < 10000 && + nfsdarg.mirrorcnt >= 1 && + nfsdarg.mirrorcnt <= NFSDEV_MAXMIRRORS && + nfsdarg.addr != NULL && nfsdarg.dnshost != NULL && + nfsdarg.dspath != NULL) { + NFSD_DEBUG(1, "addrlen=%d dspathlen=%d dnslen=%d" + " mirrorcnt=%d\n", nfsdarg.addrlen, + nfsdarg.dspathlen, nfsdarg.dnshostlen, + nfsdarg.mirrorcnt); + cp = malloc(nfsdarg.addrlen + 1, M_TEMP, M_WAITOK); + error = copyin(nfsdarg.addr, cp, nfsdarg.addrlen); + if (error != 0) { + free(cp, M_TEMP); + goto out; + } + cp[nfsdarg.addrlen] = '\0'; /* Ensure nul term. */ + nfsdarg.addr = cp; + cp = malloc(nfsdarg.dnshostlen + 1, M_TEMP, M_WAITOK); + error = copyin(nfsdarg.dnshost, cp, nfsdarg.dnshostlen); + if (error != 0) { + free(nfsdarg.addr, M_TEMP); + free(cp, M_TEMP); + goto out; + } + cp[nfsdarg.dnshostlen] = '\0'; /* Ensure nul term. */ + nfsdarg.dnshost = cp; + cp = malloc(nfsdarg.dspathlen + 1, M_TEMP, M_WAITOK); + error = copyin(nfsdarg.dspath, cp, nfsdarg.dspathlen); + if (error != 0) { + free(nfsdarg.addr, M_TEMP); + free(nfsdarg.dnshost, M_TEMP); + free(cp, M_TEMP); + goto out; + } + cp[nfsdarg.dspathlen] = '\0'; /* Ensure nul term. 
*/ + nfsdarg.dspath = cp; + } else { + nfsdarg.addr = NULL; + nfsdarg.addrlen = 0; + nfsdarg.dnshost = NULL; + nfsdarg.dnshostlen = 0; + nfsdarg.dspath = NULL; + nfsdarg.dspathlen = 0; + nfsdarg.mirrorcnt = 1; + } error = nfsrvd_nfsd(td, &nfsdarg); + free(nfsdarg.addr, M_TEMP); + free(nfsdarg.dnshost, M_TEMP); + free(nfsdarg.dspath, M_TEMP); + } else if (uap->flag & NFSSVC_PNFSDS) { + error = copyin(uap->argp, &pnfsdarg, sizeof(pnfsdarg)); + if (error == 0 && pnfsdarg.op == PNFSDOP_DELDSSERVER) { + cp = malloc(PATH_MAX + 1, M_TEMP, M_WAITOK); + error = copyinstr(pnfsdarg.dspath, cp, PATH_MAX + 1, + NULL); + if (error == 0) + error = nfsrv_deldsserver(cp, td); + free(cp, M_TEMP); + } else if (error == 0 && pnfsdarg.op == PNFSDOP_COPYMR) { + cp = malloc(PATH_MAX + 1, M_TEMP, M_WAITOK); + buflen = sizeof(*pf) * NFSDEV_MAXMIRRORS; + buf = malloc(buflen, M_TEMP, M_WAITOK); + error = copyinstr(pnfsdarg.mdspath, cp, PATH_MAX + 1, + NULL); + NFSD_DEBUG(4, "pnfsdcopymr cp mdspath=%d\n", error); + if (error == 0 && pnfsdarg.dspath != NULL) { + cp2 = malloc(PATH_MAX + 1, M_TEMP, M_WAITOK); + error = copyinstr(pnfsdarg.dspath, cp2, + PATH_MAX + 1, NULL); + NFSD_DEBUG(4, "pnfsdcopymr cp dspath=%d\n", + error); + } else + cp2 = NULL; + if (error == 0 && pnfsdarg.curdspath != NULL) { + cp3 = malloc(PATH_MAX + 1, M_TEMP, M_WAITOK); + error = copyinstr(pnfsdarg.curdspath, cp3, + PATH_MAX + 1, NULL); + NFSD_DEBUG(4, "pnfsdcopymr cp curdspath=%d\n", + error); + } else + cp3 = NULL; + curdvp = NULL; + fds = NULL; + if (error == 0) + error = nfsrv_mdscopymr(cp, cp2, cp3, buf, + &buflen, fname, td, &vp, &nvp, &pf, &ds, + &fds); + NFSD_DEBUG(4, "nfsrv_mdscopymr=%d\n", error); + if (error == 0) { + if (pf->dsf_dir >= nfsrv_dsdirsize) { + printf("copymr: dsdir out of range\n"); + pf->dsf_dir = 0; + } + NFSD_DEBUG(4, "copymr: buflen=%d\n", buflen); + error = nfsrv_copymr(vp, nvp, + ds->nfsdev_dsdir[pf->dsf_dir], ds, pf, + (struct pnfsdsfile *)buf, + buflen / sizeof(*pf), td->td_ucred, td); 
+ vput(vp); + vput(nvp); + if (fds != NULL && error == 0) { + curdvp = fds->nfsdev_dsdir[pf->dsf_dir]; + ret = vn_lock(curdvp, LK_EXCLUSIVE); + if (ret == 0) { + nfsrv_dsremove(curdvp, fname, + td->td_ucred, td); + NFSVOPUNLOCK(curdvp, 0); + } + } + NFSD_DEBUG(4, "nfsrv_copymr=%d\n", error); + } + free(cp, M_TEMP); + free(cp2, M_TEMP); + free(cp3, M_TEMP); + free(buf, M_TEMP); + } } else { error = nfssvc_srvcall(td, uap, td->td_ucred); } @@ -3335,6 +3697,1896 @@ nfsrv_backupstable(void) } } +/* + * Create a DS data file for nfsrv_pnfscreate(). Called for each mirror. + * The arguments are in a structure, so that they can be passed through + * taskqueue for a kernel process to execute this function. + */ +struct nfsrvdscreate { + int done; + int inprog; + struct task tsk; + struct ucred *tcred; + struct vnode *dvp; + NFSPROC_T *p; + struct pnfsdsfile *pf; + int err; + fhandle_t fh; + struct vattr va; + struct vattr createva; +}; + +int +nfsrv_dscreate(struct vnode *dvp, struct vattr *vap, struct vattr *nvap, + fhandle_t *fhp, struct pnfsdsfile *pf, struct pnfsdsattr *dsa, + char *fnamep, struct ucred *tcred, NFSPROC_T *p, struct vnode **nvpp) +{ + struct vnode *nvp; + struct nameidata named; + struct vattr va; + char *bufp; + u_long *hashp; + struct nfsnode *np; + struct nfsmount *nmp; + int error; + + NFSNAMEICNDSET(&named.ni_cnd, tcred, CREATE, + LOCKPARENT | LOCKLEAF | SAVESTART | NOCACHE); + nfsvno_setpathbuf(&named, &bufp, &hashp); + named.ni_cnd.cn_lkflags = LK_EXCLUSIVE; + named.ni_cnd.cn_thread = p; + named.ni_cnd.cn_nameptr = bufp; + if (fnamep != NULL) { + strlcpy(bufp, fnamep, PNFS_FILENAME_LEN + 1); + named.ni_cnd.cn_namelen = strlen(bufp); + } else + named.ni_cnd.cn_namelen = nfsrv_putfhname(fhp, bufp); + NFSD_DEBUG(4, "nfsrv_dscreate: dvp=%p fname=%s\n", dvp, bufp); + + /* Create the data file in the DS mount.
*/ + error = NFSVOPLOCK(dvp, LK_EXCLUSIVE); + if (error == 0) { + error = VOP_CREATE(dvp, &nvp, &named.ni_cnd, vap); + NFSVOPUNLOCK(dvp, 0); + if (error == 0) { + /* Set the ownership of the file. */ + error = VOP_SETATTR(nvp, nvap, tcred); + NFSD_DEBUG(4, "nfsrv_dscreate:" + " setattr-uid=%d\n", error); + if (error != 0) + vput(nvp); + } + if (error != 0) + printf("pNFS: pnfscreate failed=%d\n", error); + } else + printf("pNFS: pnfscreate vnlock=%d\n", error); + if (error == 0) { + np = VTONFS(nvp); + nmp = VFSTONFS(nvp->v_mount); + if (strcmp(nvp->v_mount->mnt_vfc->vfc_name, "nfs") + != 0 || nmp->nm_nam->sa_len > sizeof( + struct sockaddr_in6) || + np->n_fhp->nfh_len != NFSX_MYFH) { + printf("Bad DS file: fstype=%s salen=%d" + " fhlen=%d\n", + nvp->v_mount->mnt_vfc->vfc_name, + nmp->nm_nam->sa_len, np->n_fhp->nfh_len); + error = ENOENT; + } + + /* Set extattrs for the DS on the MDS file. */ + if (error == 0) { + if (dsa != NULL) { + error = VOP_GETATTR(nvp, &va, tcred); + if (error == 0) { + dsa->dsa_filerev = va.va_filerev; + dsa->dsa_size = va.va_size; + dsa->dsa_atime = va.va_atime; + dsa->dsa_mtime = va.va_mtime; + } + } + if (error == 0) { + NFSBCOPY(np->n_fhp->nfh_fh, &pf->dsf_fh, + NFSX_MYFH); + NFSBCOPY(nmp->nm_nam, &pf->dsf_sin, + nmp->nm_nam->sa_len); + NFSBCOPY(named.ni_cnd.cn_nameptr, + pf->dsf_filename, + sizeof(pf->dsf_filename)); + } + } else + printf("pNFS: pnfscreate can't get DS" + " attr=%d\n", error); + if (nvpp != NULL && error == 0) + *nvpp = nvp; + else + vput(nvp); + } + nfsvno_relpathbuf(&named); + return (error); +} + +/* + * Start up the thread that will execute nfsrv_dscreate(). 
+ */ +static void +start_dscreate(void *arg, int pending) +{ + struct nfsrvdscreate *dsc; + + dsc = (struct nfsrvdscreate *)arg; + dsc->err = nfsrv_dscreate(dsc->dvp, &dsc->createva, &dsc->va, &dsc->fh, + dsc->pf, NULL, NULL, dsc->tcred, dsc->p, NULL); + dsc->done = 1; + NFSD_DEBUG(4, "start_dscreate: err=%d\n", dsc->err); +} + +/* + * Create a pNFS data file on the Data Server(s). + */ +static void +nfsrv_pnfscreate(struct vnode *vp, struct vattr *vap, struct ucred *cred, + NFSPROC_T *p) +{ + struct nfsrvdscreate *dsc, *tdsc; + struct nfsdevice *ds, *mds; + struct mount *mp; + struct pnfsdsfile *pf, *tpf; + struct pnfsdsattr dsattr; + struct vattr va; + struct vnode *dvp[NFSDEV_MAXMIRRORS]; + struct nfsmount *nmp; + fhandle_t fh; + uid_t vauid; + gid_t vagid; + u_short vamode; + struct ucred *tcred; + int dsdir[NFSDEV_MAXMIRRORS], error, i, mirrorcnt, ret; + int failpos, timo; + + /* Get a DS server directory in a round-robin order. */ + mirrorcnt = 1; + NFSDDSLOCK(); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (ds->nfsdev_nmp != NULL) + break; + } + if (ds == NULL) { + NFSDDSUNLOCK(); + NFSD_DEBUG(4, "nfsrv_pnfscreate: no srv\n"); + return; + } + i = dsdir[0] = ds->nfsdev_nextdir; + ds->nfsdev_nextdir = (ds->nfsdev_nextdir + 1) % nfsrv_dsdirsize; + dvp[0] = ds->nfsdev_dsdir[i]; + if (nfsrv_maxpnfsmirror > 1) { + mds = TAILQ_NEXT(ds, nfsdev_list); + TAILQ_FOREACH_FROM(mds, &nfsrv_devidhead, nfsdev_list) { + if (mds->nfsdev_nmp != NULL) { + dsdir[mirrorcnt] = i; + dvp[mirrorcnt] = mds->nfsdev_dsdir[i]; + mirrorcnt++; + if (mirrorcnt >= nfsrv_maxpnfsmirror) + break; + } + } + } + /* Put at end of list to implement round-robin usage. 
*/ + TAILQ_REMOVE(&nfsrv_devidhead, ds, nfsdev_list); + TAILQ_INSERT_TAIL(&nfsrv_devidhead, ds, nfsdev_list); + NFSDDSUNLOCK(); + dsc = NULL; + if (mirrorcnt > 1) + tdsc = dsc = malloc(sizeof(*dsc) * (mirrorcnt - 1), M_TEMP, + M_WAITOK | M_ZERO); + tpf = pf = malloc(sizeof(*pf) * mirrorcnt, M_TEMP, M_WAITOK | M_ZERO); + + error = nfsvno_getfh(vp, &fh, p); + if (error == 0) + error = VOP_GETATTR(vp, &va, cred); + if (error == 0) { + /* Set the attributes for "vp" to Setattr the DS vp. */ + vauid = va.va_uid; + vagid = va.va_gid; + vamode = va.va_mode; + VATTR_NULL(&va); + va.va_uid = vauid; + va.va_gid = vagid; + va.va_mode = vamode; + va.va_size = 0; + } else + printf("pNFS: pnfscreate getfh+attr=%d\n", error); + + NFSD_DEBUG(4, "nfsrv_pnfscreate: cruid=%d crgid=%d\n", cred->cr_uid, + cred->cr_gid); + /* Make data file name based on FH. */ + tcred = newnfs_getcred(); + + /* + * Create the file on each DS mirror, using kernel process(es) for the + * additional mirrors. + */ + failpos = -1; + for (i = 0; i < mirrorcnt - 1 && error == 0; i++, tpf++, tdsc++) { + tpf->dsf_dir = dsdir[i]; + tdsc->tcred = tcred; + tdsc->p = p; + tdsc->pf = tpf; + tdsc->createva = *vap; + tdsc->fh = fh; + tdsc->va = va; + tdsc->dvp = dvp[i]; + tdsc->done = 0; + tdsc->inprog = 0; + tdsc->err = 0; + ret = EIO; + if (nfs_pnfsiothreads != 0) { + ret = nfs_pnfsio(start_dscreate, tdsc); + NFSD_DEBUG(4, "nfsrv_pnfscreate: nfs_pnfsio=%d\n", ret); + } + if (ret != 0) { + ret = nfsrv_dscreate(dvp[i], vap, &va, &fh, tpf, NULL, + NULL, tcred, p, NULL); + if (ret != 0) { + KASSERT(error == 0, ("nfsrv_dscreate err=%d", + error)); + if (failpos == -1 && nfsds_failerr(ret)) + failpos = i; + else + error = ret; + } + } + } + if (error == 0) { + tpf->dsf_dir = dsdir[mirrorcnt - 1]; + error = nfsrv_dscreate(dvp[mirrorcnt - 1], vap, &va, &fh, tpf, + &dsattr, NULL, tcred, p, NULL); + if (failpos == -1 && mirrorcnt > 1 && nfsds_failerr(error)) { + failpos = mirrorcnt - 1; + error = 0; + } + } + timo = hz / 50; 
/* Wait for 20msec. */ + if (timo < 1) + timo = 1; + /* Wait for kernel task(s) to complete. */ + for (tdsc = dsc, i = 0; i < mirrorcnt - 1; i++, tdsc++) { + while (tdsc->inprog != 0 && tdsc->done == 0) + tsleep(&tdsc->tsk, PVFS, "srvdcr", timo); + if (tdsc->err != 0) { + if (failpos == -1 && nfsds_failerr(tdsc->err)) + failpos = i; + else if (error == 0) + error = tdsc->err; + } + } + + /* + * If failpos has been set, that mirror has failed, so it needs + * to be disabled. + */ + if (failpos >= 0) { + nmp = VFSTONFS(dvp[failpos]->v_mount); + NFSLOCKMNT(nmp); + if ((nmp->nm_privflag & (NFSMNTP_FORCEDISM | + NFSMNTP_CANCELRPCS)) == 0) { + nmp->nm_privflag |= NFSMNTP_CANCELRPCS; + NFSUNLOCKMNT(nmp); + ds = nfsrv_deldsnmp(nmp, p); + NFSD_DEBUG(4, "dscreatfail fail=%d ds=%p\n", failpos, + ds); + if (ds != NULL) + nfsrv_killrpcs(nmp); + NFSLOCKMNT(nmp); + nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; + wakeup(nmp); + } + NFSUNLOCKMNT(nmp); + } + + NFSFREECRED(tcred); + if (error == 0) { + ASSERT_VOP_ELOCKED(vp, "nfsrv_pnfscreate vp"); + error = vn_start_write(vp, &mp, V_WAIT); + if (error == 0) { + error = vn_extattr_set(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsfile", + sizeof(*pf) * mirrorcnt, (char *)pf, p); + if (error == 0) + error = vn_extattr_set(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsattr", + sizeof(dsattr), (char *)&dsattr, p); + vn_finished_write(mp); + if (error != 0) + printf("pNFS: pnfscreate setextattr=%d\n", + error); + } else + printf("pNFS: pnfscreate startwrite=%d\n", error); + } else + printf("pNFS: pnfscreate=%d\n", error); + free(pf, M_TEMP); + free(dsc, M_TEMP); +} + +/* + * Get the information needed to remove the pNFS Data Server file from the + * Metadata file. Upon success, ddvp is set non-NULL to the locked + * DS directory vnode. The caller must unlock *ddvp when done with it. 
+ */ +static void +nfsrv_pnfsremovesetup(struct vnode *vp, NFSPROC_T *p, struct vnode **dvpp, + int *mirrorcntp, char *fname, fhandle_t *fhp) +{ + struct vattr va; + struct ucred *tcred; + char *buf; + int buflen, error; + + dvpp[0] = NULL; + /* If not an exported regular file or not a pNFS server, just return. */ + if (vp->v_type != VREG || (vp->v_mount->mnt_flag & MNT_EXPORTED) == 0 || + nfsrv_devidcnt == 0) + return; + + /* Check to see if this is the last hard link. */ + tcred = newnfs_getcred(); + error = VOP_GETATTR(vp, &va, tcred); + NFSFREECRED(tcred); + if (error != 0) { + printf("pNFS: nfsrv_pnfsremovesetup getattr=%d\n", error); + return; + } + if (va.va_nlink > 1) + return; + + error = nfsvno_getfh(vp, fhp, p); + if (error != 0) { + printf("pNFS: nfsrv_pnfsremovesetup getfh=%d\n", error); + return; + } + + buflen = 1024; + buf = malloc(buflen, M_TEMP, M_WAITOK); + /* Get the directory vnode for the DS mount and the file handle. */ + error = nfsrv_dsgetsockmnt(vp, 0, buf, &buflen, mirrorcntp, p, dvpp, + NULL, NULL, fname, NULL, NULL, NULL, NULL, NULL); + free(buf, M_TEMP); + if (error != 0) + printf("pNFS: nfsrv_pnfsremovesetup getsockmnt=%d\n", error); +} + +/* + * Remove a DS data file for nfsrv_pnfsremove(). Called for each mirror. + * The arguments are in a structure, so that they can be passed through + * taskqueue for a kernel process to execute this function. 
+ */ +struct nfsrvdsremove { + int done; + int inprog; + struct task tsk; + struct ucred *tcred; + struct vnode *dvp; + NFSPROC_T *p; + int err; + char fname[PNFS_FILENAME_LEN + 1]; +}; + +static int +nfsrv_dsremove(struct vnode *dvp, char *fname, struct ucred *tcred, + NFSPROC_T *p) +{ + struct nameidata named; + struct vnode *nvp; + char *bufp; + u_long *hashp; + int error; + + error = NFSVOPLOCK(dvp, LK_EXCLUSIVE); + if (error != 0) + return (error); + named.ni_cnd.cn_nameiop = DELETE; + named.ni_cnd.cn_lkflags = LK_EXCLUSIVE | LK_RETRY; + named.ni_cnd.cn_cred = tcred; + named.ni_cnd.cn_thread = p; + named.ni_cnd.cn_flags = ISLASTCN | LOCKPARENT | LOCKLEAF | SAVENAME; + nfsvno_setpathbuf(&named, &bufp, &hashp); + named.ni_cnd.cn_nameptr = bufp; + named.ni_cnd.cn_namelen = strlen(fname); + strlcpy(bufp, fname, NAME_MAX); + NFSD_DEBUG(4, "nfsrv_pnfsremove: filename=%s\n", bufp); + error = VOP_LOOKUP(dvp, &nvp, &named.ni_cnd); + NFSD_DEBUG(4, "nfsrv_pnfsremove: aft LOOKUP=%d\n", error); + if (error == 0) { + error = VOP_REMOVE(dvp, nvp, &named.ni_cnd); + vput(nvp); + } + NFSVOPUNLOCK(dvp, 0); + nfsvno_relpathbuf(&named); + if (error != 0) + printf("pNFS: nfsrv_pnfsremove failed=%d\n", error); + return (error); +} + +/* + * Start up the thread that will execute nfsrv_dsremove(). + */ +static void +start_dsremove(void *arg, int pending) +{ + struct nfsrvdsremove *dsrm; + + dsrm = (struct nfsrvdsremove *)arg; + dsrm->err = nfsrv_dsremove(dsrm->dvp, dsrm->fname, dsrm->tcred, + dsrm->p); + dsrm->done = 1; + NFSD_DEBUG(4, "start_dsremove: err=%d\n", dsrm->err); +} + +/* + * Remove a pNFS data file from a Data Server. + * nfsrv_pnfsremovesetup() must have been called before the MDS file was + * removed to set up the dvp and fill in the FH. 
+ */ +static void +nfsrv_pnfsremove(struct vnode **dvp, int mirrorcnt, char *fname, fhandle_t *fhp, + NFSPROC_T *p) +{ + struct ucred *tcred; + struct nfsrvdsremove *dsrm, *tdsrm; + struct nfsdevice *ds; + struct nfsmount *nmp; + int failpos, i, ret, timo; + + tcred = newnfs_getcred(); + dsrm = NULL; + if (mirrorcnt > 1) + dsrm = malloc(sizeof(*dsrm) * (mirrorcnt - 1), M_TEMP, M_WAITOK); + /* + * Remove the file on each DS mirror, using kernel process(es) for the + * additional mirrors. + */ + failpos = -1; + for (tdsrm = dsrm, i = 0; i < mirrorcnt - 1; i++, tdsrm++) { + tdsrm->tcred = tcred; + tdsrm->p = p; + tdsrm->dvp = dvp[i]; + strlcpy(tdsrm->fname, fname, PNFS_FILENAME_LEN + 1); + tdsrm->inprog = 0; + tdsrm->done = 0; + tdsrm->err = 0; + ret = EIO; + if (nfs_pnfsiothreads != 0) { + ret = nfs_pnfsio(start_dsremove, tdsrm); + NFSD_DEBUG(4, "nfsrv_pnfsremove: nfs_pnfsio=%d\n", ret); + } + if (ret != 0) { + ret = nfsrv_dsremove(dvp[i], fname, tcred, p); + if (failpos == -1 && nfsds_failerr(ret)) + failpos = i; + } + } + ret = nfsrv_dsremove(dvp[mirrorcnt - 1], fname, tcred, p); + if (failpos == -1 && mirrorcnt > 1 && nfsds_failerr(ret)) + failpos = mirrorcnt - 1; + timo = hz / 50; /* Wait for 20msec. */ + if (timo < 1) + timo = 1; + /* Wait for kernel task(s) to complete. */ + for (tdsrm = dsrm, i = 0; i < mirrorcnt - 1; i++, tdsrm++) { + while (tdsrm->inprog != 0 && tdsrm->done == 0) + tsleep(&tdsrm->tsk, PVFS, "srvdsrm", timo); + if (failpos == -1 && nfsds_failerr(tdsrm->err)) + failpos = i; + } + + /* + * If failpos has been set, that mirror has failed, so it needs + * to be disabled.
+ */ + if (failpos >= 0) { + nmp = VFSTONFS(dvp[failpos]->v_mount); + NFSLOCKMNT(nmp); + if ((nmp->nm_privflag & (NFSMNTP_FORCEDISM | + NFSMNTP_CANCELRPCS)) == 0) { + nmp->nm_privflag |= NFSMNTP_CANCELRPCS; + NFSUNLOCKMNT(nmp); + ds = nfsrv_deldsnmp(nmp, p); + NFSD_DEBUG(4, "dsremovefail fail=%d ds=%p\n", failpos, + ds); + if (ds != NULL) + nfsrv_killrpcs(nmp); + NFSLOCKMNT(nmp); + nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; + wakeup(nmp); + } + NFSUNLOCKMNT(nmp); + } + + /* Get rid of all layouts for the file. */ + nfsrv_freefilelayouts(fhp); + + NFSFREECRED(tcred); + free(dsrm, M_TEMP); +} + +/* + * Generate a file name based on the file handle and put it in *bufp. + * Return the number of bytes generated. + */ +static int +nfsrv_putfhname(fhandle_t *fhp, char *bufp) +{ + int i; + uint8_t *cp; + const uint8_t *hexdigits = "0123456789abcdef"; + + cp = (uint8_t *)fhp; + for (i = 0; i < sizeof(*fhp); i++) { + bufp[2 * i] = hexdigits[(*cp >> 4) & 0xf]; + bufp[2 * i + 1] = hexdigits[*cp++ & 0xf]; + } + bufp[2 * i] = '\0'; + return (2 * i); +} + +/* + * Update the Metadata file's attributes from the DS file when a Read/Write + * layout is returned. + * Basically just call nfsrv_proxyds() with procedure == NFSPROC_LAYOUTRETURN + * so that it does a nfsrv_getattrdsrpc() and nfsrv_setextattr() on the DS file. + */ +int +nfsrv_updatemdsattr(struct vnode *vp, struct nfsvattr *nap, NFSPROC_T *p) +{ + struct ucred *tcred; + int error; + + /* Do this as root so that it won't fail with EACCES. */ + tcred = newnfs_getcred(); + error = nfsrv_proxyds(NULL, vp, 0, 0, tcred, p, NFSPROC_LAYOUTRETURN, + NULL, NULL, NULL, nap, NULL); + NFSFREECRED(tcred); + return (error); +} + +/* + * Set the NFSv4 ACL on the DS file to the same ACL as the MDS file.
+ */ +static int +nfsrv_dssetacl(struct vnode *vp, struct acl *aclp, struct ucred *cred, + NFSPROC_T *p) +{ + int error; + + error = nfsrv_proxyds(NULL, vp, 0, 0, cred, p, NFSPROC_SETACL, + NULL, NULL, NULL, NULL, aclp); + return (error); +} + +static int +nfsrv_proxyds(struct nfsrv_descript *nd, struct vnode *vp, off_t off, int cnt, + struct ucred *cred, struct thread *p, int ioproc, struct mbuf **mpp, + char *cp, struct mbuf **mpp2, struct nfsvattr *nap, struct acl *aclp) +{ + struct nfsmount *nmp[NFSDEV_MAXMIRRORS], *failnmp; + fhandle_t fh[NFSDEV_MAXMIRRORS]; + struct vnode *dvp[NFSDEV_MAXMIRRORS]; + struct nfsdevice *ds; + struct pnfsdsattr dsattr; + char *buf; + int buflen, error, failpos, i, mirrorcnt, origmircnt, trycnt; + + NFSD_DEBUG(4, "in nfsrv_proxyds\n"); + /* + * If not a regular file, not exported or not a pNFS server, + * just return ENOENT. + */ + if (vp->v_type != VREG || (vp->v_mount->mnt_flag & MNT_EXPORTED) == 0 || + nfsrv_devidcnt == 0) + return (ENOENT); + + buflen = 1024; + buf = malloc(buflen, M_TEMP, M_WAITOK); + error = 0; + + /* + * For Getattr, get the Change attribute (va_filerev) and size (va_size) + * from the MetaData file's extended attribute. + */ + if (ioproc == NFSPROC_GETATTR) { + error = vn_extattr_get(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsattr", &buflen, buf, + p); + if (error == 0 && buflen != sizeof(dsattr)) + error = ENXIO; + if (error == 0) { + NFSBCOPY(buf, &dsattr, buflen); + nap->na_filerev = dsattr.dsa_filerev; + nap->na_size = dsattr.dsa_size; + nap->na_atime = dsattr.dsa_atime; + nap->na_mtime = dsattr.dsa_mtime; + + /* + * If nfsrv_pnfsgetdsattr is 0 or nfsrv_checkdsattr() + * returns 0, just return now. nfsrv_checkdsattr() + * returns 0 if there is no Read/Write layout + * plus either an Open/Write_access or Write + * delegation issued to a client for the file. 
+ */ + if (nfsrv_pnfsgetdsattr == 0 || + nfsrv_checkdsattr(nd, vp, p) == 0) { + free(buf, M_TEMP); + return (error); + } + } + + /* + * Clear ENOATTR so the code below will attempt to do a + * nfsrv_getattrdsrpc() to get the attributes and (re)create + * the extended attribute. + */ + if (error == ENOATTR) + error = 0; + } + + origmircnt = -1; + trycnt = 0; +tryagain: + if (error == 0) { + buflen = 1024; + error = nfsrv_dsgetsockmnt(vp, LK_SHARED, buf, &buflen, + &mirrorcnt, p, dvp, fh, NULL, NULL, NULL, NULL, NULL, + NULL, NULL); + if (error == 0) { + for (i = 0; i < mirrorcnt; i++) + nmp[i] = VFSTONFS(dvp[i]->v_mount); + } else + printf("pNFS: proxy getextattr sockaddr=%d\n", error); + } else + printf("pNFS: nfsrv_dsgetsockmnt=%d\n", error); + if (error == 0) { + failpos = -1; + if (origmircnt == -1) + origmircnt = mirrorcnt; + /* + * If failpos is set to a mirror#, then that mirror has + * failed and will be disabled. For Read and Getattr, the + * function only tries one mirror, so if that mirror has + * failed, it will need to be retried. As such, increment + * trycnt for these cases. + * For Write, Setattr and Setacl, the function tries all + * mirrors and will not return an error for the case where + * one mirror has failed. For these cases, the functioning + * mirror(s) will have been modified, so a retry isn't + * necessary. These functions will set failpos for the + * failed mirror#. + */ + if (ioproc == NFSPROC_READDS) { + error = nfsrv_readdsrpc(fh, off, cnt, cred, p, nmp[0], + mpp, mpp2); + if (nfsds_failerr(error) && mirrorcnt > 1) { + /* + * Setting failpos will cause the mirror + * to be disabled and then a retry of this + * read is required.
+ */ + failpos = 0; + error = 0; + trycnt++; + } + } else if (ioproc == NFSPROC_WRITEDS) + error = nfsrv_writedsrpc(fh, off, cnt, cred, p, vp, + &nmp[0], mirrorcnt, mpp, cp, &failpos); + else if (ioproc == NFSPROC_SETATTR) + error = nfsrv_setattrdsrpc(fh, cred, p, vp, &nmp[0], + mirrorcnt, nap, &failpos); + else if (ioproc == NFSPROC_SETACL) + error = nfsrv_setacldsrpc(fh, cred, p, vp, &nmp[0], + mirrorcnt, aclp, &failpos); + else { + error = nfsrv_getattrdsrpc(&fh[mirrorcnt - 1], cred, p, + vp, nmp[mirrorcnt - 1], nap); + if (nfsds_failerr(error) && mirrorcnt > 1) { + /* + * Setting failpos will cause the mirror + * to be disabled and then a retry of this + * getattr is required. + */ + failpos = mirrorcnt - 1; + error = 0; + trycnt++; + } + } + ds = NULL; + if (failpos >= 0) { + failnmp = nmp[failpos]; + NFSLOCKMNT(failnmp); + if ((failnmp->nm_privflag & (NFSMNTP_FORCEDISM | + NFSMNTP_CANCELRPCS)) == 0) { + failnmp->nm_privflag |= NFSMNTP_CANCELRPCS; + NFSUNLOCKMNT(failnmp); + ds = nfsrv_deldsnmp(failnmp, p); + NFSD_DEBUG(4, "dsldsnmp fail=%d ds=%p\n", + failpos, ds); + if (ds != NULL) + nfsrv_killrpcs(failnmp); + NFSLOCKMNT(failnmp); + failnmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; + wakeup(failnmp); + } + NFSUNLOCKMNT(failnmp); + } + for (i = 0; i < mirrorcnt; i++) + NFSVOPUNLOCK(dvp[i], 0); + NFSD_DEBUG(4, "nfsrv_proxyds: aft RPC=%d trya=%d\n", error, + trycnt); + /* Try the Read/Getattr again if a mirror was deleted. */ + if (ds != NULL && trycnt > 0 && trycnt < origmircnt) + goto tryagain; + } else { + /* Return ENOENT for any Extended Attribute error. */ + error = ENOENT; + } + free(buf, M_TEMP); + NFSD_DEBUG(4, "nfsrv_proxyds: error=%d\n", error); + return (error); +} + +/* + * Get the DS mount point, fh and directory from the "pnfsd.dsfile" extended + * attribute. + * newnmpp - If it points to a non-NULL nmp, that is the destination and needs + * to be checked. If it points to a NULL nmp, then it returns + * a suitable destination. 
+ * curnmp - If non-NULL, it is the source mount for the copy. + */ +int +nfsrv_dsgetsockmnt(struct vnode *vp, int lktype, char *buf, int *buflenp, + int *mirrorcntp, NFSPROC_T *p, struct vnode **dvpp, fhandle_t *fhp, + char *devid, char *fnamep, struct vnode **nvpp, struct nfsmount **newnmpp, + struct nfsmount *curnmp, int *ippos, int *dsdirp) +{ + struct vnode *dvp, *nvp, **tdvpp; + struct nfsmount *nmp, *newnmp; + struct sockaddr *sad; + struct sockaddr_in *sin; + struct nfsdevice *ds, *fndds; + struct pnfsdsfile *pf; + uint32_t dsdir; + int error, fhiszero, fnd, gotone, i, mirrorcnt; + + ASSERT_VOP_LOCKED(vp, "nfsrv_dsgetsockmnt vp"); + *mirrorcntp = 1; + tdvpp = dvpp; + if (nvpp != NULL) + *nvpp = NULL; + if (dvpp != NULL) + *dvpp = NULL; + if (ippos != NULL) + *ippos = -1; + if (newnmpp != NULL) + newnmp = *newnmpp; + else + newnmp = NULL; + error = vn_extattr_get(vp, IO_NODELOCKED, EXTATTR_NAMESPACE_SYSTEM, + "pnfsd.dsfile", buflenp, buf, p); + mirrorcnt = *buflenp / sizeof(*pf); + if (error == 0 && (mirrorcnt < 1 || mirrorcnt > NFSDEV_MAXMIRRORS || + *buflenp != sizeof(*pf) * mirrorcnt)) + error = ENOATTR; + + pf = (struct pnfsdsfile *)buf; + /* If curnmp != NULL, check for a match in the mirror list. 
*/ + if (curnmp != NULL && error == 0) { + fnd = 0; + for (i = 0; i < mirrorcnt; i++, pf++) { + sad = (struct sockaddr *)&pf->dsf_sin; + if (nfsaddr2_match(sad, curnmp->nm_nam)) { + if (ippos != NULL) + *ippos = i; + fnd = 1; + break; + } + } + if (fnd == 0) + error = ENXIO; + } + + gotone = 0; + pf = (struct pnfsdsfile *)buf; + NFSD_DEBUG(4, "nfsrv_dsgetsockmnt: mirrorcnt=%d err=%d\n", mirrorcnt, + error); + for (i = 0; i < mirrorcnt && error == 0; i++, pf++) { + fhiszero = 0; + sad = (struct sockaddr *)&pf->dsf_sin; + sin = &pf->dsf_sin; + dsdir = pf->dsf_dir; + if (dsdir >= nfsrv_dsdirsize) { + printf("nfsrv_dsgetsockmnt: dsdir=%d\n", dsdir); + error = ENOATTR; + } else if (nvpp != NULL && newnmp != NULL && + nfsaddr2_match(sad, newnmp->nm_nam)) + error = EEXIST; + if (error == 0) { + if (ippos != NULL && curnmp == NULL && + sad->sa_family == AF_INET && + sin->sin_addr.s_addr == 0) + *ippos = i; + if (NFSBCMP(&zerofh, &pf->dsf_fh, sizeof(zerofh)) == 0) + fhiszero = 1; + /* Use the socket address to find the mount point. */ + fndds = NULL; + NFSDDSLOCK(); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (ds->nfsdev_nmp != NULL) { + dvp = ds->nfsdev_dvp; + nmp = VFSTONFS(dvp->v_mount); + if (nmp != ds->nfsdev_nmp) + printf("different2 nmp %p %p\n", + nmp, ds->nfsdev_nmp); + if (nfsaddr2_match(sad, nmp->nm_nam)) + fndds = ds; + else if (newnmpp != NULL && + newnmp == NULL && + (*newnmpp == NULL || fndds == NULL)) + /* + * Return a destination for the + * copy in newnmpp. Choose the + * last valid one before the + * source mirror, so it isn't + * always the first one. 
+ */ + *newnmpp = nmp; + } + } + NFSDDSUNLOCK(); + if (fndds != NULL) { + dvp = fndds->nfsdev_dsdir[dsdir]; + if (lktype != 0 || fhiszero != 0 || + (nvpp != NULL && *nvpp == NULL)) { + if (fhiszero != 0) + error = vn_lock(dvp, + LK_EXCLUSIVE); + else if (lktype != 0) + error = vn_lock(dvp, lktype); + else + error = vn_lock(dvp, LK_SHARED); + /* + * If the file handle is all 0's, try to + * do a Lookup against the DS to acquire + * it. + * If dvpp == NULL or the Lookup fails, + * unlock dvp after the call. + */ + if (error == 0 && (fhiszero != 0 || + (nvpp != NULL && *nvpp == NULL))) { + error = nfsrv_pnfslookupds(vp, + dvp, pf, &nvp, p); + if (error == 0) { + if (fhiszero != 0) + nfsrv_pnfssetfh( + vp, pf, + nvp, p); + if (nvpp != NULL && + *nvpp == NULL) { + *nvpp = nvp; + *dsdirp = dsdir; + } else + vput(nvp); + } + if (error != 0 || lktype == 0) + NFSVOPUNLOCK(dvp, 0); + } + } + if (error == 0) { + gotone++; + NFSD_DEBUG(4, "gotone=%d\n", gotone); + if (devid != NULL) { + NFSBCOPY(fndds->nfsdev_deviceid, + devid, NFSX_V4DEVICEID); + devid += NFSX_V4DEVICEID; + } + if (dvpp != NULL) + *tdvpp++ = dvp; + if (fhp != NULL) + NFSBCOPY(&pf->dsf_fh, fhp++, + NFSX_MYFH); + if (fnamep != NULL && gotone == 1) + strlcpy(fnamep, + pf->dsf_filename, + sizeof(pf->dsf_filename)); + } else + NFSD_DEBUG(4, "nfsrv_dsgetsockmnt " + "err=%d\n", error); + } + } + } + if (error == 0 && gotone == 0) + error = ENOENT; + + NFSD_DEBUG(4, "eo nfsrv_dsgetsockmnt: gotone=%d err=%d\n", gotone, + error); + if (error == 0) + *mirrorcntp = gotone; + else { + if (gotone > 0 && dvpp != NULL) { + /* + * If the error didn't occur on the first one and + * dvpp != NULL, the one(s) prior to the failure will + * have locked dvp's that need to be unlocked. + */ + for (i = 0; i < gotone; i++) { + NFSVOPUNLOCK(*dvpp, 0); + *dvpp++ = NULL; + } + } + /* + * If it found the vnode to be copied from before a failure, + * it needs to be vput()'d. 
+ */ + if (nvpp != NULL && *nvpp != NULL) { + vput(*nvpp); + *nvpp = NULL; + } + } + return (error); +} + +/* + * Set the extended attribute for the Change attribute. + */ +static int +nfsrv_setextattr(struct vnode *vp, struct nfsvattr *nap, NFSPROC_T *p) +{ + struct pnfsdsattr dsattr; + struct mount *mp; + int error; + + ASSERT_VOP_ELOCKED(vp, "nfsrv_setextattr vp"); + error = vn_start_write(vp, &mp, V_WAIT); + if (error == 0) { + dsattr.dsa_filerev = nap->na_filerev; + dsattr.dsa_size = nap->na_size; + dsattr.dsa_atime = nap->na_atime; + dsattr.dsa_mtime = nap->na_mtime; + error = vn_extattr_set(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsattr", + sizeof(dsattr), (char *)&dsattr, p); + vn_finished_write(mp); + } + if (error != 0) + printf("pNFS: setextattr=%d\n", error); + return (error); +} + +static int +nfsrv_readdsrpc(fhandle_t *fhp, off_t off, int len, struct ucred *cred, + NFSPROC_T *p, struct nfsmount *nmp, struct mbuf **mpp, struct mbuf **mpendp) +{ + uint32_t *tl; + struct nfsrv_descript *nd; + nfsv4stateid_t st; + struct mbuf *m, *m2; + int error = 0, retlen, tlen, trimlen; + + NFSD_DEBUG(4, "in nfsrv_readdsrpc\n"); + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); + *mpp = NULL; + /* + * Use a stateid where other is an alternating 01010 pattern and + * seqid is 0xffffffff. This value is not defined as special by + * the RFC and is used by the FreeBSD NFS server to indicate an + * MDS->DS proxy operation. 
+ */ + st.other[0] = 0x55555555; + st.other[1] = 0x55555555; + st.other[2] = 0x55555555; + st.seqid = 0xffffffff; + nfscl_reqstart(nd, NFSPROC_READDS, nmp, (u_int8_t *)fhp, sizeof(*fhp), + NULL, NULL, 0, 0); + nfsm_stateidtom(nd, &st, NFSSTATEID_PUTSTATEID); + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED * 3); + txdr_hyper(off, tl); + *(tl + 2) = txdr_unsigned(len); + error = newnfs_request(nd, nmp, NULL, &nmp->nm_sockreq, NULL, p, cred, + NFS_PROG, NFS_VER4, NULL, 1, NULL, NULL); + if (error != 0) { + free(nd, M_TEMP); + return (error); + } + if (nd->nd_repstat == 0) { + NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); + NFSM_STRSIZ(retlen, len); + if (retlen > 0) { + /* Trim off the pre-data XDR from the mbuf chain. */ + m = nd->nd_mrep; + while (m != NULL && m != nd->nd_md) { + if (m->m_next == nd->nd_md) { + m->m_next = NULL; + m_freem(nd->nd_mrep); + nd->nd_mrep = m = nd->nd_md; + } else + m = m->m_next; + } + if (m == NULL) { + printf("nfsrv_readdsrpc: busted mbuf list\n"); + error = ENOENT; + goto nfsmout; + } + + /* + * Now, adjust first mbuf so that any XDR before the + * read data is skipped over. + */ + trimlen = nd->nd_dpos - mtod(m, char *); + if (trimlen > 0) { + m->m_len -= trimlen; + NFSM_DATAP(m, trimlen); + } + + /* + * Truncate the mbuf chain at retlen bytes of data, + * plus XDR padding that brings the length up to a + * multiple of 4. + */ + tlen = NFSM_RNDUP(retlen); + do { + if (m->m_len >= tlen) { + m->m_len = tlen; + tlen = 0; + m2 = m->m_next; + m->m_next = NULL; + m_freem(m2); + break; + } + tlen -= m->m_len; + m = m->m_next; + } while (m != NULL); + if (tlen > 0) { + printf("nfsrv_readdsrpc: busted mbuf list\n"); + error = ENOENT; + goto nfsmout; + } + *mpp = nd->nd_mrep; + *mpendp = m; + nd->nd_mrep = NULL; + } + } else + error = nd->nd_repstat; +nfsmout: + /* If nd->nd_mrep is already NULL, this is a no-op. 
*/ + m_freem(nd->nd_mrep); + free(nd, M_TEMP); + NFSD_DEBUG(4, "nfsrv_readdsrpc error=%d\n", error); + return (error); +} + +/* + * Do a write RPC on a DS data file, using this structure for the arguments, + * so that this function can be executed by a separate kernel process. + */ +struct nfsrvwritedsdorpc { + int done; + int inprog; + struct task tsk; + fhandle_t fh; + off_t off; + int len; + struct nfsmount *nmp; + struct ucred *cred; + NFSPROC_T *p; + struct mbuf *m; + int err; +}; + +static int +nfsrv_writedsdorpc(struct nfsmount *nmp, fhandle_t *fhp, off_t off, int len, + struct nfsvattr *nap, struct mbuf *m, struct ucred *cred, NFSPROC_T *p) +{ + uint32_t *tl; + struct nfsrv_descript *nd; + nfsattrbit_t attrbits; + nfsv4stateid_t st; + int commit, error, retlen; + + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); + nfscl_reqstart(nd, NFSPROC_WRITE, nmp, (u_int8_t *)fhp, + sizeof(fhandle_t), NULL, NULL, 0, 0); + + /* + * Use a stateid where other is an alternating 01010 pattern and + * seqid is 0xffffffff. This value is not defined as special by + * the RFC and is used by the FreeBSD NFS server to indicate an + * MDS->DS proxy operation. + */ + st.other[0] = 0x55555555; + st.other[1] = 0x55555555; + st.other[2] = 0x55555555; + st.seqid = 0xffffffff; + nfsm_stateidtom(nd, &st, NFSSTATEID_PUTSTATEID); + NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER + 2 * NFSX_UNSIGNED); + txdr_hyper(off, tl); + tl += 2; + /* + * Do all writes FileSync, since the server doesn't hold onto dirty + * buffers. Since clients should be accessing the DS servers directly + * using the pNFS layouts, this just needs to work correctly as a + * fallback. + */ + *tl++ = txdr_unsigned(NFSWRITE_FILESYNC); + *tl = txdr_unsigned(len); + NFSD_DEBUG(4, "nfsrv_writedsdorpc: len=%d\n", len); + + /* Put data in mbuf chain. */ + nd->nd_mb->m_next = m; + + /* Set nd_mb and nd_bpos to end of data. 
*/ + while (m->m_next != NULL) + m = m->m_next; + nd->nd_mb = m; + nd->nd_bpos = mtod(m, char *) + m->m_len; + NFSD_DEBUG(4, "nfsrv_writedsdorpc: lastmb len=%d\n", m->m_len); + + /* Do a Getattr for Size, Change, Access Time and Modify Time. */ + NFSZERO_ATTRBIT(&attrbits); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_SIZE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_CHANGE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEACCESS); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEMODIFY); + NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED); + *tl = txdr_unsigned(NFSV4OP_GETATTR); + (void) nfsrv_putattrbit(nd, &attrbits); + error = newnfs_request(nd, nmp, NULL, &nmp->nm_sockreq, NULL, p, + cred, NFS_PROG, NFS_VER4, NULL, 1, NULL, NULL); + if (error != 0) { + free(nd, M_TEMP); + return (error); + } + NFSD_DEBUG(4, "nfsrv_writedsdorpc: aft writerpc=%d\n", nd->nd_repstat); + /* Get rid of weak cache consistency data for now. */ + if ((nd->nd_flag & (ND_NOMOREDATA | ND_NFSV4 | ND_V4WCCATTR)) == + (ND_NFSV4 | ND_V4WCCATTR)) { + error = nfsv4_loadattr(nd, NULL, nap, NULL, NULL, 0, NULL, NULL, + NULL, NULL, NULL, 0, NULL, NULL, NULL, NULL, NULL); + NFSD_DEBUG(4, "nfsrv_writedsdorpc: wcc attr=%d\n", error); + if (error != 0) + goto nfsmout; + /* + * Get rid of Op# and status for next op. + */ + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED); + if (*++tl != 0) + nd->nd_flag |= ND_NOMOREDATA; + } + if (nd->nd_repstat == 0) { + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED + NFSX_VERF); + retlen = fxdr_unsigned(int, *tl++); + commit = fxdr_unsigned(int, *tl); + if (commit != NFSWRITE_FILESYNC) + error = NFSERR_IO; + NFSD_DEBUG(4, "nfsrv_writedsdorpc:retlen=%d commit=%d err=%d\n", + retlen, commit, error); + } else + error = nd->nd_repstat; + /* We have no use for the Write Verifier since we use FileSync. */ + + /* + * Get the Change, Size, Access Time and Modify Time attributes and set + * on the Metadata file, so its attributes will be what the file's + * would be if it had been written.
+ */ + if (error == 0) { + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED); + error = nfsv4_loadattr(nd, NULL, nap, NULL, NULL, 0, NULL, NULL, + NULL, NULL, NULL, 0, NULL, NULL, NULL, NULL, NULL); + } + NFSD_DEBUG(4, "nfsrv_writedsdorpc: aft loadattr=%d\n", error); +nfsmout: + m_freem(nd->nd_mrep); + free(nd, M_TEMP); + NFSD_DEBUG(4, "nfsrv_writedsdorpc error=%d\n", error); + return (error); +} + +/* + * Start up the thread that will execute nfsrv_writedsdorpc(). + */ +static void +start_writedsdorpc(void *arg, int pending) +{ + struct nfsrvwritedsdorpc *drpc; + + drpc = (struct nfsrvwritedsdorpc *)arg; + drpc->err = nfsrv_writedsdorpc(drpc->nmp, &drpc->fh, drpc->off, + drpc->len, NULL, drpc->m, drpc->cred, drpc->p); + drpc->done = 1; + NFSD_DEBUG(4, "start_writedsdorpc: err=%d\n", drpc->err); +} + +static int +nfsrv_writedsrpc(fhandle_t *fhp, off_t off, int len, struct ucred *cred, + NFSPROC_T *p, struct vnode *vp, struct nfsmount **nmpp, int mirrorcnt, + struct mbuf **mpp, char *cp, int *failposp) +{ + struct nfsrvwritedsdorpc *drpc, *tdrpc; + struct nfsvattr na; + struct mbuf *m; + int error, i, offs, ret, timo; + + NFSD_DEBUG(4, "in nfsrv_writedsrpc\n"); + KASSERT(*mpp != NULL, ("nfsrv_writedsrpc: NULL mbuf chain")); + drpc = NULL; + if (mirrorcnt > 1) + tdrpc = drpc = malloc(sizeof(*drpc) * (mirrorcnt - 1), M_TEMP, + M_WAITOK); + + /* Calculate offset in mbuf chain that data starts. */ + offs = cp - mtod(*mpp, char *); + NFSD_DEBUG(4, "nfsrv_writedsrpc: mcopy offs=%d len=%d\n", offs, len); + + /* + * Do the write RPC for every DS, using a separate kernel process + * for every DS except the last one. 
+ */ + error = 0; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + tdrpc->done = 0; + tdrpc->fh = *fhp; + tdrpc->off = off; + tdrpc->len = len; + tdrpc->nmp = *nmpp; + tdrpc->cred = cred; + tdrpc->p = p; + tdrpc->inprog = 0; + tdrpc->err = 0; + tdrpc->m = m_copym(*mpp, offs, NFSM_RNDUP(len), M_WAITOK); + ret = EIO; + if (nfs_pnfsiothreads != 0) { + ret = nfs_pnfsio(start_writedsdorpc, tdrpc); + NFSD_DEBUG(4, "nfsrv_writedsrpc: nfs_pnfsio=%d\n", + ret); + } + if (ret != 0) { + ret = nfsrv_writedsdorpc(*nmpp, fhp, off, len, NULL, + tdrpc->m, cred, p); + if (nfsds_failerr(ret) && *failposp == -1) + *failposp = i; + else if (error == 0 && ret != 0) + error = ret; + } + nmpp++; + fhp++; + } + m = m_copym(*mpp, offs, NFSM_RNDUP(len), M_WAITOK); + ret = nfsrv_writedsdorpc(*nmpp, fhp, off, len, &na, m, cred, p); + if (nfsds_failerr(ret) && *failposp == -1 && mirrorcnt > 1) + *failposp = mirrorcnt - 1; + else if (error == 0 && ret != 0) + error = ret; + if (error == 0) + error = nfsrv_setextattr(vp, &na, p); + NFSD_DEBUG(4, "nfsrv_writedsrpc: aft setextat=%d\n", error); + tdrpc = drpc; + timo = hz / 50; /* Wait for 20msec. */ + if (timo < 1) + timo = 1; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + /* Wait for RPCs on separate threads to complete. 
*/ + while (tdrpc->inprog != 0 && tdrpc->done == 0) + tsleep(&tdrpc->tsk, PVFS, "srvwrds", timo); + if (nfsds_failerr(tdrpc->err) && *failposp == -1) + *failposp = i; + else if (error == 0 && tdrpc->err != 0) + error = tdrpc->err; + } + free(drpc, M_TEMP); + return (error); +} + +static int +nfsrv_setattrdsdorpc(fhandle_t *fhp, struct ucred *cred, NFSPROC_T *p, + struct vnode *vp, struct nfsmount *nmp, struct nfsvattr *nap, + struct nfsvattr *dsnap) +{ + uint32_t *tl; + struct nfsrv_descript *nd; + nfsv4stateid_t st; + nfsattrbit_t attrbits; + int error; + + NFSD_DEBUG(4, "in nfsrv_setattrdsdorpc\n"); + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); + /* + * Use a stateid where other is an alternating 01010 pattern and + * seqid is 0xffffffff. This value is not defined as special by + * the RFC and is used by the FreeBSD NFS server to indicate an + * MDS->DS proxy operation. + */ + st.other[0] = 0x55555555; + st.other[1] = 0x55555555; + st.other[2] = 0x55555555; + st.seqid = 0xffffffff; + nfscl_reqstart(nd, NFSPROC_SETATTR, nmp, (u_int8_t *)fhp, sizeof(*fhp), + NULL, NULL, 0, 0); + nfsm_stateidtom(nd, &st, NFSSTATEID_PUTSTATEID); + nfscl_fillsattr(nd, &nap->na_vattr, vp, NFSSATTR_FULL, 0); + + /* Do a Getattr for Size, Change, Access Time and Modify Time. */ + NFSZERO_ATTRBIT(&attrbits); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_SIZE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_CHANGE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEACCESS); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEMODIFY); + NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED); + *tl = txdr_unsigned(NFSV4OP_GETATTR); + (void) nfsrv_putattrbit(nd, &attrbits); + error = newnfs_request(nd, nmp, NULL, &nmp->nm_sockreq, NULL, p, cred, + NFS_PROG, NFS_VER4, NULL, 1, NULL, NULL); + if (error != 0) { + free(nd, M_TEMP); + return (error); + } + NFSD_DEBUG(4, "nfsrv_setattrdsdorpc: aft setattrrpc=%d\n", + nd->nd_repstat); + /* Get rid of weak cache consistency data for now. 
*/ + if ((nd->nd_flag & (ND_NOMOREDATA | ND_NFSV4 | ND_V4WCCATTR)) == + (ND_NFSV4 | ND_V4WCCATTR)) { + error = nfsv4_loadattr(nd, NULL, dsnap, NULL, NULL, 0, NULL, + NULL, NULL, NULL, NULL, 0, NULL, NULL, NULL, NULL, NULL); + NFSD_DEBUG(4, "nfsrv_setattrdsdorpc: wcc attr=%d\n", error); + if (error != 0) + goto nfsmout; + /* + * Get rid of Op# and status for next op. + */ + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED); + if (*++tl != 0) + nd->nd_flag |= ND_NOMOREDATA; + } + error = nfsrv_getattrbits(nd, &attrbits, NULL, NULL); + if (error != 0) + goto nfsmout; + if (nd->nd_repstat != 0) + error = nd->nd_repstat; + /* + * Get the Change, Size, Access Time and Modify Time attributes and set + * on the Metadata file, so its attributes will be what the file's + * would be if it had been written. + */ + if (error == 0) { + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED); + error = nfsv4_loadattr(nd, NULL, dsnap, NULL, NULL, 0, NULL, + NULL, NULL, NULL, NULL, 0, NULL, NULL, NULL, NULL, NULL); + } + NFSD_DEBUG(4, "nfsrv_setattrdsdorpc: aft setattr loadattr=%d\n", error); +nfsmout: + m_freem(nd->nd_mrep); + free(nd, M_TEMP); + NFSD_DEBUG(4, "nfsrv_setattrdsdorpc error=%d\n", error); + return (error); +} + +struct nfsrvsetattrdsdorpc { + int done; + int inprog; + struct task tsk; + fhandle_t fh; + struct nfsmount *nmp; + struct vnode *vp; + struct ucred *cred; + NFSPROC_T *p; + struct nfsvattr na; + struct nfsvattr dsna; + int err; +}; + +/* + * Start up the thread that will execute nfsrv_setattrdsdorpc(). 
+ */ +static void +start_setattrdsdorpc(void *arg, int pending) +{ + struct nfsrvsetattrdsdorpc *drpc; + + drpc = (struct nfsrvsetattrdsdorpc *)arg; + drpc->err = nfsrv_setattrdsdorpc(&drpc->fh, drpc->cred, drpc->p, + drpc->vp, drpc->nmp, &drpc->na, &drpc->dsna); + drpc->done = 1; +} + +static int +nfsrv_setattrdsrpc(fhandle_t *fhp, struct ucred *cred, NFSPROC_T *p, + struct vnode *vp, struct nfsmount **nmpp, int mirrorcnt, + struct nfsvattr *nap, int *failposp) +{ + struct nfsrvsetattrdsdorpc *drpc, *tdrpc; + struct nfsvattr na; + int error, i, ret, timo; + + NFSD_DEBUG(4, "in nfsrv_setattrdsrpc\n"); + drpc = NULL; + if (mirrorcnt > 1) + tdrpc = drpc = malloc(sizeof(*drpc) * (mirrorcnt - 1), M_TEMP, + M_WAITOK); + + /* + * Do the setattr RPC for every DS, using a separate kernel process + * for every DS except the last one. + */ + error = 0; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + tdrpc->done = 0; + tdrpc->inprog = 0; + tdrpc->fh = *fhp; + tdrpc->nmp = *nmpp; + tdrpc->vp = vp; + tdrpc->cred = cred; + tdrpc->p = p; + tdrpc->na = *nap; + tdrpc->err = 0; + ret = EIO; + if (nfs_pnfsiothreads != 0) { + ret = nfs_pnfsio(start_setattrdsdorpc, tdrpc); + NFSD_DEBUG(4, "nfsrv_setattrdsrpc: nfs_pnfsio=%d\n", + ret); + } + if (ret != 0) { + ret = nfsrv_setattrdsdorpc(fhp, cred, p, vp, *nmpp, nap, + &na); + if (nfsds_failerr(ret) && *failposp == -1) + *failposp = i; + else if (error == 0 && ret != 0) + error = ret; + } + nmpp++; + fhp++; + } + ret = nfsrv_setattrdsdorpc(fhp, cred, p, vp, *nmpp, nap, &na); + if (nfsds_failerr(ret) && *failposp == -1 && mirrorcnt > 1) + *failposp = mirrorcnt - 1; + else if (error == 0 && ret != 0) + error = ret; + if (error == 0) + error = nfsrv_setextattr(vp, &na, p); + NFSD_DEBUG(4, "nfsrv_setattrdsrpc: aft setextat=%d\n", error); + tdrpc = drpc; + timo = hz / 50; /* Wait for 20msec. */ + if (timo < 1) + timo = 1; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + /* Wait for RPCs on separate threads to complete. 
*/ + while (tdrpc->inprog != 0 && tdrpc->done == 0) + tsleep(&tdrpc->tsk, PVFS, "srvsads", timo); + if (nfsds_failerr(tdrpc->err) && *failposp == -1) + *failposp = i; + else if (error == 0 && tdrpc->err != 0) + error = tdrpc->err; + } + free(drpc, M_TEMP); + return (error); +} + +/* + * Do a Setattr of an NFSv4 ACL on the DS file. + */ +static int +nfsrv_setacldsdorpc(fhandle_t *fhp, struct ucred *cred, NFSPROC_T *p, + struct vnode *vp, struct nfsmount *nmp, struct acl *aclp) +{ + struct nfsrv_descript *nd; + nfsv4stateid_t st; + nfsattrbit_t attrbits; + int error; + + NFSD_DEBUG(4, "in nfsrv_setacldsdorpc\n"); + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); + /* + * Use a stateid where other is an alternating 01010 pattern and + * seqid is 0xffffffff. This value is not defined as special by + * the RFC and is used by the FreeBSD NFS server to indicate an + * MDS->DS proxy operation. + */ + st.other[0] = 0x55555555; + st.other[1] = 0x55555555; + st.other[2] = 0x55555555; + st.seqid = 0xffffffff; + nfscl_reqstart(nd, NFSPROC_SETACL, nmp, (u_int8_t *)fhp, sizeof(*fhp), + NULL, NULL, 0, 0); + nfsm_stateidtom(nd, &st, NFSSTATEID_PUTSTATEID); + NFSZERO_ATTRBIT(&attrbits); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_ACL); + /* + * The "vp" argument to nfsv4_fillattr() is only used for vnode_type(), + * so passing in the metadata "vp" will be ok, since it is of + * the same type (VREG). 
+ */ + nfsv4_fillattr(nd, NULL, vp, aclp, NULL, NULL, 0, &attrbits, NULL, + NULL, 0, 0, 0, 0, 0, NULL); + error = newnfs_request(nd, nmp, NULL, &nmp->nm_sockreq, NULL, p, cred, + NFS_PROG, NFS_VER4, NULL, 1, NULL, NULL); + if (error != 0) { + free(nd, M_TEMP); + return (error); + } + NFSD_DEBUG(4, "nfsrv_setacldsdorpc: aft setaclrpc=%d\n", + nd->nd_repstat); + error = nd->nd_repstat; + m_freem(nd->nd_mrep); + free(nd, M_TEMP); + return (error); +} + +struct nfsrvsetacldsdorpc { + int done; + int inprog; + struct task tsk; + fhandle_t fh; + struct nfsmount *nmp; + struct vnode *vp; + struct ucred *cred; + NFSPROC_T *p; + struct acl *aclp; + int err; +}; + +/* + * Start up the thread that will execute nfsrv_setacldsdorpc(). + */ +static void +start_setacldsdorpc(void *arg, int pending) +{ + struct nfsrvsetacldsdorpc *drpc; + + drpc = (struct nfsrvsetacldsdorpc *)arg; + drpc->err = nfsrv_setacldsdorpc(&drpc->fh, drpc->cred, drpc->p, + drpc->vp, drpc->nmp, drpc->aclp); + drpc->done = 1; +} + +static int +nfsrv_setacldsrpc(fhandle_t *fhp, struct ucred *cred, NFSPROC_T *p, + struct vnode *vp, struct nfsmount **nmpp, int mirrorcnt, struct acl *aclp, + int *failposp) +{ + struct nfsrvsetacldsdorpc *drpc, *tdrpc; + int error, i, ret, timo; + + NFSD_DEBUG(4, "in nfsrv_setacldsrpc\n"); + drpc = NULL; + if (mirrorcnt > 1) + tdrpc = drpc = malloc(sizeof(*drpc) * (mirrorcnt - 1), M_TEMP, + M_WAITOK); + + /* + * Do the setattr RPC for every DS, using a separate kernel process + * for every DS except the last one. 
+ */ + error = 0; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + tdrpc->done = 0; + tdrpc->inprog = 0; + tdrpc->fh = *fhp; + tdrpc->nmp = *nmpp; + tdrpc->vp = vp; + tdrpc->cred = cred; + tdrpc->p = p; + tdrpc->aclp = aclp; + tdrpc->err = 0; + ret = EIO; + if (nfs_pnfsiothreads != 0) { + ret = nfs_pnfsio(start_setacldsdorpc, tdrpc); + NFSD_DEBUG(4, "nfsrv_setacldsrpc: nfs_pnfsio=%d\n", + ret); + } + if (ret != 0) { + ret = nfsrv_setacldsdorpc(fhp, cred, p, vp, *nmpp, + aclp); + if (nfsds_failerr(ret) && *failposp == -1) + *failposp = i; + else if (error == 0 && ret != 0) + error = ret; + } + nmpp++; + fhp++; + } + ret = nfsrv_setacldsdorpc(fhp, cred, p, vp, *nmpp, aclp); + if (nfsds_failerr(ret) && *failposp == -1 && mirrorcnt > 1) + *failposp = mirrorcnt - 1; + else if (error == 0 && ret != 0) + error = ret; + NFSD_DEBUG(4, "nfsrv_setacldsrpc: aft setextat=%d\n", error); + tdrpc = drpc; + timo = hz / 50; /* Wait for 20msec. */ + if (timo < 1) + timo = 1; + for (i = 0; i < mirrorcnt - 1; i++, tdrpc++) { + /* Wait for RPCs on separate threads to complete. */ + while (tdrpc->inprog != 0 && tdrpc->done == 0) + tsleep(&tdrpc->tsk, PVFS, "srvacds", timo); + if (nfsds_failerr(tdrpc->err) && *failposp == -1) + *failposp = i; + else if (error == 0 && tdrpc->err != 0) + error = tdrpc->err; + } + free(drpc, M_TEMP); + return (error); +} + +/* + * Getattr call to the DS for the Modify, Size and Change attributes. 
+ */ +static int +nfsrv_getattrdsrpc(fhandle_t *fhp, struct ucred *cred, NFSPROC_T *p, + struct vnode *vp, struct nfsmount *nmp, struct nfsvattr *nap) +{ + struct nfsrv_descript *nd; + int error; + nfsattrbit_t attrbits; + + NFSD_DEBUG(4, "in nfsrv_getattrdsrpc\n"); + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); + nfscl_reqstart(nd, NFSPROC_GETATTR, nmp, (u_int8_t *)fhp, + sizeof(fhandle_t), NULL, NULL, 0, 0); + NFSZERO_ATTRBIT(&attrbits); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_SIZE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_CHANGE); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEACCESS); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEMODIFY); + (void) nfsrv_putattrbit(nd, &attrbits); + error = newnfs_request(nd, nmp, NULL, &nmp->nm_sockreq, NULL, p, cred, + NFS_PROG, NFS_VER4, NULL, 1, NULL, NULL); + if (error != 0) { + free(nd, M_TEMP); + return (error); + } + NFSD_DEBUG(4, "nfsrv_getattrdsrpc: aft getattrrpc=%d\n", + nd->nd_repstat); + if (nd->nd_repstat == 0) { + error = nfsv4_loadattr(nd, NULL, nap, NULL, NULL, 0, + NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, NULL, + NULL, NULL); + /* + * We can only save the updated values in the extended + * attribute if the vp is exclusively locked. + * This should happen when any of the following operations + * occur on the vnode: + * Close, Delegreturn, LayoutCommit, LayoutReturn + * As such, the updated extended attribute should get saved + * before nfsrv_checkdsattr() returns 0 and allows the cached + * attributes to be returned without calling this function. + */ + if (error == 0 && VOP_ISLOCKED(vp) == LK_EXCLUSIVE) { + error = nfsrv_setextattr(vp, nap, p); + NFSD_DEBUG(4, "nfsrv_getattrdsrpc: aft setextat=%d\n", + error); + } + } else + error = nd->nd_repstat; + m_freem(nd->nd_mrep); + free(nd, M_TEMP); + NFSD_DEBUG(4, "nfsrv_getattrdsrpc error=%d\n", error); + return (error); +} + +/* + * Get the device id and file handle for a DS file. 
+ */ +int +nfsrv_dsgetdevandfh(struct vnode *vp, NFSPROC_T *p, int *mirrorcntp, + fhandle_t *fhp, char *devid) +{ + int buflen, error; + char *buf; + + buflen = 1024; + buf = malloc(buflen, M_TEMP, M_WAITOK); + error = nfsrv_dsgetsockmnt(vp, 0, buf, &buflen, mirrorcntp, p, NULL, + fhp, devid, NULL, NULL, NULL, NULL, NULL, NULL); + free(buf, M_TEMP); + return (error); +} + +/* + * Do a Lookup against the DS for the filename. + */ +static int +nfsrv_pnfslookupds(struct vnode *vp, struct vnode *dvp, struct pnfsdsfile *pf, + struct vnode **nvpp, NFSPROC_T *p) +{ + struct nameidata named; + struct ucred *tcred; + char *bufp; + u_long *hashp; + struct vnode *nvp; + int error; + + tcred = newnfs_getcred(); + named.ni_cnd.cn_nameiop = LOOKUP; + named.ni_cnd.cn_lkflags = LK_SHARED | LK_RETRY; + named.ni_cnd.cn_cred = tcred; + named.ni_cnd.cn_thread = p; + named.ni_cnd.cn_flags = ISLASTCN | LOCKPARENT | LOCKLEAF | SAVENAME; + nfsvno_setpathbuf(&named, &bufp, &hashp); + named.ni_cnd.cn_nameptr = bufp; + named.ni_cnd.cn_namelen = strlen(pf->dsf_filename); + strlcpy(bufp, pf->dsf_filename, NAME_MAX); + NFSD_DEBUG(4, "nfsrv_pnfslookupds: filename=%s\n", bufp); + error = VOP_LOOKUP(dvp, &nvp, &named.ni_cnd); + NFSD_DEBUG(4, "nfsrv_pnfslookupds: aft LOOKUP=%d\n", error); + NFSFREECRED(tcred); + nfsvno_relpathbuf(&named); + if (error == 0) + *nvpp = nvp; + NFSD_DEBUG(4, "eo nfsrv_pnfslookupds=%d\n", error); + return (error); +} + +/* + * Set the file handle to the correct one. + */ +static void +nfsrv_pnfssetfh(struct vnode *vp, struct pnfsdsfile *pf, struct vnode *nvp, + NFSPROC_T *p) +{ + struct mount *mp; + struct nfsnode *np; + int ret; + + np = VTONFS(nvp); + NFSBCOPY(np->n_fhp->nfh_fh, &pf->dsf_fh, NFSX_MYFH); + /* + * We can only do a setextattr for an exclusively + * locked vp. Instead of trying to upgrade a shared + * lock, just leave dsf_fh zeroed out and it will + * keep doing this lookup until it is done with an + * exclusively locked vp. 
+ */ + if (NFSVOPISLOCKED(vp) == LK_EXCLUSIVE) { + ret = vn_start_write(vp, &mp, V_WAIT); + NFSD_DEBUG(4, "nfsrv_pnfssetfh: vn_start_write=%d\n", + ret); + if (ret == 0) { + ret = vn_extattr_set(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsfile", + sizeof(*pf), (char *)pf, p); + vn_finished_write(mp); + NFSD_DEBUG(4, "nfsrv_pnfslookupds: aft " + "vn_extattr_set=%d\n", ret); + } + } + NFSD_DEBUG(4, "eo nfsrv_pnfssetfh=%d\n", ret); +} + +/* + * Cause RPCs waiting on "nmp" to fail. This is called for a DS mount point + * when the DS has failed. + */ +void +nfsrv_killrpcs(struct nfsmount *nmp) +{ + + /* + * Call newnfs_nmcancelreqs() to cause + * any RPCs in progress on the mount point to + * fail. + * This will cause any process waiting for an + * RPC to complete while holding a vnode lock + * on the mounted-on vnode (such as "df" or + * a non-forced "umount") to fail. + * This will unlock the mounted-on vnode so + * a forced dismount can succeed. + * The NFSMNTP_CANCELRPCS flag should be set when this function is + * called. + */ + newnfs_nmcancelreqs(nmp); +} + +/* + * Sum up the statfs info for each of the DSs, so that the client will + * receive the total for all DSs. + */ +static int +nfsrv_pnfsstatfs(struct statfs *sf) +{ + struct statfs *tsf; + struct nfsdevice *ds; + struct vnode **dvpp, **tdvpp, *dvp; + uint64_t tot; + int cnt, error = 0, i; + + if (nfsrv_devidcnt <= 0) + return (ENXIO); + dvpp = mallocarray(nfsrv_devidcnt, sizeof(*dvpp), M_TEMP, M_WAITOK); + tsf = malloc(sizeof(*tsf), M_TEMP, M_WAITOK); + + /* Get an array of the dvps for the DSs. */ + tdvpp = dvpp; + i = 0; + NFSDDSLOCK(); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (ds->nfsdev_nmp != NULL) { + if (++i > nfsrv_devidcnt) + break; + *tdvpp++ = ds->nfsdev_dvp; + } + } + NFSDDSUNLOCK(); + cnt = i; + + /* Do a VFS_STATFS() for each of the DSs and sum them up. 
*/ + tdvpp = dvpp; + for (i = 0; i < cnt && error == 0; i++) { + dvp = *tdvpp++; + error = VFS_STATFS(dvp->v_mount, tsf); + if (error == 0) { + if (sf->f_bsize == 0) { + if (tsf->f_bsize > 0) + sf->f_bsize = tsf->f_bsize; + else + sf->f_bsize = 8192; + } + if (tsf->f_blocks > 0) { + if (sf->f_bsize != tsf->f_bsize) { + tot = tsf->f_blocks * tsf->f_bsize; + sf->f_blocks += (tot / sf->f_bsize); + } else + sf->f_blocks += tsf->f_blocks; + } + if (tsf->f_bfree > 0) { + if (sf->f_bsize != tsf->f_bsize) { + tot = tsf->f_bfree * tsf->f_bsize; + sf->f_bfree += (tot / sf->f_bsize); + } else + sf->f_bfree += tsf->f_bfree; + } + if (tsf->f_bavail > 0) { + if (sf->f_bsize != tsf->f_bsize) { + tot = tsf->f_bavail * tsf->f_bsize; + sf->f_bavail += (tot / sf->f_bsize); + } else + sf->f_bavail += tsf->f_bavail; + } + } + } + free(tsf, M_TEMP); + free(dvpp, M_TEMP); + return (error); +} + +/* + * Set an NFSv4 acl. + */ +int +nfsrv_setacl(struct vnode *vp, NFSACL_T *aclp, struct ucred *cred, NFSPROC_T *p) +{ + int error; + + if (nfsrv_useacl == 0 || nfs_supportsnfsv4acls(vp) == 0) { + error = NFSERR_ATTRNOTSUPP; + goto out; + } + /* + * With NFSv4 ACLs, chmod(2) may need to add additional entries. + * Make sure it has enough room for that - splitting every entry + * into two and appending "canonical six" entries at the end. + * Cribbed out of kern/vfs_acl.c - Rick M. 
+ */ + if (aclp->acl_cnt > (ACL_MAX_ENTRIES - 6) / 2) { + error = NFSERR_ATTRNOTSUPP; + goto out; + } + error = VOP_SETACL(vp, ACL_TYPE_NFS4, aclp, cred, p); + if (error == 0) { + error = nfsrv_dssetacl(vp, aclp, cred, p); + if (error == ENOENT) + error = 0; + } + +out: + NFSEXITCODE(error); + return (error); +} + extern int (*nfsd_call_nfsd)(struct thread *, struct nfssvc_args *); /* @@ -3360,6 +5612,8 @@ nfsd_modevent(module_t mod, int type, void *data) mtx_init(&nfsrc_udpmtx, "nfsuc", NULL, MTX_DEF); mtx_init(&nfs_v4root_mutex, "nfs4rt", NULL, MTX_DEF); mtx_init(&nfsv4root_mnt.mnt_mtx, "nfs4mnt", NULL, MTX_DEF); + mtx_init(&nfsrv_dontlistlock_mtx, "nfs4dnl", NULL, MTX_DEF); + mtx_init(&nfsrv_recalllock_mtx, "nfs4rec", NULL, MTX_DEF); lockinit(&nfsv4root_mnt.mnt_explock, PVFS, "explock", 0, 0); nfsrvd_initcache(); nfsd_init(); @@ -3407,8 +5661,15 @@ nfsd_modevent(module_t mod, int type, void *data) mtx_destroy(&nfsrc_udpmtx); mtx_destroy(&nfs_v4root_mutex); mtx_destroy(&nfsv4root_mnt.mnt_mtx); + mtx_destroy(&nfsrv_dontlistlock_mtx); + mtx_destroy(&nfsrv_recalllock_mtx); for (i = 0; i < nfsrv_sessionhashsize; i++) mtx_destroy(&nfssessionhash[i].mtx); + if (nfslayouthash != NULL) { + for (i = 0; i < nfsrv_layouthashsize; i++) + mtx_destroy(&nfslayouthash[i].mtx); + free(nfslayouthash, M_NFSDSESSION); + } lockdestroy(&nfsv4root_mnt.mnt_explock); free(nfsclienthash, M_NFSDCLIENT); free(nfslockhash, M_NFSDLOCKFILE); diff --git a/sys/fs/nfsserver/nfs_nfsdserv.c b/sys/fs/nfsserver/nfs_nfsdserv.c index f1f6f52a550b..6a478ea012de 100644 --- a/sys/fs/nfsserver/nfs_nfsdserv.c +++ b/sys/fs/nfsserver/nfs_nfsdserv.c @@ -56,12 +56,22 @@ extern struct timeval nfsboottime; extern int nfs_rootfhset; extern int nfsrv_enable_crossmntpt; extern int nfsrv_statehashsize; +extern int nfsrv_layouthashsize; +extern time_t nfsdev_time; +extern volatile int nfsrv_devidcnt; +extern int nfsd_debuglevel; +extern u_long sb_max_adj; +extern int nfsrv_pnfsatime; +extern int nfsrv_maxpnfsmirror; 
#endif /* !APPLEKEXT */ static int nfs_async = 0; SYSCTL_DECL(_vfs_nfsd); SYSCTL_INT(_vfs_nfsd, OID_AUTO, async, CTLFLAG_RW, &nfs_async, 0, "Tell client that writes were synced even though they were not"); +extern int nfsrv_doflexfile; +SYSCTL_INT(_vfs_nfsd, OID_AUTO, default_flexfile, CTLFLAG_RW, + &nfsrv_doflexfile, 0, "Make Flex File Layout the default for pNFS"); /* * This list defines the GSS mechanisms supported. @@ -153,7 +163,7 @@ nfsrvd_access(struct nfsrv_descript *nd, __unused int isdgram, } nfsmode &= supported; if (nd->nd_flag & ND_NFSV3) { - getret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); nfsrv_postopattr(nd, getret, &nva); } vput(vp); @@ -237,14 +247,14 @@ nfsrvd_getattr(struct nfsrv_descript *nd, int isdgram, } } if (!nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, &attrbits); if (!nd->nd_repstat) { if (nd->nd_flag & ND_NFSV4) { if (NFSISSET_ATTRBIT(&attrbits, NFSATTRBIT_FILEHANDLE)) nd->nd_repstat = nfsvno_getfh(vp, &fh, p); if (!nd->nd_repstat) nd->nd_repstat = nfsrv_checkgetattr(nd, vp, - &nva, &attrbits, nd->nd_cred, p); + &nva, &attrbits, p); if (nd->nd_repstat == 0) { supports_nfsv4acls = nfs_supportsnfsv4acls(vp); mp = vp->v_mount; @@ -309,6 +319,7 @@ nfsrvd_setattr(struct nfsrv_descript *nd, __unused int isdgram, struct nfsvattr nva, nva2; u_int32_t *tl; int preat_ret = 1, postat_ret = 1, gcheck = 0, error = 0; + int gotproxystateid; struct timespec guard = { 0, 0 }; nfsattrbit_t attrbits, retbits; nfsv4stateid_t stateid; @@ -322,19 +333,32 @@ nfsrvd_setattr(struct nfsrv_descript *nd, __unused int isdgram, aclp = acl_alloc(M_WAITOK); aclp->acl_cnt = 0; #endif + gotproxystateid = 0; NFSVNO_ATTRINIT(&nva); - NFSZERO_ATTRBIT(&retbits); if (nd->nd_flag & ND_NFSV4) { NFSM_DISSECT(tl, u_int32_t *, NFSX_STATEID); stateid.seqid = fxdr_unsigned(u_int32_t, *tl++); - 
NFSBCOPY((caddr_t)tl,(caddr_t)stateid.other,NFSX_STATEIDOTHER); + stateid.other[0] = *tl++; + stateid.other[1] = *tl++; + stateid.other[2] = *tl; + if (stateid.other[0] == 0x55555555 && + stateid.other[1] == 0x55555555 && + stateid.other[2] == 0x55555555 && + stateid.seqid == 0xffffffff) + gotproxystateid = 1; } error = nfsrv_sattr(nd, vp, &nva, &attrbits, aclp, p); if (error) goto nfsmout; - preat_ret = nfsvno_getattr(vp, &nva2, nd->nd_cred, p, 1); + + /* For NFSv4, only va_uid is used from nva2. */ + NFSZERO_ATTRBIT(&retbits); + NFSSETBIT_ATTRBIT(&retbits, NFSATTRBIT_OWNER); + preat_ret = nfsvno_getattr(vp, &nva2, nd, p, 1, &retbits); if (!nd->nd_repstat) nd->nd_repstat = preat_ret; + + NFSZERO_ATTRBIT(&retbits); if (nd->nd_flag & ND_NFSV3) { NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); gcheck = fxdr_unsigned(int, *tl); @@ -378,7 +402,12 @@ nfsrvd_setattr(struct nfsrv_descript *nd, __unused int isdgram, NFSACCCHK_VPISLOCKED, NULL); } } - if (!nd->nd_repstat && (nd->nd_flag & ND_NFSV4)) + /* + * Proxy operations from the MDS are allowed via the proxy stateid + * (other is all 0x55555555, seqid is 0xffffffff).
+ */ + if (nd->nd_repstat == 0 && (nd->nd_flag & ND_NFSV4) != 0 && + gotproxystateid == 0) nd->nd_repstat = nfsrv_checksetattr(vp, nd, &stateid, &nva, &attrbits, exp, p); @@ -452,7 +481,7 @@ nfsrvd_setattr(struct nfsrv_descript *nd, __unused int isdgram, exp); } if (nd->nd_flag & (ND_NFSV2 | ND_NFSV3)) { - postat_ret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + postat_ret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (!nd->nd_repstat) nd->nd_repstat = postat_ret; } @@ -536,8 +565,8 @@ nfsrvd_lookup(struct nfsrv_descript *nd, __unused int isdgram, if (nd->nd_repstat) { if (dirp) { if (nd->nd_flag & ND_NFSV3) - dattr_ret = nfsvno_getattr(dirp, &dattr, - nd->nd_cred, p, 0); + dattr_ret = nfsvno_getattr(dirp, &dattr, nd, p, + 0, NULL); vrele(dirp); } if (nd->nd_flag & ND_NFSV3) @@ -558,15 +587,15 @@ nfsrvd_lookup(struct nfsrv_descript *nd, __unused int isdgram, if (nd->nd_repstat == 0) nd->nd_repstat = nfsvno_getfh(vp, fhp, p); if (!(nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (vpp != NULL && nd->nd_repstat == 0) *vpp = vp; else vput(vp); if (dirp) { if (nd->nd_flag & ND_NFSV3) - dattr_ret = nfsvno_getattr(dirp, &dattr, nd->nd_cred, - p, 0); + dattr_ret = nfsvno_getattr(dirp, &dattr, nd, p, 0, + NULL); vrele(dirp); } if (nd->nd_repstat) { @@ -614,7 +643,7 @@ nfsrvd_readlink(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_repstat = nfsvno_readlink(vp, nd->nd_cred, p, &mp, &mpend, &len); if (nd->nd_flag & ND_NFSV3) - getret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); vput(vp); if (nd->nd_flag & ND_NFSV3) nfsrv_postopattr(nd, getret, &nva); @@ -639,7 +668,7 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, vnode_t vp, NFSPROC_T *p, struct nfsexstuff *exp) { u_int32_t *tl; - int error = 0, cnt, getret = 1, reqlen, eof = 0; + int error = 0, cnt, getret = 1, 
gotproxystateid, reqlen, eof = 0; mbuf_t m2, m3; struct nfsvattr nva; off_t off = 0x0; @@ -671,6 +700,7 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, error = EBADRPC; goto nfsmout; } + gotproxystateid = 0; if (nd->nd_flag & ND_NFSV4) { stp->ls_flags = (NFSLCK_CHECK | NFSLCK_READACCESS); lop->lo_flags = NFSLCK_READ; @@ -692,6 +722,24 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_clientid.qval = clientid.qval; } stp->ls_stateid.other[2] = *tl++; + /* + * Don't allow the client to use a special stateid for a DS op. + */ + if ((nd->nd_flag & ND_DSSERVER) != 0 && + ((stp->ls_stateid.other[0] == 0x0 && + stp->ls_stateid.other[1] == 0x0 && + stp->ls_stateid.other[2] == 0x0) || + (stp->ls_stateid.other[0] == 0xffffffff && + stp->ls_stateid.other[1] == 0xffffffff && + stp->ls_stateid.other[2] == 0xffffffff) || + stp->ls_stateid.seqid != 0)) + nd->nd_repstat = NFSERR_BADSTATEID; + /* However, allow the proxy stateid. */ + if (stp->ls_stateid.seqid == 0xffffffff && + stp->ls_stateid.other[0] == 0x55555555 && + stp->ls_stateid.other[1] == 0x55555555 && + stp->ls_stateid.other[2] == 0x55555555) + gotproxystateid = 1; off = fxdr_hyper(tl); lop->lo_first = off; tl += 2; @@ -709,7 +757,7 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_repstat = (vnode_vtype(vp) == VDIR) ? EISDIR : EINVAL; } - getret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (!nd->nd_repstat) nd->nd_repstat = getret; if (!nd->nd_repstat && @@ -723,7 +771,12 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_cred, exp, p, NFSACCCHK_ALLOWOWNER, NFSACCCHK_VPISLOCKED, NULL); } - if ((nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) + /* + * DS reads are marked by ND_DSSERVER or use the proxy special + * stateid. 
+ */ + if (nd->nd_repstat == 0 && (nd->nd_flag & (ND_NFSV4 | ND_DSSERVER)) == + ND_NFSV4 && gotproxystateid == 0) nd->nd_repstat = nfsrv_lockctrl(vp, &stp, &lop, NULL, clientid, &stateid, exp, nd, p); if (nd->nd_repstat) { @@ -747,7 +800,7 @@ nfsrvd_read(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_repstat = nfsvno_read(vp, off, cnt, nd->nd_cred, p, &m3, &m2); if (!(nd->nd_flag & ND_NFSV4)) { - getret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (!nd->nd_repstat) nd->nd_repstat = getret; } @@ -804,17 +857,19 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, mbuf_t mp; struct nfsvattr nva, forat; int aftat_ret = 1, retlen, len, error = 0, forat_ret = 1; - int stable = NFSWRITE_FILESYNC; + int gotproxystateid, stable = NFSWRITE_FILESYNC; off_t off; struct nfsstate st, *stp = &st; struct nfslock lo, *lop = &lo; nfsv4stateid_t stateid; nfsquad_t clientid; + nfsattrbit_t attrbits; if (nd->nd_repstat) { nfsrv_wcc(nd, forat_ret, &forat, aftat_ret, &nva); goto out; } + gotproxystateid = 0; if (nd->nd_flag & ND_NFSV2) { NFSM_DISSECT(tl, u_int32_t *, 4 * NFSX_UNSIGNED); off = (off_t)fxdr_unsigned(u_int32_t, *++tl); @@ -848,6 +903,24 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_clientid.qval = clientid.qval; } stp->ls_stateid.other[2] = *tl++; + /* + * Don't allow the client to use a special stateid for a DS op. + */ + if ((nd->nd_flag & ND_DSSERVER) != 0 && + ((stp->ls_stateid.other[0] == 0x0 && + stp->ls_stateid.other[1] == 0x0 && + stp->ls_stateid.other[2] == 0x0) || + (stp->ls_stateid.other[0] == 0xffffffff && + stp->ls_stateid.other[1] == 0xffffffff && + stp->ls_stateid.other[2] == 0xffffffff) || + stp->ls_stateid.seqid != 0)) + nd->nd_repstat = NFSERR_BADSTATEID; + /* However, allow the proxy stateid. 
*/ + if (stp->ls_stateid.seqid == 0xffffffff && + stp->ls_stateid.other[0] == 0x55555555 && + stp->ls_stateid.other[1] == 0x55555555 && + stp->ls_stateid.other[2] == 0x55555555) + gotproxystateid = 1; off = fxdr_hyper(tl); lop->lo_first = off; tl += 2; @@ -893,7 +966,9 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_repstat = (vnode_vtype(vp) == VDIR) ? EISDIR : EINVAL; } - forat_ret = nfsvno_getattr(vp, &forat, nd->nd_cred, p, 1); + NFSZERO_ATTRBIT(&attrbits); + NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_OWNER); + forat_ret = nfsvno_getattr(vp, &forat, nd, p, 1, &attrbits); if (!nd->nd_repstat) nd->nd_repstat = forat_ret; if (!nd->nd_repstat && @@ -902,10 +977,14 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_repstat = nfsvno_accchk(vp, VWRITE, nd->nd_cred, exp, p, NFSACCCHK_ALLOWOWNER, NFSACCCHK_VPISLOCKED, NULL); - if ((nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) { + /* + * DS writes are marked by ND_DSSERVER or use the proxy special + * stateid. + */ + if (nd->nd_repstat == 0 && (nd->nd_flag & (ND_NFSV4 | ND_DSSERVER)) == + ND_NFSV4 && gotproxystateid == 0) nd->nd_repstat = nfsrv_lockctrl(vp, &stp, &lop, NULL, clientid, &stateid, exp, nd, p); - } if (nd->nd_repstat) { vput(vp); if (nd->nd_flag & ND_NFSV3) @@ -919,7 +998,7 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, * which is to return ok so long as there are no permission problems.
*/ if (retlen > 0) { - nd->nd_repstat = nfsvno_write(vp, off, retlen, cnt, stable, + nd->nd_repstat = nfsvno_write(vp, off, retlen, cnt, &stable, nd->nd_md, nd->nd_dpos, nd->nd_cred, p); error = nfsm_advance(nd, NFSM_RNDUP(retlen), -1); if (error) @@ -928,7 +1007,7 @@ nfsrvd_write(struct nfsrv_descript *nd, __unused int isdgram, if (nd->nd_flag & ND_NFSV4) aftat_ret = 0; else - aftat_ret = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + aftat_ret = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); vput(vp); if (!nd->nd_repstat) nd->nd_repstat = aftat_ret; @@ -1050,8 +1129,8 @@ nfsrvd_create(struct nfsrv_descript *nd, __unused int isdgram, if (nd->nd_repstat) { nfsvno_relpathbuf(&named); if (nd->nd_flag & ND_NFSV3) { - dirfor_ret = nfsvno_getattr(dp, &dirfor, nd->nd_cred, - p, 1); + dirfor_ret = nfsvno_getattr(dp, &dirfor, nd, p, 1, + NULL); nfsrv_wcc(nd, dirfor_ret, &dirfor, diraft_ret, &diraft); } @@ -1065,8 +1144,8 @@ nfsrvd_create(struct nfsrv_descript *nd, __unused int isdgram, vrele(dirp); dirp = NULL; } else { - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, - p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, + NULL); } } if (nd->nd_repstat) { @@ -1104,8 +1183,8 @@ nfsrvd_create(struct nfsrv_descript *nd, __unused int isdgram, if (!nd->nd_repstat) { nd->nd_repstat = nfsvno_getfh(vp, &fh, p); if (!nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, - p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, + NULL); vput(vp); if (!nd->nd_repstat) { tverf[0] = nva.na_atime.tv_sec; @@ -1121,7 +1200,7 @@ nfsrvd_create(struct nfsrv_descript *nd, __unused int isdgram, if (exclusive_flag && !nd->nd_repstat && (cverf[0] != tverf[0] || cverf[1] != tverf[1])) nd->nd_repstat = EEXIST; - diraft_ret = nfsvno_getattr(dirp, &diraft, nd->nd_cred, p, 0); + diraft_ret = nfsvno_getattr(dirp, &diraft, nd, p, 0, NULL); vrele(dirp); if (!nd->nd_repstat) { (void) nfsm_fhtom(nd, (u_int8_t *)&fh, 0, 1); @@ -1231,7 +1310,7 @@ nfsrvd_mknod(struct 
nfsrv_descript *nd, __unused int isdgram, } } - dirfor_ret = nfsvno_getattr(dp, &dirfor, nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dp, &dirfor, nd, p, 0, NULL); if (!nd->nd_repstat && (nd->nd_flag & ND_NFSV4)) { if (!dirfor_ret && NFSVNO_ISSETGID(&nva) && dirfor.na_gid == nva.na_gid) @@ -1269,8 +1348,8 @@ nfsrvd_mknod(struct nfsrv_descript *nd, __unused int isdgram, if (nd->nd_repstat) { if (dirp) { if (nd->nd_flag & ND_NFSV3) - dirfor_ret = nfsvno_getattr(dirp, &dirfor, - nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, + p, 0, NULL); vrele(dirp); } #ifdef NFS4_ACL_EXTATTR_NAME @@ -1282,7 +1361,7 @@ nfsrvd_mknod(struct nfsrv_descript *nd, __unused int isdgram, goto out; } if (dirp) - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, NULL); if ((nd->nd_flag & ND_NFSV4) && (vtyp == VDIR || vtyp == VLNK)) { if (vtyp == VDIR) { @@ -1311,8 +1390,8 @@ nfsrvd_mknod(struct nfsrv_descript *nd, __unused int isdgram, nfsrv_fixattr(nd, vp, &nva, aclp, p, &attrbits, exp); nd->nd_repstat = nfsvno_getfh(vp, fhp, p); if ((nd->nd_flag & ND_NFSV3) && !nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, - p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, + NULL); if (vpp != NULL && nd->nd_repstat == 0) { NFSVOPUNLOCK(vp, 0); *vpp = vp; @@ -1320,7 +1399,7 @@ nfsrvd_mknod(struct nfsrv_descript *nd, __unused int isdgram, vput(vp); } - diraft_ret = nfsvno_getattr(dirp, &diraft, nd->nd_cred, p, 0); + diraft_ret = nfsvno_getattr(dirp, &diraft, nd, p, 0, NULL); vrele(dirp); if (!nd->nd_repstat) { if (nd->nd_flag & ND_NFSV3) { @@ -1394,8 +1473,8 @@ nfsrvd_remove(struct nfsrv_descript *nd, __unused int isdgram, } if (dirp) { if (!(nd->nd_flag & ND_NFSV2)) { - dirfor_ret = nfsvno_getattr(dirp, &dirfor, - nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, + NULL); } else { vrele(dirp); dirp = NULL; @@ -1419,8 +1498,8 @@ nfsrvd_remove(struct 
nfsrv_descript *nd, __unused int isdgram, } if (!(nd->nd_flag & ND_NFSV2)) { if (dirp) { - diraft_ret = nfsvno_getattr(dirp, &diraft, nd->nd_cred, - p, 0); + diraft_ret = nfsvno_getattr(dirp, &diraft, nd, p, 0, + NULL); vrele(dirp); } if (nd->nd_flag & ND_NFSV3) { @@ -1466,7 +1545,7 @@ nfsrvd_rename(struct nfsrv_descript *nd, int isdgram, goto out; } if (!(nd->nd_flag & ND_NFSV2)) - fdirfor_ret = nfsvno_getattr(dp, &fdirfor, nd->nd_cred, p, 1); + fdirfor_ret = nfsvno_getattr(dp, &fdirfor, nd, p, 1, NULL); tond.ni_cnd.cn_nameiop = 0; tond.ni_startdir = NULL; NFSNAMEICNDSET(&fromnd.ni_cnd, nd->nd_cred, DELETE, WANTPARENT | SAVESTART); @@ -1489,11 +1568,12 @@ nfsrvd_rename(struct nfsrv_descript *nd, int isdgram, tnes = *toexp; if (dp != tdp) { NFSVOPUNLOCK(dp, 0); - tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd->nd_cred, - p, 0); /* Might lock tdp. */ + /* Might lock tdp. */ + tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd, p, 0, + NULL); } else { - tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd->nd_cred, - p, 1); + tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd, p, 1, + NULL); NFSVOPUNLOCK(dp, 0); } } else { @@ -1514,8 +1594,8 @@ nfsrvd_rename(struct nfsrv_descript *nd, int isdgram, VREF(dp); tdp = dp; tnes = *exp; - tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd->nd_cred, - p, 1); + tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd, p, 1, + NULL); NFSVOPUNLOCK(dp, 0); } else { NFSVOPUNLOCK(dp, 0); @@ -1523,8 +1603,8 @@ nfsrvd_rename(struct nfsrv_descript *nd, int isdgram, nfsd_fhtovp(nd, &tfh, LK_EXCLUSIVE, &tdp, &tnes, NULL, 0, p); /* Locks tdp. 
*/ if (tdp) { - tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, - nd->nd_cred, p, 1); + tdirfor_ret = nfsvno_getattr(tdp, &tdirfor, nd, + p, 1, NULL); NFSVOPUNLOCK(tdp, 0); } } @@ -1581,11 +1661,9 @@ nfsrvd_rename(struct nfsrv_descript *nd, int isdgram, nd->nd_repstat = nfsvno_rename(&fromnd, &tond, nd->nd_repstat, nd->nd_flag, nd->nd_cred, p); if (fdirp) - fdiraft_ret = nfsvno_getattr(fdirp, &fdiraft, nd->nd_cred, p, - 0); + fdiraft_ret = nfsvno_getattr(fdirp, &fdiraft, nd, p, 0, NULL); if (tdirp) - tdiraft_ret = nfsvno_getattr(tdirp, &tdiraft, nd->nd_cred, p, - 0); + tdiraft_ret = nfsvno_getattr(tdirp, &tdiraft, nd, p, 0, NULL); if (fdirp) vrele(fdirp); if (tdirp) @@ -1686,16 +1764,16 @@ nfsrvd_link(struct nfsrv_descript *nd, int isdgram, vrele(dirp); dirp = NULL; } else { - dirfor_ret = nfsvno_getattr(dirp, &dirfor, - nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, + NULL); } } if (!nd->nd_repstat) nd->nd_repstat = nfsvno_link(&named, vp, nd->nd_cred, p, exp); if (nd->nd_flag & ND_NFSV3) - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 0); + getret = nfsvno_getattr(vp, &at, nd, p, 0, NULL); if (dirp) { - diraft_ret = nfsvno_getattr(dirp, &diraft, nd->nd_cred, p, 0); + diraft_ret = nfsvno_getattr(dirp, &diraft, nd, p, 0, NULL); vrele(dirp); } vrele(vp); @@ -1765,13 +1843,13 @@ nfsrvd_symlink(struct nfsrv_descript *nd, __unused int isdgram, */ if (!nd->nd_repstat) { if (dirp != NULL) - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, - p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, + NULL); nfsrvd_symlinksub(nd, &named, &nva, fhp, vpp, dirp, &dirfor, &diraft, &diraft_ret, NULL, NULL, p, exp, pathcp, pathlen); } else if (dirp != NULL) { - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, NULL); vrele(dirp); } if (pathcp) @@ -1811,7 +1889,7 @@ nfsrvd_symlinksub(struct nfsrv_descript *nd, struct nameidata *ndp, nd->nd_repstat = nfsvno_getfh(ndp->ni_vp, 
fhp, p); if (!nd->nd_repstat) nd->nd_repstat = nfsvno_getattr(ndp->ni_vp, - nvap, nd->nd_cred, p, 1); + nvap, nd, p, 1, NULL); } if (vpp != NULL && nd->nd_repstat == 0) { NFSVOPUNLOCK(ndp->ni_vp, 0); @@ -1820,7 +1898,7 @@ nfsrvd_symlinksub(struct nfsrv_descript *nd, struct nameidata *ndp, vput(ndp->ni_vp); } if (dirp) { - *diraft_retp = nfsvno_getattr(dirp, diraftp, nd->nd_cred, p, 0); + *diraft_retp = nfsvno_getattr(dirp, diraftp, nd, p, 0, NULL); vrele(dirp); } if ((nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) { @@ -1884,8 +1962,8 @@ nfsrvd_mkdir(struct nfsrv_descript *nd, __unused int isdgram, } if (nd->nd_repstat) { if (dirp != NULL) { - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, - p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, + NULL); vrele(dirp); } if (nd->nd_flag & ND_NFSV3) @@ -1894,7 +1972,7 @@ nfsrvd_mkdir(struct nfsrv_descript *nd, __unused int isdgram, goto out; } if (dirp != NULL) - dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd->nd_cred, p, 0); + dirfor_ret = nfsvno_getattr(dirp, &dirfor, nd, p, 0, NULL); /* * Call nfsrvd_mkdirsub() for the code common to V4 as well. 
@@ -1944,8 +2022,8 @@ nfsrvd_mkdirsub(struct nfsrv_descript *nd, struct nameidata *ndp, nfsrv_fixattr(nd, vp, nvap, aclp, p, attrbitp, exp); nd->nd_repstat = nfsvno_getfh(vp, fhp, p); if (!(nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(vp, nvap, nd->nd_cred, - p, 1); + nd->nd_repstat = nfsvno_getattr(vp, nvap, nd, p, 1, + NULL); if (vpp && !nd->nd_repstat) { NFSVOPUNLOCK(vp, 0); *vpp = vp; @@ -1954,7 +2032,7 @@ nfsrvd_mkdirsub(struct nfsrv_descript *nd, struct nameidata *ndp, } } if (dirp) { - *diraft_retp = nfsvno_getattr(dirp, diraftp, nd->nd_cred, p, 0); + *diraft_retp = nfsvno_getattr(dirp, diraftp, nd, p, 0, NULL); vrele(dirp); } if ((nd->nd_flag & ND_NFSV4) && !nd->nd_repstat) { @@ -2004,10 +2082,10 @@ nfsrvd_commit(struct nfsrv_descript *nd, __unused int isdgram, tl += 2; cnt = fxdr_unsigned(int, *tl); if (nd->nd_flag & ND_NFSV3) - for_ret = nfsvno_getattr(vp, &bfor, nd->nd_cred, p, 1); + for_ret = nfsvno_getattr(vp, &bfor, nd, p, 1, NULL); nd->nd_repstat = nfsvno_fsync(vp, off, cnt, nd->nd_cred, p); if (nd->nd_flag & ND_NFSV3) { - aft_ret = nfsvno_getattr(vp, &aft, nd->nd_cred, p, 1); + aft_ret = nfsvno_getattr(vp, &aft, nd, p, 1, NULL); nfsrv_wcc(nd, for_ret, &bfor, aft_ret, &aft); } vput(vp); @@ -2046,7 +2124,7 @@ nfsrvd_statfs(struct nfsrv_descript *nd, __unused int isdgram, } sf = malloc(sizeof(struct statfs), M_STATFS, M_WAITOK); nd->nd_repstat = nfsvno_statfs(vp, sf); - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); vput(vp); if (nd->nd_flag & ND_NFSV3) nfsrv_postopattr(nd, getret, &at); @@ -2101,7 +2179,7 @@ nfsrvd_fsinfo(struct nfsrv_descript *nd, int isdgram, nfsrv_postopattr(nd, getret, &at); goto out; } - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); nfsvno_getfs(&fs, isdgram); vput(vp); nfsrv_postopattr(nd, getret, &at); @@ -2151,7 +2229,7 @@ nfsrvd_pathconf(struct nfsrv_descript *nd, __unused int 
isdgram, if (!nd->nd_repstat) nd->nd_repstat = nfsvno_pathconf(vp, _PC_NO_TRUNC, &notrunc, nd->nd_cred, p); - getret = nfsvno_getattr(vp, &at, nd->nd_cred, p, 1); + getret = nfsvno_getattr(vp, &at, nd, p, 1, NULL); vput(vp); nfsrv_postopattr(nd, getret, &at); if (!nd->nd_repstat) { @@ -2234,6 +2312,25 @@ nfsrvd_lock(struct nfsrv_descript *nd, __unused int isdgram, NFSBCOPY((caddr_t)tl, (caddr_t)stp->ls_stateid.other, NFSX_STATEIDOTHER); tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + + /* + * For the special stateid of other all 0s and seqid == 1, set + * the stateid to the current stateid, if it is set. + */ + if ((nd->nd_flag & ND_NFSV41) != 0 && + stp->ls_stateid.seqid == 1 && + stp->ls_stateid.other[0] == 0 && + stp->ls_stateid.other[1] == 0 && + stp->ls_stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stp->ls_stateid = nd->nd_curstateid; + stp->ls_stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + stp->ls_opentolockseq = fxdr_unsigned(int, *tl++); clientid.lval[0] = *tl++; clientid.lval[1] = *tl++; @@ -2261,6 +2358,25 @@ nfsrvd_lock(struct nfsrv_descript *nd, __unused int isdgram, NFSBCOPY((caddr_t)tl, (caddr_t)stp->ls_stateid.other, NFSX_STATEIDOTHER); tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + + /* + * For the special stateid of other all 0s and seqid == 1, set + * the stateid to the current stateid, if it is set. 
+ */ + if ((nd->nd_flag & ND_NFSV41) != 0 && + stp->ls_stateid.seqid == 1 && + stp->ls_stateid.other[0] == 0 && + stp->ls_stateid.other[1] == 0 && + stp->ls_stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stp->ls_stateid = nd->nd_curstateid; + stp->ls_stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + stp->ls_seq = fxdr_unsigned(int, *tl); clientid.lval[0] = stp->ls_stateid.other[0]; clientid.lval[1] = stp->ls_stateid.other[1]; @@ -2327,6 +2443,11 @@ nfsrvd_lock(struct nfsrv_descript *nd, __unused int isdgram, if (stp) free(stp, M_NFSDSTATE); if (!nd->nd_repstat) { + /* For NFSv4.1, set the Current StateID. */ + if ((nd->nd_flag & ND_NFSV41) != 0) { + nd->nd_curstateid = stateid; + nd->nd_flag |= ND_CURSTATEID; + } NFSM_BUILD(tl, u_int32_t *, NFSX_STATEID); *tl++ = txdr_unsigned(stateid.seqid); NFSBCOPY((caddr_t)stateid.other,(caddr_t)tl,NFSX_STATEIDOTHER); @@ -2520,6 +2641,23 @@ nfsrvd_locku(struct nfsrv_descript *nd, __unused int isdgram, NFSBCOPY((caddr_t)tl, (caddr_t)stp->ls_stateid.other, NFSX_STATEIDOTHER); tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. 
+ */ + if ((nd->nd_flag & ND_NFSV41) != 0 && stp->ls_stateid.seqid == 1 && + stp->ls_stateid.other[0] == 0 && stp->ls_stateid.other[1] == 0 && + stp->ls_stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stp->ls_stateid = nd->nd_curstateid; + stp->ls_stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + lop->lo_first = fxdr_hyper(tl); tl += 2; len = fxdr_hyper(tl); @@ -2699,7 +2837,7 @@ nfsrvd_open(struct nfsrv_descript *nd, __unused int isdgram, NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); create = fxdr_unsigned(int, *tl); if (!nd->nd_repstat) - nd->nd_repstat = nfsvno_getattr(dp, &dirfor, nd->nd_cred, p, 0); + nd->nd_repstat = nfsvno_getattr(dp, &dirfor, nd, p, 0, NULL); if (create == NFSV4OPEN_CREATE) { nva.na_type = VREG; nva.na_mode = 0; @@ -2898,7 +3036,7 @@ nfsrvd_open(struct nfsrv_descript *nd, __unused int isdgram, } if (!nd->nd_repstat) { - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (!nd->nd_repstat) { tverf[0] = nva.na_atime.tv_sec; tverf[1] = nva.na_atime.tv_nsec; @@ -2924,9 +3062,13 @@ nfsrvd_open(struct nfsrv_descript *nd, __unused int isdgram, if (stp) free(stp, M_NFSDSTATE); if (!nd->nd_repstat && dirp) - nd->nd_repstat = nfsvno_getattr(dirp, &diraft, nd->nd_cred, p, - 0); + nd->nd_repstat = nfsvno_getattr(dirp, &diraft, nd, p, 0, NULL); if (!nd->nd_repstat) { + /* For NFSv4.1, set the Current StateID. 
*/ + if ((nd->nd_flag & ND_NFSV41) != 0) { + nd->nd_curstateid = stateid; + nd->nd_flag |= ND_CURSTATEID; + } NFSM_BUILD(tl, u_int32_t *, NFSX_STATEID + 6 * NFSX_UNSIGNED); *tl++ = txdr_unsigned(stateid.seqid); NFSBCOPY((caddr_t)stateid.other,(caddr_t)tl,NFSX_STATEIDOTHER); @@ -3026,9 +3168,10 @@ nfsrvd_close(struct nfsrv_descript *nd, __unused int isdgram, { u_int32_t *tl; struct nfsstate st, *stp = &st; - int error = 0; + int error = 0, writeacc; nfsv4stateid_t stateid; nfsquad_t clientid; + struct nfsvattr na; NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED + NFSX_STATEID); stp->ls_seq = fxdr_unsigned(u_int32_t, *tl++); @@ -3038,6 +3181,22 @@ nfsrvd_close(struct nfsrv_descript *nd, __unused int isdgram, stp->ls_stateid.seqid = fxdr_unsigned(u_int32_t, *tl++); NFSBCOPY((caddr_t)tl, (caddr_t)stp->ls_stateid.other, NFSX_STATEIDOTHER); + + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. + */ + if ((nd->nd_flag & ND_NFSV41) != 0 && stp->ls_stateid.seqid == 1 && + stp->ls_stateid.other[0] == 0 && stp->ls_stateid.other[1] == 0 && + stp->ls_stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) + stp->ls_stateid = nd->nd_curstateid; + else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + stp->ls_flags = NFSLCK_CLOSE; clientid.lval[0] = stp->ls_stateid.other[0]; clientid.lval[1] = stp->ls_stateid.other[1]; @@ -3052,9 +3211,22 @@ nfsrvd_close(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_flag |= ND_IMPLIEDCLID; nd->nd_clientid.qval = clientid.qval; } - nd->nd_repstat = nfsrv_openupdate(vp, stp, clientid, &stateid, nd, p); + nd->nd_repstat = nfsrv_openupdate(vp, stp, clientid, &stateid, nd, p, + &writeacc); + /* For pNFS, update the attributes. */ + if (writeacc != 0 || nfsrv_pnfsatime != 0) + nfsrv_updatemdsattr(vp, &na, p); vput(vp); if (!nd->nd_repstat) { + /* + * If the stateid that has been closed is the current stateid, + * unset it. 
+ */ + if ((nd->nd_flag & ND_CURSTATEID) != 0 && + stateid.other[0] == nd->nd_curstateid.other[0] && + stateid.other[1] == nd->nd_curstateid.other[1] && + stateid.other[2] == nd->nd_curstateid.other[2]) + nd->nd_flag &= ~ND_CURSTATEID; NFSM_BUILD(tl, u_int32_t *, NFSX_STATEID); *tl++ = txdr_unsigned(stateid.seqid); NFSBCOPY((caddr_t)stateid.other,(caddr_t)tl,NFSX_STATEIDOTHER); @@ -3097,7 +3269,7 @@ nfsrvd_delegpurge(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_clientid.qval = clientid.qval; } nd->nd_repstat = nfsrv_delegupdate(nd, clientid, NULL, NULL, - NFSV4OP_DELEGPURGE, nd->nd_cred, p); + NFSV4OP_DELEGPURGE, nd->nd_cred, p, NULL); nfsmout: NFSEXITCODE2(error, nd); return (error); @@ -3111,9 +3283,10 @@ nfsrvd_delegreturn(struct nfsrv_descript *nd, __unused int isdgram, vnode_t vp, NFSPROC_T *p, __unused struct nfsexstuff *exp) { u_int32_t *tl; - int error = 0; + int error = 0, writeacc; nfsv4stateid_t stateid; nfsquad_t clientid; + struct nfsvattr na; NFSM_DISSECT(tl, u_int32_t *, NFSX_STATEID); stateid.seqid = fxdr_unsigned(u_int32_t, *tl++); @@ -3132,7 +3305,10 @@ nfsrvd_delegreturn(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_clientid.qval = clientid.qval; } nd->nd_repstat = nfsrv_delegupdate(nd, clientid, &stateid, vp, - NFSV4OP_DELEGRETURN, nd->nd_cred, p); + NFSV4OP_DELEGRETURN, nd->nd_cred, p, &writeacc); + /* For pNFS, update the attributes. 
*/ + if (writeacc != 0 || nfsrv_pnfsatime != 0) + nfsrv_updatemdsattr(vp, &na, p); nfsmout: vput(vp); NFSEXITCODE2(error, nd); @@ -3196,7 +3372,8 @@ nfsrvd_openconfirm(struct nfsrv_descript *nd, __unused int isdgram, nd->nd_flag |= ND_IMPLIEDCLID; nd->nd_clientid.qval = clientid.qval; } - nd->nd_repstat = nfsrv_openupdate(vp, stp, clientid, &stateid, nd, p); + nd->nd_repstat = nfsrv_openupdate(vp, stp, clientid, &stateid, nd, p, + NULL); if (!nd->nd_repstat) { NFSM_BUILD(tl, u_int32_t *, NFSX_STATEID); *tl++ = txdr_unsigned(stateid.seqid); @@ -3235,6 +3412,22 @@ nfsrvd_opendowngrade(struct nfsrv_descript *nd, __unused int isdgram, NFSBCOPY((caddr_t)tl, (caddr_t)stp->ls_stateid.other, NFSX_STATEIDOTHER); tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. + */ + if ((nd->nd_flag & ND_NFSV41) != 0 && stp->ls_stateid.seqid == 1 && + stp->ls_stateid.other[0] == 0 && stp->ls_stateid.other[1] == 0 && + stp->ls_stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) + stp->ls_stateid = nd->nd_curstateid; + else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + stp->ls_seq = fxdr_unsigned(u_int32_t, *tl++); i = fxdr_unsigned(int, *tl++); if ((nd->nd_flag & ND_NFSV41) != 0) @@ -3285,8 +3478,13 @@ nfsrvd_opendowngrade(struct nfsrv_descript *nd, __unused int isdgram, } if (!nd->nd_repstat) nd->nd_repstat = nfsrv_openupdate(vp, stp, clientid, &stateid, - nd, p); + nd, p, NULL); if (!nd->nd_repstat) { + /* For NFSv4.1, set the Current StateID. 
*/ + if ((nd->nd_flag & ND_NFSV41) != 0) { + nd->nd_curstateid = stateid; + nd->nd_flag |= ND_CURSTATEID; + } NFSM_BUILD(tl, u_int32_t *, NFSX_STATEID); *tl++ = txdr_unsigned(stateid.seqid); NFSBCOPY((caddr_t)stateid.other,(caddr_t)tl,NFSX_STATEIDOTHER); @@ -3614,7 +3812,7 @@ nfsrvd_verify(struct nfsrv_descript *nd, int isdgram, fhandle_t fh; sf = malloc(sizeof(struct statfs), M_STATFS, M_WAITOK); - nd->nd_repstat = nfsvno_getattr(vp, &nva, nd->nd_cred, p, 1); + nd->nd_repstat = nfsvno_getattr(vp, &nva, nd, p, 1, NULL); if (!nd->nd_repstat) nd->nd_repstat = nfsvno_statfs(vp, sf); if (!nd->nd_repstat) @@ -3801,7 +3999,10 @@ nfsrvd_exchangeid(struct nfsrv_descript *nd, __unused int isdgram, confirm.lval[1] = 1; else confirm.lval[1] = 0; - v41flags = NFSV4EXCH_USENONPNFS; + if (nfsrv_devidcnt == 0) + v41flags = NFSV4EXCH_USENONPNFS | NFSV4EXCH_USEPNFSDS; + else + v41flags = NFSV4EXCH_USEPNFSMDS; sp4type = fxdr_unsigned(uint32_t, *tl); if (sp4type != NFSV4EXCH_SP4NONE) { nd->nd_repstat = NFSERR_NOTSUPP; @@ -3892,7 +4093,15 @@ nfsrvd_createsession(struct nfsrv_descript *nd, __unused int isdgram, NFSM_DISSECT(tl, uint32_t *, 7 * NFSX_UNSIGNED); tl++; /* Header pad always 0. 
*/ sep->sess_maxreq = fxdr_unsigned(uint32_t, *tl++); + if (sep->sess_maxreq > sb_max_adj - NFS_MAXXDR) { + sep->sess_maxreq = sb_max_adj - NFS_MAXXDR; + printf("Consider increasing kern.ipc.maxsockbuf\n"); + } sep->sess_maxresp = fxdr_unsigned(uint32_t, *tl++); + if (sep->sess_maxresp > sb_max_adj - NFS_MAXXDR) { + sep->sess_maxresp = sb_max_adj - NFS_MAXXDR; + printf("Consider increasing kern.ipc.maxsockbuf\n"); + } sep->sess_maxrespcached = fxdr_unsigned(uint32_t, *tl++); sep->sess_maxops = fxdr_unsigned(uint32_t, *tl++); sep->sess_maxslots = fxdr_unsigned(uint32_t, *tl++); @@ -4133,7 +4342,369 @@ nfsrvd_freestateid(struct nfsrv_descript *nd, __unused int isdgram, NFSM_DISSECT(tl, uint32_t *, NFSX_STATEID); stateid.seqid = fxdr_unsigned(uint32_t, *tl++); NFSBCOPY(tl, stateid.other, NFSX_STATEIDOTHER); + + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. + */ + if (stateid.seqid == 1 && stateid.other[0] == 0 && + stateid.other[1] == 0 && stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stateid = nd->nd_curstateid; + stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + nd->nd_repstat = nfsrv_freestateid(nd, &stateid, p); + + /* If the current stateid has been free'd, unset it. 
*/ + if (nd->nd_repstat == 0 && (nd->nd_flag & ND_CURSTATEID) != 0 && + stateid.other[0] == nd->nd_curstateid.other[0] && + stateid.other[1] == nd->nd_curstateid.other[1] && + stateid.other[2] == nd->nd_curstateid.other[2]) + nd->nd_flag &= ~ND_CURSTATEID; +nfsmout: + NFSEXITCODE2(error, nd); + return (error); +} + +/* + * nfsv4 layoutget service + */ +APPLESTATIC int +nfsrvd_layoutget(struct nfsrv_descript *nd, __unused int isdgram, + vnode_t vp, NFSPROC_T *p, struct nfsexstuff *exp) +{ + uint32_t *tl; + nfsv4stateid_t stateid; + int error = 0, layoutlen, layouttype, iomode, maxcnt, retonclose; + uint64_t offset, len, minlen; + char *layp; + + if (nfs_rootfhset == 0 || nfsd_checkrootexp(nd) != 0) { + nd->nd_repstat = NFSERR_WRONGSEC; + goto nfsmout; + } + NFSM_DISSECT(tl, uint32_t *, 4 * NFSX_UNSIGNED + 3 * NFSX_HYPER + + NFSX_STATEID); + tl++; /* Signal layout available. Ignore for now. */ + layouttype = fxdr_unsigned(int, *tl++); + iomode = fxdr_unsigned(int, *tl++); + offset = fxdr_hyper(tl); tl += 2; + len = fxdr_hyper(tl); tl += 2; + minlen = fxdr_hyper(tl); tl += 2; + stateid.seqid = fxdr_unsigned(uint32_t, *tl++); + NFSBCOPY(tl, stateid.other, NFSX_STATEIDOTHER); + tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + maxcnt = fxdr_unsigned(int, *tl); + NFSD_DEBUG(4, "layoutget ltyp=%d iom=%d off=%ju len=%ju mlen=%ju\n", + layouttype, iomode, (uintmax_t)offset, (uintmax_t)len, + (uintmax_t)minlen); + if (len < minlen || + (minlen != UINT64_MAX && offset + minlen < offset) || + (len != UINT64_MAX && offset + len < offset)) { + nd->nd_repstat = NFSERR_INVAL; + goto nfsmout; + } + + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. 
+ */ + if (stateid.seqid == 1 && stateid.other[0] == 0 && + stateid.other[1] == 0 && stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stateid = nd->nd_curstateid; + stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + + layp = NULL; + if (layouttype == NFSLAYOUT_NFSV4_1_FILES && nfsrv_maxpnfsmirror == 1) + layp = malloc(NFSX_V4FILELAYOUT, M_TEMP, M_WAITOK); + else if (layouttype == NFSLAYOUT_FLEXFILE) + layp = malloc(NFSX_V4FLEXLAYOUT(nfsrv_maxpnfsmirror), M_TEMP, + M_WAITOK); + else + nd->nd_repstat = NFSERR_UNKNLAYOUTTYPE; + if (layp != NULL) + nd->nd_repstat = nfsrv_layoutget(nd, vp, exp, layouttype, + &iomode, &offset, &len, minlen, &stateid, maxcnt, + &retonclose, &layoutlen, layp, nd->nd_cred, p); + NFSD_DEBUG(4, "nfsrv_layoutget stat=%u layoutlen=%d\n", nd->nd_repstat, + layoutlen); + if (nd->nd_repstat == 0) { + /* For NFSv4.1, set the Current StateID. */ + if ((nd->nd_flag & ND_NFSV41) != 0) { + nd->nd_curstateid = stateid; + nd->nd_flag |= ND_CURSTATEID; + } + NFSM_BUILD(tl, uint32_t *, 4 * NFSX_UNSIGNED + NFSX_STATEID + + 2 * NFSX_HYPER); + *tl++ = txdr_unsigned(retonclose); + *tl++ = txdr_unsigned(stateid.seqid); + NFSBCOPY(stateid.other, tl, NFSX_STATEIDOTHER); + tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + *tl++ = txdr_unsigned(1); /* Only returns one layout. 
*/ + txdr_hyper(offset, tl); tl += 2; + txdr_hyper(len, tl); tl += 2; + *tl++ = txdr_unsigned(iomode); + *tl = txdr_unsigned(layouttype); + nfsm_strtom(nd, layp, layoutlen); + } else if (nd->nd_repstat == NFSERR_LAYOUTTRYLATER) { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED); + *tl = newnfs_false; + } + free(layp, M_TEMP); +nfsmout: + vput(vp); + NFSEXITCODE2(error, nd); + return (error); +} + +/* + * nfsv4 layoutcommit service + */ +APPLESTATIC int +nfsrvd_layoutcommit(struct nfsrv_descript *nd, __unused int isdgram, + vnode_t vp, NFSPROC_T *p, struct nfsexstuff *exp) +{ + uint32_t *tl; + nfsv4stateid_t stateid; + int error = 0, hasnewoff, hasnewmtime, layouttype, maxcnt, reclaim; + int hasnewsize; + uint64_t offset, len, newoff, newsize; + struct timespec newmtime; + char *layp; + + layp = NULL; + if (nfs_rootfhset == 0 || nfsd_checkrootexp(nd) != 0) { + nd->nd_repstat = NFSERR_WRONGSEC; + goto nfsmout; + } + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED + 2 * NFSX_HYPER + + NFSX_STATEID); + offset = fxdr_hyper(tl); tl += 2; + len = fxdr_hyper(tl); tl += 2; + reclaim = fxdr_unsigned(int, *tl++); + stateid.seqid = fxdr_unsigned(uint32_t, *tl++); + NFSBCOPY(tl, stateid.other, NFSX_STATEIDOTHER); + tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + /* + * For the special stateid of other all 0s and seqid == 1, set the + * stateid to the current stateid, if it is set. 
+ */ + if (stateid.seqid == 1 && stateid.other[0] == 0 && + stateid.other[1] == 0 && stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stateid = nd->nd_curstateid; + stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + + hasnewoff = fxdr_unsigned(int, *tl); + if (hasnewoff != 0) { + NFSM_DISSECT(tl, uint32_t *, NFSX_HYPER + NFSX_UNSIGNED); + newoff = fxdr_hyper(tl); tl += 2; + } else + NFSM_DISSECT(tl, uint32_t *, NFSX_UNSIGNED); + hasnewmtime = fxdr_unsigned(int, *tl); + if (hasnewmtime != 0) { + NFSM_DISSECT(tl, uint32_t *, NFSX_V4TIME + 2 * NFSX_UNSIGNED); + fxdr_nfsv4time(tl, &newmtime); + tl += (NFSX_V4TIME / NFSX_UNSIGNED); + } else + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_UNSIGNED); + layouttype = fxdr_unsigned(int, *tl++); + maxcnt = fxdr_unsigned(int, *tl); + if (maxcnt > 0) { + layp = malloc(maxcnt + 1, M_TEMP, M_WAITOK); + error = nfsrv_mtostr(nd, layp, maxcnt); + if (error != 0) + goto nfsmout; + } + nd->nd_repstat = nfsrv_layoutcommit(nd, vp, layouttype, hasnewoff, + newoff, offset, len, hasnewmtime, &newmtime, reclaim, &stateid, + maxcnt, layp, &hasnewsize, &newsize, nd->nd_cred, p); + NFSD_DEBUG(4, "nfsrv_layoutcommit stat=%u\n", nd->nd_repstat); + if (nd->nd_repstat == 0) { + if (hasnewsize != 0) { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED + NFSX_HYPER); + *tl++ = newnfs_true; + txdr_hyper(newsize, tl); + } else { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED); + *tl = newnfs_false; + } + } +nfsmout: + free(layp, M_TEMP); + vput(vp); + NFSEXITCODE2(error, nd); + return (error); +} + +/* + * nfsv4 layoutreturn service + */ +APPLESTATIC int +nfsrvd_layoutreturn(struct nfsrv_descript *nd, __unused int isdgram, + vnode_t vp, NFSPROC_T *p, struct nfsexstuff *exp) +{ + uint32_t *tl, *layp; + nfsv4stateid_t stateid; + int error = 0, fnd, kind, layouttype, iomode, maxcnt, reclaim; + uint64_t offset, len; + + layp = NULL; + if (nfs_rootfhset == 0 || nfsd_checkrootexp(nd) != 0) { + nd->nd_repstat = 
NFSERR_WRONGSEC; + goto nfsmout; + } + NFSM_DISSECT(tl, uint32_t *, 4 * NFSX_UNSIGNED); + reclaim = *tl++; + layouttype = fxdr_unsigned(int, *tl++); + iomode = fxdr_unsigned(int, *tl++); + kind = fxdr_unsigned(int, *tl); + NFSD_DEBUG(4, "layoutreturn recl=%d ltyp=%d iom=%d kind=%d\n", reclaim, + layouttype, iomode, kind); + if (kind == NFSV4LAYOUTRET_FILE) { + NFSM_DISSECT(tl, uint32_t *, 2 * NFSX_HYPER + NFSX_STATEID + + NFSX_UNSIGNED); + offset = fxdr_hyper(tl); tl += 2; + len = fxdr_hyper(tl); tl += 2; + stateid.seqid = fxdr_unsigned(uint32_t, *tl++); + NFSBCOPY(tl, stateid.other, NFSX_STATEIDOTHER); + tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + + /* + * For the special stateid of other all 0s and seqid == 1, set + * the stateid to the current stateid, if it is set. + */ + if (stateid.seqid == 1 && stateid.other[0] == 0 && + stateid.other[1] == 0 && stateid.other[2] == 0) { + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + stateid = nd->nd_curstateid; + stateid.seqid = 0; + } else { + nd->nd_repstat = NFSERR_BADSTATEID; + goto nfsmout; + } + } + + maxcnt = fxdr_unsigned(int, *tl); + if (maxcnt > 0) { + layp = malloc(maxcnt + 1, M_TEMP, M_WAITOK); + error = nfsrv_mtostr(nd, (char *)layp, maxcnt); + if (error != 0) + goto nfsmout; + } + } else { + if (reclaim == newnfs_true) { + nd->nd_repstat = NFSERR_INVAL; + goto nfsmout; + } + offset = len = 0; + maxcnt = 0; + } + nd->nd_repstat = nfsrv_layoutreturn(nd, vp, layouttype, iomode, + offset, len, reclaim, kind, &stateid, maxcnt, layp, &fnd, + nd->nd_cred, p); + NFSD_DEBUG(4, "nfsrv_layoutreturn stat=%u fnd=%d\n", nd->nd_repstat, + fnd); + if (nd->nd_repstat == 0) { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED); + if (fnd != 0) { + *tl = newnfs_true; + NFSM_BUILD(tl, uint32_t *, NFSX_STATEID); + *tl++ = txdr_unsigned(stateid.seqid); + NFSBCOPY(stateid.other, tl, NFSX_STATEIDOTHER); + } else + *tl = newnfs_false; + } +nfsmout: + free(layp, M_TEMP); + vput(vp); + NFSEXITCODE2(error, nd); + return (error); +} + +/* + * 
nfsv4 getdeviceinfo service + */ +APPLESTATIC int +nfsrvd_getdevinfo(struct nfsrv_descript *nd, __unused int isdgram, + __unused vnode_t vp, NFSPROC_T *p, __unused struct nfsexstuff *exp) +{ + uint32_t *tl, maxcnt, notify[NFSV4_NOTIFYBITMAP]; + int cnt, devaddrlen, error = 0, i, layouttype; + char devid[NFSX_V4DEVICEID], *devaddr; + time_t dev_time; + + if (nfs_rootfhset == 0 || nfsd_checkrootexp(nd) != 0) { + nd->nd_repstat = NFSERR_WRONGSEC; + goto nfsmout; + } + NFSM_DISSECT(tl, uint32_t *, 3 * NFSX_UNSIGNED + NFSX_V4DEVICEID); + NFSBCOPY(tl, devid, NFSX_V4DEVICEID); + tl += (NFSX_V4DEVICEID / NFSX_UNSIGNED); + layouttype = fxdr_unsigned(int, *tl++); + maxcnt = fxdr_unsigned(uint32_t, *tl++); + cnt = fxdr_unsigned(int, *tl); + NFSD_DEBUG(4, "getdevinfo ltyp=%d maxcnt=%u bitcnt=%d\n", layouttype, + maxcnt, cnt); + if (cnt > NFSV4_NOTIFYBITMAP || cnt < 0) { + nd->nd_repstat = NFSERR_INVAL; + goto nfsmout; + } + if (cnt > 0) { + NFSM_DISSECT(tl, uint32_t *, cnt * NFSX_UNSIGNED); + for (i = 0; i < cnt; i++) + notify[i] = fxdr_unsigned(uint32_t, *tl++); + } + for (i = cnt; i < NFSV4_NOTIFYBITMAP; i++) + notify[i] = 0; + + /* + * Check that the device id is not stale. Device ids are recreated + * each time the nfsd threads are restarted. + */ + NFSBCOPY(devid, &dev_time, sizeof(dev_time)); + if (dev_time != nfsdev_time) { + nd->nd_repstat = NFSERR_NOENT; + goto nfsmout; + } + + /* Look for the device id. 
*/ + nd->nd_repstat = nfsrv_getdevinfo(devid, layouttype, &maxcnt, + notify, &devaddrlen, &devaddr); + NFSD_DEBUG(4, "nfsrv_getdevinfo stat=%u\n", nd->nd_repstat); + if (nd->nd_repstat == 0) { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED); + *tl = txdr_unsigned(layouttype); + nfsm_strtom(nd, devaddr, devaddrlen); + cnt = 0; + for (i = 0; i < NFSV4_NOTIFYBITMAP; i++) { + if (notify[i] != 0) + cnt = i + 1; + } + NFSM_BUILD(tl, uint32_t *, (cnt + 1) * NFSX_UNSIGNED); + *tl++ = txdr_unsigned(cnt); + for (i = 0; i < cnt; i++) + *tl++ = txdr_unsigned(notify[i]); + } else if (nd->nd_repstat == NFSERR_TOOSMALL) { + NFSM_BUILD(tl, uint32_t *, NFSX_UNSIGNED); + *tl = txdr_unsigned(maxcnt); + } nfsmout: NFSEXITCODE2(error, nd); return (error); diff --git a/sys/fs/nfsserver/nfs_nfsdsocket.c b/sys/fs/nfsserver/nfs_nfsdsocket.c index 73e697928260..6326d39ea106 100644 --- a/sys/fs/nfsserver/nfs_nfsdsocket.c +++ b/sys/fs/nfsserver/nfs_nfsdsocket.c @@ -52,6 +52,8 @@ extern struct nfsclienthashhead *nfsclienthash; extern int nfsrv_clienthashsize; extern int nfsrc_floodlevel, nfsrc_tcpsavedreplies; extern int nfsd_debuglevel; +extern int nfsrv_layouthighwater; +extern volatile int nfsrv_layoutcnt; NFSV4ROOTLOCKMUTEX; NFSSTATESPINLOCK; @@ -184,11 +186,11 @@ int (*nfsrv4_ops0[NFSV41_NOPS])(struct nfsrv_descript *, nfsrvd_destroysession, nfsrvd_freestateid, nfsrvd_notsupp, + nfsrvd_getdevinfo, nfsrvd_notsupp, - nfsrvd_notsupp, - nfsrvd_notsupp, - nfsrvd_notsupp, - nfsrvd_notsupp, + nfsrvd_layoutcommit, + nfsrvd_layoutget, + nfsrvd_layoutreturn, nfsrvd_notsupp, nfsrvd_sequence, nfsrvd_notsupp, @@ -727,6 +729,10 @@ nfsrvd_compound(struct nfsrv_descript *nd, int isdgram, u_char *tag, nfsrv_throwawayopens(p); } + /* Do a CBLAYOUTRECALL callback if over the high water mark. 
*/ + if (nfsrv_layoutcnt > nfsrv_layouthighwater) + nfsrv_recalloldlayout(p); + savevp = vp = NULL; save_fsid.val[0] = save_fsid.val[1] = 0; cur_fsid.val[0] = cur_fsid.val[1] = 0; @@ -910,6 +916,11 @@ nfsrvd_compound(struct nfsrv_descript *nd, int isdgram, u_char *tag, savevpnes = vpnes; save_fsid = cur_fsid; } + if ((nd->nd_flag & ND_CURSTATEID) != 0) { + nd->nd_savedcurstateid = + nd->nd_curstateid; + nd->nd_flag |= ND_SAVEDCURSTATEID; + } } else { nd->nd_repstat = NFSERR_NOFILEHANDLE; } @@ -925,6 +936,11 @@ nfsrvd_compound(struct nfsrv_descript *nd, int isdgram, u_char *tag, vpnes = savevpnes; cur_fsid = save_fsid; } + if ((nd->nd_flag & ND_SAVEDCURSTATEID) != 0) { + nd->nd_curstateid = + nd->nd_savedcurstateid; + nd->nd_flag |= ND_CURSTATEID; + } } else { nd->nd_repstat = NFSERR_RESTOREFH; } diff --git a/sys/fs/nfsserver/nfs_nfsdstate.c b/sys/fs/nfsserver/nfs_nfsdstate.c index b8ded77bf9b6..e801a5ccd856 100644 --- a/sys/fs/nfsserver/nfs_nfsdstate.c +++ b/sys/fs/nfsserver/nfs_nfsdstate.c @@ -31,21 +31,35 @@ __FBSDID("$FreeBSD$"); #ifndef APPLEKEXT +#include #include struct nfsrv_stablefirst nfsrv_stablefirst; int nfsrv_issuedelegs = 0; int nfsrv_dolocallocks = 0; struct nfsv4lock nfsv4rootfs_lock; +time_t nfsdev_time = 0; +int nfsrv_layouthashsize; +volatile int nfsrv_layoutcnt = 0; extern int newnfs_numnfsd; extern struct nfsstatsv1 nfsstatsv1; extern int nfsrv_lease; extern struct timeval nfsboottime; extern u_int32_t newnfs_true, newnfs_false; +extern struct mtx nfsrv_dslock_mtx; +extern struct mtx nfsrv_recalllock_mtx; +extern struct mtx nfsrv_dontlistlock_mtx; extern int nfsd_debuglevel; +extern u_int nfsrv_dsdirsize; +extern struct nfsdevicehead nfsrv_devidhead; +extern int nfsrv_doflexfile; +extern int nfsrv_maxpnfsmirror; NFSV4ROOTLOCKMUTEX; NFSSTATESPINLOCK; +extern struct nfsdontlisthead nfsrv_dontlisthead; +extern volatile int nfsrv_devidcnt; +extern struct nfslayouthead nfsrv_recalllisthead; SYSCTL_DECL(_vfs_nfsd); int nfsrv_statehashsize = 
NFSSTATEHASHSIZE; @@ -68,6 +82,11 @@ SYSCTL_INT(_vfs_nfsd, OID_AUTO, sessionhashsize, CTLFLAG_RDTUN, &nfsrv_sessionhashsize, 0, "Size of session hash table set via loader.conf"); +int nfsrv_layouthighwater = NFSLAYOUTHIGHWATER; +SYSCTL_INT(_vfs_nfsd, OID_AUTO, layouthighwater, CTLFLAG_RDTUN, + &nfsrv_layouthighwater, 0, + "High water mark for number of layouts set via loader.conf"); + static int nfsrv_v4statelimit = NFSRV_V4STATELIMIT; SYSCTL_INT(_vfs_nfsd, OID_AUTO, v4statelimit, CTLFLAG_RWTUN, &nfsrv_v4statelimit, 0, @@ -83,12 +102,24 @@ SYSCTL_INT(_vfs_nfsd, OID_AUTO, allowreadforwriteopen, CTLFLAG_RW, &nfsrv_allowreadforwriteopen, 0, "Allow Reads to be done with Write Access StateIDs"); +int nfsrv_pnfsatime = 0; +SYSCTL_INT(_vfs_nfsd, OID_AUTO, pnfsstrictatime, CTLFLAG_RW, + &nfsrv_pnfsatime, 0, + "For pNFS service, do Getattr ops to keep atime up-to-date"); + +int nfsrv_flexlinuxhack = 0; +SYSCTL_INT(_vfs_nfsd, OID_AUTO, flexlinuxhack, CTLFLAG_RW, + &nfsrv_flexlinuxhack, 0, + "For Linux clients, hack around Flex File Layout bug"); + /* * Hash lists for nfs V4. 
*/ struct nfsclienthashhead *nfsclienthash; struct nfslockhashhead *nfslockhash; struct nfssessionhash *nfssessionhash; +struct nfslayouthash *nfslayouthash; +volatile int nfsrv_dontlistlen = 0; #endif /* !APPLEKEXT */ static u_int32_t nfsrv_openpluslock = 0, nfsrv_delegatecnt = 0; @@ -131,7 +162,7 @@ static int nfsrv_checkgrace(struct nfsrv_descript *nd, struct nfsclient *clp, u_int32_t flags); static int nfsrv_docallback(struct nfsclient *clp, int procnum, nfsv4stateid_t *stateidp, int trunc, fhandle_t *fhp, - struct nfsvattr *nap, nfsattrbit_t *attrbitp, NFSPROC_T *p); + struct nfsvattr *nap, nfsattrbit_t *attrbitp, int laytype, NFSPROC_T *p); static int nfsrv_cbcallargs(struct nfsrv_descript *nd, struct nfsclient *clp, uint32_t callback, int op, const char *optag, struct nfsdsession **sepp); static u_int32_t nfsrv_nextclientindex(void); @@ -170,6 +201,36 @@ static int nfsrv_freesession(struct nfsdsession *sep, uint8_t *sessionid); static int nfsv4_setcbsequence(struct nfsrv_descript *nd, struct nfsclient *clp, int dont_replycache, struct nfsdsession **sepp); static int nfsv4_getcbsession(struct nfsclient *clp, struct nfsdsession **sepp); +static int nfsrv_addlayout(struct nfsrv_descript *nd, struct nfslayout **lypp, + nfsv4stateid_t *stateidp, char *layp, int *layoutlenp, NFSPROC_T *p); +static void nfsrv_freelayout(struct nfslayouthead *lhp, struct nfslayout *lyp); +static void nfsrv_freelayoutlist(nfsquad_t clientid); +static void nfsrv_freelayouts(nfsquad_t *clid, fsid_t *fs, int laytype, + int iomode); +static void nfsrv_freealllayouts(void); +static void nfsrv_freedevid(struct nfsdevice *ds); +static int nfsrv_setdsserver(char *dspathp, NFSPROC_T *p, + struct nfsdevice **dsp); +static int nfsrv_delds(char *devid, NFSPROC_T *p); +static void nfsrv_deleteds(struct nfsdevice *fndds); +static void nfsrv_allocdevid(struct nfsdevice *ds, char *addr, char *dnshost); +static void nfsrv_freealldevids(void); +static void nfsrv_flexlayouterr(struct nfsrv_descript 
*nd, uint32_t *layp, + int maxcnt, NFSPROC_T *p); +static int nfsrv_recalllayout(nfsquad_t clid, nfsv4stateid_t *stateidp, + fhandle_t *fhp, struct nfslayout *lyp, struct nfslayouthead *lyheadp, + int laytype, NFSPROC_T *p); +static int nfsrv_findlayout(nfsquad_t *clientidp, fhandle_t *fhp, int laytype, + NFSPROC_T *, struct nfslayout **lypp); +static int nfsrv_fndclid(nfsquad_t *clidvec, nfsquad_t clid, int clidcnt); +static struct nfslayout *nfsrv_filelayout(struct nfsrv_descript *nd, int iomode, + fhandle_t *fhp, fhandle_t *dsfhp, char *devid, fsid_t fs); +static struct nfslayout *nfsrv_flexlayout(struct nfsrv_descript *nd, int iomode, + int mirrorcnt, fhandle_t *fhp, fhandle_t *dsfhp, char *devid, fsid_t fs); +static int nfsrv_dontlayout(fhandle_t *fhp); +static int nfsrv_createdsfile(vnode_t vp, fhandle_t *fhp, struct pnfsdsfile *pf, + vnode_t dvp, struct nfsdevice *ds, struct ucred *cred, NFSPROC_T *p, + vnode_t *tvpp); /* * Scan the client list for a match and either return the current one, @@ -741,6 +802,12 @@ nfsrv_destroyclient(nfsquad_t clientid, NFSPROC_T *p) goto out; } + /* + * Free up all layouts on the clientid. Should the client return the + * layouts? + */ + nfsrv_freelayoutlist(clientid); + /* Scan for state on the clientid. */ for (i = 0; i < nfsrv_statehashsize; i++) if (!LIST_EMPTY(&clp->lc_stateid[i])) { @@ -1237,7 +1304,7 @@ nfsrv_zapclient(struct nfsclient *clp, NFSPROC_T *p) clp->lc_hand.nfsh_flag &= ~NFSG_COMPLETE; clp->lc_hand.nfsh_flag |= NFSG_DESTROYED; (void) nfsrv_docallback(clp, NFSV4PROC_CBNULL, - NULL, 0, NULL, NULL, NULL, p); + NULL, 0, NULL, NULL, NULL, 0, p); } #endif newnfs_disconnect(&clp->lc_req); @@ -2590,7 +2657,7 @@ nfsrv_openctrl(struct nfsrv_descript *nd, vnode_t vp, * harmless. 
*/ cbret = nfsrv_docallback(clp, NFSV4PROC_CBNULL, - NULL, 0, NULL, NULL, NULL, p); + NULL, 0, NULL, NULL, NULL, 0, p); NFSLOCKSTATE(); clp->lc_flags &= ~LCL_NEEDSCBNULL; if (!cbret) @@ -3280,7 +3347,8 @@ nfsrv_openctrl(struct nfsrv_descript *nd, vnode_t vp, */ APPLESTATIC int nfsrv_openupdate(vnode_t vp, struct nfsstate *new_stp, nfsquad_t clientid, - nfsv4stateid_t *stateidp, struct nfsrv_descript *nd, NFSPROC_T *p) + nfsv4stateid_t *stateidp, struct nfsrv_descript *nd, NFSPROC_T *p, + int *retwriteaccessp) { struct nfsstate *stp; struct nfsclient *clp; @@ -3382,6 +3450,12 @@ nfsrv_openupdate(vnode_t vp, struct nfsstate *new_stp, nfsquad_t clientid, NFSUNLOCKSTATE(); } else if (new_stp->ls_flags & NFSLCK_CLOSE) { lfp = stp->ls_lfp; + if (retwriteaccessp != NULL) { + if ((stp->ls_flags & NFSLCK_WRITEACCESS) != 0) + *retwriteaccessp = 1; + else + *retwriteaccessp = 0; + } if (nfsrv_dolocallocks != 0 && !LIST_EMPTY(&stp->ls_open)) { /* Get the lf lock */ nfsrv_locklf(lfp); @@ -3438,7 +3512,7 @@ nfsrv_openupdate(vnode_t vp, struct nfsstate *new_stp, nfsquad_t clientid, APPLESTATIC int nfsrv_delegupdate(struct nfsrv_descript *nd, nfsquad_t clientid, nfsv4stateid_t *stateidp, vnode_t vp, int op, struct ucred *cred, - NFSPROC_T *p) + NFSPROC_T *p, int *retwriteaccessp) { struct nfsstate *stp; struct nfsclient *clp; @@ -3503,6 +3577,12 @@ nfsrv_delegupdate(struct nfsrv_descript *nd, nfsquad_t clientid, error = NFSERR_BADSTATEID; goto out; } + if (retwriteaccessp != NULL) { + if ((stp->ls_flags & NFSLCK_DELEGWRITE) != 0) + *retwriteaccessp = 1; + else + *retwriteaccessp = 0; + } nfsrv_freedeleg(stp); } else { nfsrv_freedeleglist(&clp->lc_olddeleg); @@ -4151,18 +4231,20 @@ nfsrv_checkgrace(struct nfsrv_descript *nd, struct nfsclient *clp, * Do a server callback. 
*/ static int -nfsrv_docallback(struct nfsclient *clp, int procnum, - nfsv4stateid_t *stateidp, int trunc, fhandle_t *fhp, - struct nfsvattr *nap, nfsattrbit_t *attrbitp, NFSPROC_T *p) +nfsrv_docallback(struct nfsclient *clp, int procnum, nfsv4stateid_t *stateidp, + int trunc, fhandle_t *fhp, struct nfsvattr *nap, nfsattrbit_t *attrbitp, + int laytype, NFSPROC_T *p) { mbuf_t m; u_int32_t *tl; - struct nfsrv_descript nfsd, *nd = &nfsd; + struct nfsrv_descript *nd; struct ucred *cred; int error = 0; u_int32_t callback; struct nfsdsession *sep = NULL; + uint64_t tval; + nd = malloc(sizeof(*nd), M_TEMP, M_WAITOK | M_ZERO); cred = newnfs_getcred(); NFSLOCKSTATE(); /* mostly for lc_cbref++ */ if (clp->lc_flags & LCL_NEEDSCONFIRM) { @@ -4237,6 +4319,31 @@ nfsrv_docallback(struct nfsclient *clp, int procnum, else *tl = newnfs_false; (void)nfsm_fhtom(nd, (u_int8_t *)fhp, NFSX_MYFH, 0); + } else if (procnum == NFSV4OP_CBLAYOUTRECALL) { + NFSD_DEBUG(4, "docallback layout recall\n"); + nd->nd_procnum = NFSV4PROC_CBCOMPOUND; + error = nfsrv_cbcallargs(nd, clp, callback, + NFSV4OP_CBLAYOUTRECALL, "CB Reclayout", &sep); + NFSD_DEBUG(4, "aft cbcallargs=%d\n", error); + if (error != 0) { + mbuf_freem(nd->nd_mreq); + goto errout; + } + NFSM_BUILD(tl, u_int32_t *, 4 * NFSX_UNSIGNED); + *tl++ = txdr_unsigned(laytype); + *tl++ = txdr_unsigned(NFSLAYOUTIOMODE_ANY); + *tl++ = newnfs_true; + *tl = txdr_unsigned(NFSV4LAYOUTRET_FILE); + nfsm_fhtom(nd, (uint8_t *)fhp, NFSX_MYFH, 0); + NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_HYPER + NFSX_STATEID); + tval = 0; + txdr_hyper(tval, tl); tl += 2; + tval = UINT64_MAX; + txdr_hyper(tval, tl); tl += 2; + *tl++ = txdr_unsigned(stateidp->seqid); + NFSBCOPY(stateidp->other, tl, NFSX_STATEIDOTHER); + tl += (NFSX_STATEIDOTHER / NFSX_UNSIGNED); + NFSD_DEBUG(4, "aft args\n"); } else if (procnum == NFSV4PROC_CBNULL) { nd->nd_procnum = NFSV4PROC_CBNULL; if ((clp->lc_flags & LCL_NFSV41) != 0) { @@ -4268,6 +4375,7 @@ nfsrv_docallback(struct nfsclient *clp, int 
procnum, NULL, 3); } newnfs_sndunlock(&clp->lc_req.nr_lock); + NFSD_DEBUG(4, "aft sndunlock=%d\n", error); if (!error) { if ((nd->nd_flag & ND_NFSV41) != 0) { KASSERT(sep != NULL, ("sep NULL")); @@ -4288,6 +4396,7 @@ nfsrv_docallback(struct nfsclient *clp, int procnum, printf("nfsrv_docallback: no xprt\n"); error = ECONNREFUSED; } + NFSD_DEBUG(4, "aft newnfs_request=%d\n", error); nfsrv_freesession(sep, NULL); } else error = newnfs_request(nd, NULL, clp, &clp->lc_req, @@ -4322,9 +4431,11 @@ nfsrv_docallback(struct nfsclient *clp, int procnum, if (clp->lc_flags & LCL_CBDOWN) clp->lc_flags &= ~(LCL_CBDOWN | LCL_CALLBACKSON); NFSUNLOCKSTATE(); - if (nd->nd_repstat) + if (nd->nd_repstat) { error = nd->nd_repstat; - else if (error == 0 && procnum == NFSV4OP_CBGETATTR) + NFSD_DEBUG(1, "nfsrv_docallback op=%d err=%d\n", + procnum, error); + } else if (error == 0 && procnum == NFSV4OP_CBGETATTR) error = nfsv4_loadattr(nd, NULL, nap, NULL, NULL, 0, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, NULL, p, NULL); @@ -4338,6 +4449,7 @@ nfsrv_docallback(struct nfsclient *clp, int procnum, } NFSUNLOCKSTATE(); + free(nd, M_TEMP); NFSEXITCODE(error); return (error); } @@ -4994,7 +5106,7 @@ nfsrv_delegconflict(struct nfsstate *stp, int *haslockp, NFSPROC_T *p, retrycnt = 0; do { error = nfsrv_docallback(clp, NFSV4OP_CBRECALL, - &tstateid, 0, &tfh, NULL, NULL, p); + &tstateid, 0, &tfh, NULL, NULL, 0, p); retrycnt++; } while ((error == NFSERR_BADSTATEID || error == NFSERR_BADHANDLE) && retrycnt < NFSV4_CBRETRYCNT); @@ -5352,8 +5464,7 @@ nfsrv_checksetattr(vnode_t vp, struct nfsrv_descript *nd, */ APPLESTATIC int nfsrv_checkgetattr(struct nfsrv_descript *nd, vnode_t vp, - struct nfsvattr *nvap, nfsattrbit_t *attrbitp, struct ucred *cred, - NFSPROC_T *p) + struct nfsvattr *nvap, nfsattrbit_t *attrbitp, NFSPROC_T *p) { struct nfsstate *stp; struct nfslockfile *lfp; @@ -5431,13 +5542,13 @@ nfsrv_checkgetattr(struct nfsrv_descript *nd, vnode_t vp, NFSVNO_ATTRINIT(&nva); nva.na_filerev = 
NFS64BITSSET; error = nfsrv_docallback(clp, NFSV4OP_CBGETATTR, NULL, - 0, &nfh, &nva, &cbbits, p); + 0, &nfh, &nva, &cbbits, 0, p); if (!error) { if ((nva.na_filerev != NFS64BITSSET && nva.na_filerev > delegfilerev) || (NFSVNO_ISSETSIZE(&nva) && nva.na_size != nvap->na_size)) { - error = nfsvno_updfilerev(vp, nvap, cred, p); + error = nfsvno_updfilerev(vp, nvap, nd, p); if (NFSVNO_ISSETSIZE(&nva)) nvap->na_size = nva.na_size; } @@ -5860,9 +5971,14 @@ nfsrv_throwawayallstate(NFSPROC_T *p) * Also, free up any remaining lock file structures. */ for (i = 0; i < nfsrv_lockhashsize; i++) { - LIST_FOREACH_SAFE(lfp, &nfslockhash[i], lf_hash, nlfp) + LIST_FOREACH_SAFE(lfp, &nfslockhash[i], lf_hash, nlfp) { + printf("nfsd unload: fnd a lock file struct\n"); nfsrv_freenfslockfile(lfp); + } } + + /* And get rid of the deviceid structures and layouts. */ + nfsrv_freealllayoutsanddevids(); } /* @@ -6323,3 +6439,1929 @@ nfsrv_freeallbackchannel_xprts(void) } } +/* + * Do a layout commit. Actually just call nfsrv_updatemdsattr(). + * I have no idea if the rest of these arguments will ever be useful? + */ +int +nfsrv_layoutcommit(struct nfsrv_descript *nd, vnode_t vp, int layouttype, + int hasnewoff, uint64_t newoff, uint64_t offset, uint64_t len, + int hasnewmtime, struct timespec *newmtimep, int reclaim, + nfsv4stateid_t *stateidp, int maxcnt, char *layp, int *hasnewsizep, + uint64_t *newsizep, struct ucred *cred, NFSPROC_T *p) +{ + struct nfsvattr na; + int error; + + error = nfsrv_updatemdsattr(vp, &na, p); + if (error == 0) { + *hasnewsizep = 1; + *newsizep = na.na_size; + } + return (error); +} + +/* + * Try and get a layout. 
+ */ +int +nfsrv_layoutget(struct nfsrv_descript *nd, vnode_t vp, struct nfsexstuff *exp, + int layouttype, int *iomode, uint64_t *offset, uint64_t *len, + uint64_t minlen, nfsv4stateid_t *stateidp, int maxcnt, int *retonclose, + int *layoutlenp, char *layp, struct ucred *cred, NFSPROC_T *p) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp; + char *devid; + fhandle_t fh, *dsfhp; + int error, mirrorcnt; + + if (nfsrv_devidcnt == 0) + return (NFSERR_UNKNLAYOUTTYPE); + + if (*offset != 0) + printf("nfsrv_layoutget: off=%ju len=%ju\n", (uintmax_t)*offset, + (uintmax_t)*len); + error = nfsvno_getfh(vp, &fh, p); + NFSD_DEBUG(4, "layoutget getfh=%d\n", error); + if (error != 0) + return (error); + + /* + * For now, all layouts are for entire files. + * Only issue Read/Write layouts if requested for a non-readonly fs. + */ + if (NFSVNO_EXRDONLY(exp)) { + if (*iomode == NFSLAYOUTIOMODE_RW) + return (NFSERR_LAYOUTTRYLATER); + *iomode = NFSLAYOUTIOMODE_READ; + } + if (*iomode != NFSLAYOUTIOMODE_RW) + *iomode = NFSLAYOUTIOMODE_READ; + + /* + * Check to see if a write layout can be issued for this file. + * This is used during mirror recovery to avoid RW layouts being + * issued for a file while it is being copied to the recovered + * mirror. + */ + if (*iomode == NFSLAYOUTIOMODE_RW && nfsrv_dontlayout(&fh) != 0) + return (NFSERR_LAYOUTTRYLATER); + + *retonclose = 0; + *offset = 0; + *len = UINT64_MAX; + + /* First, see if a layout already exists and return if found. */ + lhyp = NFSLAYOUTHASH(&fh); + NFSLOCKLAYOUT(lhyp); + error = nfsrv_findlayout(&nd->nd_clientid, &fh, layouttype, p, &lyp); + NFSD_DEBUG(4, "layoutget findlay=%d\n", error); + /* + * Not sure if the seqid must be the same, so I won't check it. 
+ */ + if (error == 0 && (stateidp->other[0] != lyp->lay_stateid.other[0] || + stateidp->other[1] != lyp->lay_stateid.other[1] || + stateidp->other[2] != lyp->lay_stateid.other[2])) { + if ((lyp->lay_flags & NFSLAY_CALLB) == 0) { + NFSUNLOCKLAYOUT(lhyp); + NFSD_DEBUG(1, "ret bad stateid\n"); + return (NFSERR_BADSTATEID); + } + /* + * I believe we get here because there is a race between + * the client processing the CBLAYOUTRECALL and the layout + * being deleted here on the server. + * The client has now done a LayoutGet with a non-layout + * stateid, as it would when there is no layout. + * As such, free this layout and set error == NFSERR_BADSTATEID + * so the code below will create a new layout structure as + * would happen if no layout was found. + * "lyp" will be set before being used below, but set it NULL + * as a safety belt. + */ + nfsrv_freelayout(&lhyp->list, lyp); + lyp = NULL; + error = NFSERR_BADSTATEID; + } + if (error == 0) { + if (lyp->lay_layoutlen > maxcnt) { + NFSUNLOCKLAYOUT(lhyp); + NFSD_DEBUG(1, "ret layout too small\n"); + return (NFSERR_TOOSMALL); + } + if (*iomode == NFSLAYOUTIOMODE_RW) + lyp->lay_flags |= NFSLAY_RW; + else + lyp->lay_flags |= NFSLAY_READ; + NFSBCOPY(lyp->lay_xdr, layp, lyp->lay_layoutlen); + *layoutlenp = lyp->lay_layoutlen; + if (++lyp->lay_stateid.seqid == 0) + lyp->lay_stateid.seqid = 1; + stateidp->seqid = lyp->lay_stateid.seqid; + NFSUNLOCKLAYOUT(lhyp); + NFSD_DEBUG(4, "ret fnd layout\n"); + return (0); + } + NFSUNLOCKLAYOUT(lhyp); + + /* Find the device id and file handle. 
*/ + dsfhp = malloc(sizeof(fhandle_t) * NFSDEV_MAXMIRRORS, M_TEMP, M_WAITOK); + devid = malloc(NFSX_V4DEVICEID * NFSDEV_MAXMIRRORS, M_TEMP, M_WAITOK); + error = nfsrv_dsgetdevandfh(vp, p, &mirrorcnt, dsfhp, devid); + NFSD_DEBUG(4, "layoutget devandfh=%d\n", error); + if (error == 0) { + if (layouttype == NFSLAYOUT_NFSV4_1_FILES) { + if (NFSX_V4FILELAYOUT > maxcnt) + error = NFSERR_TOOSMALL; + else + lyp = nfsrv_filelayout(nd, *iomode, &fh, dsfhp, + devid, vp->v_mount->mnt_stat.f_fsid); + } else { + if (NFSX_V4FLEXLAYOUT(mirrorcnt) > maxcnt) + error = NFSERR_TOOSMALL; + else + lyp = nfsrv_flexlayout(nd, *iomode, mirrorcnt, + &fh, dsfhp, devid, + vp->v_mount->mnt_stat.f_fsid); + } + } + free(dsfhp, M_TEMP); + free(devid, M_TEMP); + if (error != 0) + return (error); + + /* + * Now, add this layout to the list. + */ + error = nfsrv_addlayout(nd, &lyp, stateidp, layp, layoutlenp, p); + NFSD_DEBUG(4, "layoutget addl=%d\n", error); + /* + * The lyp will be set to NULL by nfsrv_addlayout() if it + * linked the new structure into the lists. + */ + free(lyp, M_NFSDSTATE); + return (error); +} + +/* + * Generate a File Layout. + */ +static struct nfslayout * +nfsrv_filelayout(struct nfsrv_descript *nd, int iomode, fhandle_t *fhp, + fhandle_t *dsfhp, char *devid, fsid_t fs) +{ + uint32_t *tl; + struct nfslayout *lyp; + uint64_t pattern_offset; + + lyp = malloc(sizeof(struct nfslayout) + NFSX_V4FILELAYOUT, M_NFSDSTATE, + M_WAITOK | M_ZERO); + lyp->lay_type = NFSLAYOUT_NFSV4_1_FILES; + if (iomode == NFSLAYOUTIOMODE_RW) + lyp->lay_flags = NFSLAY_RW; + else + lyp->lay_flags = NFSLAY_READ; + NFSBCOPY(fhp, &lyp->lay_fh, sizeof(*fhp)); + lyp->lay_clientid.qval = nd->nd_clientid.qval; + lyp->lay_fsid = fs; + + /* Fill in the xdr for the files layout. */ + tl = (uint32_t *)lyp->lay_xdr; + NFSBCOPY(devid, tl, NFSX_V4DEVICEID); /* Device ID. */ + tl += (NFSX_V4DEVICEID / NFSX_UNSIGNED); + + /* + * Make the stripe size as many 64K blocks as will fit in the stripe + * mask. 
Since there is only one stripe, the stripe size doesn't really + * matter, except that the Linux client will only handle an exact + * multiple of their PAGE_SIZE (usually 4K). I chose 64K as a value + * that should cover most/all arches w.r.t. PAGE_SIZE. + */ + *tl++ = txdr_unsigned(NFSFLAYUTIL_STRIPE_MASK & ~0xffff); + *tl++ = 0; /* 1st stripe index. */ + pattern_offset = 0; + txdr_hyper(pattern_offset, tl); tl += 2; /* Pattern offset. */ + *tl++ = txdr_unsigned(1); /* 1 file handle. */ + *tl++ = txdr_unsigned(NFSX_V4PNFSFH); + NFSBCOPY(dsfhp, tl, sizeof(*dsfhp)); + lyp->lay_layoutlen = NFSX_V4FILELAYOUT; + return (lyp); +} + +#define FLEX_OWNERID "999" +#define FLEX_UID0 "0" +/* + * Generate a Flex File Layout. + * The FLEX_OWNERID can be any string of 3 decimal digits. Although this + * string goes on the wire, it isn't supposed to be used by the client, + * since this server uses tight coupling. + * Although not recommended by the spec., if vfs.nfsd.flexlinuxhack=1 use + * a string of "0". This works around the Linux Flex File Layout driver bug + * which uses the synthetic uid/gid strings for the "tightly coupled" case. + */ +static struct nfslayout * +nfsrv_flexlayout(struct nfsrv_descript *nd, int iomode, int mirrorcnt, + fhandle_t *fhp, fhandle_t *dsfhp, char *devid, fsid_t fs) +{ + uint32_t *tl; + struct nfslayout *lyp; + uint64_t lenval; + int i; + + lyp = malloc(sizeof(struct nfslayout) + NFSX_V4FLEXLAYOUT(mirrorcnt), + M_NFSDSTATE, M_WAITOK | M_ZERO); + lyp->lay_type = NFSLAYOUT_FLEXFILE; + if (iomode == NFSLAYOUTIOMODE_RW) + lyp->lay_flags = NFSLAY_RW; + else + lyp->lay_flags = NFSLAY_READ; + NFSBCOPY(fhp, &lyp->lay_fh, sizeof(*fhp)); + lyp->lay_clientid.qval = nd->nd_clientid.qval; + lyp->lay_fsid = fs; + lyp->lay_mirrorcnt = mirrorcnt; + + /* Fill in the xdr for the files layout. */ + tl = (uint32_t *)lyp->lay_xdr; + lenval = 0; + txdr_hyper(lenval, tl); tl += 2; /* Stripe unit. */ + *tl++ = txdr_unsigned(mirrorcnt); /* # of mirrors. 
*/ + for (i = 0; i < mirrorcnt; i++) { + *tl++ = txdr_unsigned(1); /* One stripe. */ + NFSBCOPY(devid, tl, NFSX_V4DEVICEID); /* Device ID. */ + tl += (NFSX_V4DEVICEID / NFSX_UNSIGNED); + devid += NFSX_V4DEVICEID; + *tl++ = txdr_unsigned(1); /* Efficiency. */ + *tl++ = 0; /* Proxy Stateid. */ + *tl++ = 0x55555555; + *tl++ = 0x55555555; + *tl++ = 0x55555555; + *tl++ = txdr_unsigned(1); /* 1 file handle. */ + *tl++ = txdr_unsigned(NFSX_V4PNFSFH); + NFSBCOPY(dsfhp, tl, sizeof(*dsfhp)); + tl += (NFSM_RNDUP(NFSX_V4PNFSFH) / NFSX_UNSIGNED); + dsfhp++; + if (nfsrv_flexlinuxhack != 0) { + *tl++ = txdr_unsigned(strlen(FLEX_UID0)); + *tl = 0; /* 0 pad string. */ + NFSBCOPY(FLEX_UID0, tl++, strlen(FLEX_UID0)); + *tl++ = txdr_unsigned(strlen(FLEX_UID0)); + *tl = 0; /* 0 pad string. */ + NFSBCOPY(FLEX_UID0, tl++, strlen(FLEX_UID0)); + } else { + *tl++ = txdr_unsigned(strlen(FLEX_OWNERID)); + NFSBCOPY(FLEX_OWNERID, tl++, NFSX_UNSIGNED); + *tl++ = txdr_unsigned(strlen(FLEX_OWNERID)); + NFSBCOPY(FLEX_OWNERID, tl++, NFSX_UNSIGNED); + } + } + *tl++ = txdr_unsigned(0); /* ff_flags. */ + *tl = txdr_unsigned(60); /* Status interval hint. */ + lyp->lay_layoutlen = NFSX_V4FLEXLAYOUT(mirrorcnt); + return (lyp); +} + +/* + * Parse and process Flex File errors returned via LayoutReturn. + */ +static void +nfsrv_flexlayouterr(struct nfsrv_descript *nd, uint32_t *layp, int maxcnt, + NFSPROC_T *p) +{ + uint32_t *tl; + int cnt, errcnt, i, j, opnum, stat; + char devid[NFSX_V4DEVICEID]; + + tl = layp; + cnt = fxdr_unsigned(int, *tl++); + NFSD_DEBUG(4, "flexlayouterr cnt=%d\n", cnt); + for (i = 0; i < cnt; i++) { + /* Skip offset, length and stateid for now. 
*/ + tl += (4 + NFSX_STATEID / NFSX_UNSIGNED); + errcnt = fxdr_unsigned(int, *tl++); + NFSD_DEBUG(4, "flexlayouterr errcnt=%d\n", errcnt); + for (j = 0; j < errcnt; j++) { + NFSBCOPY(tl, devid, NFSX_V4DEVICEID); + tl += (NFSX_V4DEVICEID / NFSX_UNSIGNED); + stat = fxdr_unsigned(int, *tl++); + opnum = fxdr_unsigned(int, *tl++); + NFSD_DEBUG(4, "flexlayouterr op=%d stat=%d\n", opnum, + stat); + /* + * Except for NFSERR_ACCES errors for Reading, + * shut the mirror down. + */ + if (opnum != NFSV4OP_READ || stat != NFSERR_ACCES) + nfsrv_delds(devid, p); + } + } +} + +/* + * This function removes all flex file layouts which has a mirror with + * a device id that matches the argument. + * Called when the DS represented by the device id has failed. + */ +void +nfsrv_flexmirrordel(char *devid, NFSPROC_T *p) +{ + uint32_t *tl; + struct nfslayout *lyp, *nlyp; + struct nfslayouthash *lhyp; + struct nfslayouthead loclyp; + int i, j; + + NFSD_DEBUG(4, "flexmirrordel\n"); + /* Move all layouts found onto a local list. */ + TAILQ_INIT(&loclyp); + for (i = 0; i < nfsrv_layouthashsize; i++) { + lhyp = &nfslayouthash[i]; + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) { + if (lyp->lay_type == NFSLAYOUT_FLEXFILE && + lyp->lay_mirrorcnt > 1) { + NFSD_DEBUG(4, "possible match\n"); + tl = lyp->lay_xdr; + tl += 3; + for (j = 0; j < lyp->lay_mirrorcnt; j++) { + tl++; + if (NFSBCMP(devid, tl, NFSX_V4DEVICEID) + == 0) { + /* Found one. */ + NFSD_DEBUG(4, "fnd one\n"); + TAILQ_REMOVE(&lhyp->list, lyp, + lay_list); + TAILQ_INSERT_HEAD(&loclyp, lyp, + lay_list); + break; + } + tl += (NFSX_V4DEVICEID / NFSX_UNSIGNED + + NFSM_RNDUP(NFSX_V4PNFSFH) / + NFSX_UNSIGNED + 11 * NFSX_UNSIGNED); + } + } + } + NFSUNLOCKLAYOUT(lhyp); + } + + /* Now, try to do a Layout recall for each one found. 
*/ + TAILQ_FOREACH_SAFE(lyp, &loclyp, lay_list, nlyp) { + NFSD_DEBUG(4, "do layout recall\n"); + /* + * The layout stateid.seqid needs to be incremented + * before doing a LAYOUT_RECALL callback. + * Set lay_trycnt to UINT16_MAX so it won't set up a retry. + */ + if (++lyp->lay_stateid.seqid == 0) + lyp->lay_stateid.seqid = 1; + lyp->lay_trycnt = UINT16_MAX; + nfsrv_recalllayout(lyp->lay_clientid, &lyp->lay_stateid, + &lyp->lay_fh, lyp, &loclyp, lyp->lay_type, p); + nfsrv_freelayout(&loclyp, lyp); + } +} + +/* + * Do a recall callback to the client for this layout. + */ +static int +nfsrv_recalllayout(nfsquad_t clid, nfsv4stateid_t *stateidp, fhandle_t *fhp, + struct nfslayout *lyp, struct nfslayouthead *lyheadp, int laytype, + NFSPROC_T *p) +{ + struct nfsclient *clp; + int error; + + NFSD_DEBUG(4, "nfsrv_recalllayout\n"); + error = nfsrv_getclient(clid, 0, &clp, NULL, (nfsquad_t)((u_quad_t)0), + 0, NULL, p); + NFSD_DEBUG(4, "aft nfsrv_getclient=%d\n", error); + if (error != 0) + return (error); + if ((clp->lc_flags & LCL_NFSV41) != 0) { + error = nfsrv_docallback(clp, NFSV4OP_CBLAYOUTRECALL, + stateidp, 0, fhp, NULL, NULL, laytype, p); + /* If lyp != NULL, handle an error return here. */ + if (error != 0 && lyp != NULL) { + NFSDRECALLLOCK(); + if (error == NFSERR_NOMATCHLAYOUT) { + /* + * Mark it returned, since there is no layout. + */ + if ((lyp->lay_flags & NFSLAY_RECALL) != 0) { + lyp->lay_flags |= NFSLAY_RETURNED; + wakeup(lyp); + } + NFSDRECALLUNLOCK(); + } else if ((lyp->lay_flags & NFSLAY_RETURNED) == 0 && + lyp->lay_trycnt < 10) { + /* + * Clear recall, so it can be tried again + * and put it at the end of the list to + * delay the retry a little longer. 
+ */ + lyp->lay_flags &= ~NFSLAY_RECALL; + lyp->lay_trycnt++; + TAILQ_REMOVE(lyheadp, lyp, lay_list); + TAILQ_INSERT_TAIL(lyheadp, lyp, lay_list); + NFSDRECALLUNLOCK(); + nfs_catnap(PVFS, 0, "nfsrclay"); + } else + NFSDRECALLUNLOCK(); + } + } else + printf("nfsrv_recalllayout: clp not NFSv4.1\n"); + return (error); +} + +/* + * Find a layout to recall when we exceed our high water mark. + */ +void +nfsrv_recalloldlayout(NFSPROC_T *p) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp; + nfsquad_t clientid; + nfsv4stateid_t stateid; + fhandle_t fh; + int error, laytype, ret; + + lhyp = &nfslayouthash[arc4random() % nfsrv_layouthashsize]; + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_REVERSE(lyp, &lhyp->list, nfslayouthead, lay_list) { + if ((lyp->lay_flags & NFSLAY_CALLB) == 0) { + lyp->lay_flags |= NFSLAY_CALLB; + /* + * The layout stateid.seqid needs to be incremented + * before doing a LAYOUT_RECALL callback. + */ + if (++lyp->lay_stateid.seqid == 0) + lyp->lay_stateid.seqid = 1; + clientid = lyp->lay_clientid; + stateid = lyp->lay_stateid; + fh = lyp->lay_fh; + laytype = lyp->lay_type; + break; + } + } + NFSUNLOCKLAYOUT(lhyp); + if (lyp != NULL) { + error = nfsrv_recalllayout(clientid, &stateid, &fh, NULL, NULL, + laytype, p); + if (error != 0 && error != NFSERR_NOMATCHLAYOUT) + printf("recallold=%d\n", error); + if (error != 0) { + NFSLOCKLAYOUT(lhyp); + /* + * Since the hash list was unlocked, we need to + * find it again. + */ + ret = nfsrv_findlayout(&clientid, &fh, laytype, p, + &lyp); + if (ret == 0 && + (lyp->lay_flags & NFSLAY_CALLB) != 0 && + lyp->lay_stateid.other[0] == stateid.other[0] && + lyp->lay_stateid.other[1] == stateid.other[1] && + lyp->lay_stateid.other[2] == stateid.other[2]) { + /* + * The client no longer knows this layout, so + * it can be free'd now. 
+ */ + if (error == NFSERR_NOMATCHLAYOUT) + nfsrv_freelayout(&lhyp->list, lyp); + else { + /* + * Leave it to be tried later by + * clearing NFSLAY_CALLB and moving + * it to the head of the list, so it + * won't be tried again for a while. + */ + lyp->lay_flags &= ~NFSLAY_CALLB; + TAILQ_REMOVE(&lhyp->list, lyp, + lay_list); + TAILQ_INSERT_HEAD(&lhyp->list, lyp, + lay_list); + } + } + NFSUNLOCKLAYOUT(lhyp); + } + } +} + +/* + * Try and return layout(s). + */ +int +nfsrv_layoutreturn(struct nfsrv_descript *nd, vnode_t vp, + int layouttype, int iomode, uint64_t offset, uint64_t len, int reclaim, + int kind, nfsv4stateid_t *stateidp, int maxcnt, uint32_t *layp, int *fndp, + struct ucred *cred, NFSPROC_T *p) +{ + struct nfsvattr na; + struct nfslayouthash *lhyp; + struct nfslayout *lyp; + fhandle_t fh; + int error = 0; + + *fndp = 0; + if (kind == NFSV4LAYOUTRET_FILE) { + error = nfsvno_getfh(vp, &fh, p); + if (error == 0) { + error = nfsrv_updatemdsattr(vp, &na, p); + if (error != 0) + printf("nfsrv_layoutreturn: updatemdsattr" + " failed=%d\n", error); + } + if (error == 0) { + if (reclaim == newnfs_true) { + error = nfsrv_checkgrace(NULL, NULL, + NFSLCK_RECLAIM); + if (error != NFSERR_NOGRACE) + error = 0; + return (error); + } + lhyp = NFSLAYOUTHASH(&fh); + NFSDRECALLLOCK(); + NFSLOCKLAYOUT(lhyp); + error = nfsrv_findlayout(&nd->nd_clientid, &fh, + layouttype, p, &lyp); + NFSD_DEBUG(4, "layoutret findlay=%d\n", error); + if (error == 0 && + stateidp->other[0] == lyp->lay_stateid.other[0] && + stateidp->other[1] == lyp->lay_stateid.other[1] && + stateidp->other[2] == lyp->lay_stateid.other[2]) { + NFSD_DEBUG(4, "nfsrv_layoutreturn: stateid %d" + " %x %x %x laystateid %d %x %x %x" + " off=%ju len=%ju flgs=0x%x\n", + stateidp->seqid, stateidp->other[0], + stateidp->other[1], stateidp->other[2], + lyp->lay_stateid.seqid, + lyp->lay_stateid.other[0], + lyp->lay_stateid.other[1], + lyp->lay_stateid.other[2], + (uintmax_t)offset, (uintmax_t)len, + lyp->lay_flags); + if 
(++lyp->lay_stateid.seqid == 0) + lyp->lay_stateid.seqid = 1; + stateidp->seqid = lyp->lay_stateid.seqid; + if (offset == 0 && len == UINT64_MAX) { + if ((iomode & NFSLAYOUTIOMODE_READ) != + 0) + lyp->lay_flags &= ~NFSLAY_READ; + if ((iomode & NFSLAYOUTIOMODE_RW) != 0) + lyp->lay_flags &= ~NFSLAY_RW; + if ((lyp->lay_flags & (NFSLAY_READ | + NFSLAY_RW)) == 0) + nfsrv_freelayout(&lhyp->list, + lyp); + else + *fndp = 1; + } else + *fndp = 1; + } + NFSUNLOCKLAYOUT(lhyp); + /* Search the nfsrv_recalllist for a match. */ + TAILQ_FOREACH(lyp, &nfsrv_recalllisthead, lay_list) { + if (NFSBCMP(&lyp->lay_fh, &fh, + sizeof(fh)) == 0 && + lyp->lay_clientid.qval == + nd->nd_clientid.qval && + stateidp->other[0] == + lyp->lay_stateid.other[0] && + stateidp->other[1] == + lyp->lay_stateid.other[1] && + stateidp->other[2] == + lyp->lay_stateid.other[2]) { + lyp->lay_flags |= NFSLAY_RETURNED; + wakeup(lyp); + error = 0; + } + } + NFSDRECALLUNLOCK(); + } + if (layouttype == NFSLAYOUT_FLEXFILE) + nfsrv_flexlayouterr(nd, layp, maxcnt, p); + } else if (kind == NFSV4LAYOUTRET_FSID) + nfsrv_freelayouts(&nd->nd_clientid, + &vp->v_mount->mnt_stat.f_fsid, layouttype, iomode); + else if (kind == NFSV4LAYOUTRET_ALL) + nfsrv_freelayouts(&nd->nd_clientid, NULL, layouttype, iomode); + else + error = NFSERR_INVAL; + if (error == -1) + error = 0; + return (error); +} + +/* + * Look for an existing layout. + */ +static int +nfsrv_findlayout(nfsquad_t *clientidp, fhandle_t *fhp, int laytype, + NFSPROC_T *p, struct nfslayout **lypp) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp; + int ret; + + *lypp = NULL; + ret = 0; + lhyp = NFSLAYOUTHASH(fhp); + TAILQ_FOREACH(lyp, &lhyp->list, lay_list) { + if (NFSBCMP(&lyp->lay_fh, fhp, sizeof(*fhp)) == 0 && + lyp->lay_clientid.qval == clientidp->qval && + lyp->lay_type == laytype) + break; + } + if (lyp != NULL) + *lypp = lyp; + else + ret = -1; + return (ret); +} + +/* + * Add the new layout, as required. 
+ */ +static int +nfsrv_addlayout(struct nfsrv_descript *nd, struct nfslayout **lypp, + nfsv4stateid_t *stateidp, char *layp, int *layoutlenp, NFSPROC_T *p) +{ + struct nfsclient *clp; + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + fhandle_t *fhp; + int error; + + KASSERT((nd->nd_flag & ND_IMPLIEDCLID) != 0, + ("nfsrv_layoutget: no nd_clientid\n")); + lyp = *lypp; + fhp = &lyp->lay_fh; + NFSLOCKSTATE(); + error = nfsrv_getclient((nfsquad_t)((u_quad_t)0), CLOPS_RENEW, &clp, + NULL, (nfsquad_t)((u_quad_t)0), 0, nd, p); + if (error != 0) { + NFSUNLOCKSTATE(); + return (error); + } + lyp->lay_stateid.seqid = stateidp->seqid = 1; + lyp->lay_stateid.other[0] = stateidp->other[0] = + clp->lc_clientid.lval[0]; + lyp->lay_stateid.other[1] = stateidp->other[1] = + clp->lc_clientid.lval[1]; + lyp->lay_stateid.other[2] = stateidp->other[2] = + nfsrv_nextstateindex(clp); + NFSUNLOCKSTATE(); + + lhyp = NFSLAYOUTHASH(fhp); + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH(nlyp, &lhyp->list, lay_list) { + if (NFSBCMP(&nlyp->lay_fh, fhp, sizeof(*fhp)) == 0 && + nlyp->lay_clientid.qval == nd->nd_clientid.qval) + break; + } + if (nlyp != NULL) { + /* A layout already exists, so use it. */ + nlyp->lay_flags |= (lyp->lay_flags & (NFSLAY_READ | NFSLAY_RW)); + NFSBCOPY(nlyp->lay_xdr, layp, nlyp->lay_layoutlen); + *layoutlenp = nlyp->lay_layoutlen; + if (++nlyp->lay_stateid.seqid == 0) + nlyp->lay_stateid.seqid = 1; + stateidp->seqid = nlyp->lay_stateid.seqid; + stateidp->other[0] = nlyp->lay_stateid.other[0]; + stateidp->other[1] = nlyp->lay_stateid.other[1]; + stateidp->other[2] = nlyp->lay_stateid.other[2]; + NFSUNLOCKLAYOUT(lhyp); + return (0); + } + + /* Insert the new layout in the lists. */ + *lypp = NULL; + atomic_add_int(&nfsrv_layoutcnt, 1); + NFSBCOPY(lyp->lay_xdr, layp, lyp->lay_layoutlen); + *layoutlenp = lyp->lay_layoutlen; + TAILQ_INSERT_HEAD(&lhyp->list, lyp, lay_list); + NFSUNLOCKLAYOUT(lhyp); + return (0); +} + +/* + * Get the devinfo for a deviceid. 
+ */ +int +nfsrv_getdevinfo(char *devid, int layouttype, uint32_t *maxcnt, + uint32_t *notify, int *devaddrlen, char **devaddr) +{ + struct nfsdevice *ds; + + if ((layouttype != NFSLAYOUT_NFSV4_1_FILES && layouttype != + NFSLAYOUT_FLEXFILE) || + (nfsrv_maxpnfsmirror > 1 && layouttype == NFSLAYOUT_NFSV4_1_FILES)) + return (NFSERR_UNKNLAYOUTTYPE); + + /* + * Now, search for the device id. Note that the structures won't go + * away, but the order changes in the list. As such, the lock only + * needs to be held during the search through the list. + */ + NFSDDSLOCK(); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (NFSBCMP(devid, ds->nfsdev_deviceid, NFSX_V4DEVICEID) == 0 && + ds->nfsdev_nmp != NULL) + break; + } + NFSDDSUNLOCK(); + if (ds == NULL) + return (NFSERR_NOENT); + + /* If the correct nfsdev_XXXXaddrlen is > 0, we have the device info. */ + *devaddrlen = 0; + if (layouttype == NFSLAYOUT_NFSV4_1_FILES) { + *devaddrlen = ds->nfsdev_fileaddrlen; + *devaddr = ds->nfsdev_fileaddr; + } else if (layouttype == NFSLAYOUT_FLEXFILE) { + *devaddrlen = ds->nfsdev_flexaddrlen; + *devaddr = ds->nfsdev_flexaddr; + } + if (*devaddrlen == 0) + return (NFSERR_UNKNLAYOUTTYPE); + + /* + * The XDR overhead is 3 unsigned values: layout_type, + * length_of_address and notify bitmap. + * If the notify array is changed to not all zeros, the + * count of unsigned values must be increased. + */ + if (*maxcnt > 0 && *maxcnt < NFSM_RNDUP(*devaddrlen) + + 3 * NFSX_UNSIGNED) { + *maxcnt = NFSM_RNDUP(*devaddrlen) + 3 * NFSX_UNSIGNED; + return (NFSERR_TOOSMALL); + } + return (0); +} + +/* + * Free a list of layout state structures. 
+ */ +static void +nfsrv_freelayoutlist(nfsquad_t clientid) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + int i; + + for (i = 0; i < nfsrv_layouthashsize; i++) { + lhyp = &nfslayouthash[i]; + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) { + if (lyp->lay_clientid.qval == clientid.qval) + nfsrv_freelayout(&lhyp->list, lyp); + } + NFSUNLOCKLAYOUT(lhyp); + } +} + +/* + * Free up a layout. + */ +static void +nfsrv_freelayout(struct nfslayouthead *lhp, struct nfslayout *lyp) +{ + + NFSD_DEBUG(4, "Freelayout=%p\n", lyp); + atomic_add_int(&nfsrv_layoutcnt, -1); + TAILQ_REMOVE(lhp, lyp, lay_list); + free(lyp, M_NFSDSTATE); +} + +/* + * Free up a device id. + */ +void +nfsrv_freeonedevid(struct nfsdevice *ds) +{ + int i; + + atomic_add_int(&nfsrv_devidcnt, -1); + vrele(ds->nfsdev_dvp); + for (i = 0; i < nfsrv_dsdirsize; i++) + if (ds->nfsdev_dsdir[i] != NULL) + vrele(ds->nfsdev_dsdir[i]); + free(ds->nfsdev_fileaddr, M_NFSDSTATE); + free(ds->nfsdev_flexaddr, M_NFSDSTATE); + free(ds->nfsdev_host, M_NFSDSTATE); + free(ds, M_NFSDSTATE); +} + +/* + * Free up a device id and its mirrors. + */ +static void +nfsrv_freedevid(struct nfsdevice *ds) +{ + + TAILQ_REMOVE(&nfsrv_devidhead, ds, nfsdev_list); + nfsrv_freeonedevid(ds); +} + +/* + * Free all layouts and device ids. + * Done when the nfsd threads are shut down since there may be a new + * modified device id list created when the nfsd is restarted. + */ +void +nfsrv_freealllayoutsanddevids(void) +{ + struct nfsdontlist *mrp, *nmrp; + struct nfslayout *lyp, *nlyp; + + /* Get rid of the deviceid structures. */ + nfsrv_freealldevids(); + TAILQ_INIT(&nfsrv_devidhead); + nfsrv_devidcnt = 0; + + /* Get rid of all layouts. */ + nfsrv_freealllayouts(); + + /* Get rid of any nfsdontlist entries. */ + LIST_FOREACH_SAFE(mrp, &nfsrv_dontlisthead, nfsmr_list, nmrp) + free(mrp, M_NFSDSTATE); + LIST_INIT(&nfsrv_dontlisthead); + nfsrv_dontlistlen = 0; + + /* Free layouts in the recall list. 
*/ + TAILQ_FOREACH_SAFE(lyp, &nfsrv_recalllisthead, lay_list, nlyp) + nfsrv_freelayout(&nfsrv_recalllisthead, lyp); + TAILQ_INIT(&nfsrv_recalllisthead); +} + +/* + * Free layouts that match the arguments. + */ +static void +nfsrv_freelayouts(nfsquad_t *clid, fsid_t *fs, int laytype, int iomode) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + int i; + + for (i = 0; i < nfsrv_layouthashsize; i++) { + lhyp = &nfslayouthash[i]; + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) { + if (clid->qval != lyp->lay_clientid.qval) + continue; + if (fs != NULL && (fs->val[0] != lyp->lay_fsid.val[0] || + fs->val[1] != lyp->lay_fsid.val[1])) + continue; + if (laytype != lyp->lay_type) + continue; + if ((iomode & NFSLAYOUTIOMODE_READ) != 0) + lyp->lay_flags &= ~NFSLAY_READ; + if ((iomode & NFSLAYOUTIOMODE_RW) != 0) + lyp->lay_flags &= ~NFSLAY_RW; + if ((lyp->lay_flags & (NFSLAY_READ | NFSLAY_RW)) == 0) + nfsrv_freelayout(&lhyp->list, lyp); + } + NFSUNLOCKLAYOUT(lhyp); + } +} + +/* + * Free all layouts for the argument file. + */ +void +nfsrv_freefilelayouts(fhandle_t *fhp) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + + lhyp = NFSLAYOUTHASH(fhp); + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) { + if (NFSBCMP(&lyp->lay_fh, fhp, sizeof(*fhp)) == 0) + nfsrv_freelayout(&lhyp->list, lyp); + } + NFSUNLOCKLAYOUT(lhyp); +} + +/* + * Free all layouts. + */ +static void +nfsrv_freealllayouts(void) +{ + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + int i; + + for (i = 0; i < nfsrv_layouthashsize; i++) { + lhyp = &nfslayouthash[i]; + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) + nfsrv_freelayout(&lhyp->list, lyp); + NFSUNLOCKLAYOUT(lhyp); + } +} + +/* + * Look up the mount path for the DS server. 
+ */ +static int +nfsrv_setdsserver(char *dspathp, NFSPROC_T *p, struct nfsdevice **dsp) +{ + struct nameidata nd; + struct nfsdevice *ds; + int error, i; + char *dsdirpath; + size_t dsdirsize; + + NFSD_DEBUG(4, "setdssrv path=%s\n", dspathp); + *dsp = NULL; + NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF, UIO_SYSSPACE, + dspathp, p); + error = namei(&nd); + NFSD_DEBUG(4, "lookup=%d\n", error); + if (error != 0) + return (error); + if (nd.ni_vp->v_type != VDIR) { + vput(nd.ni_vp); + NFSD_DEBUG(4, "dspath not dir\n"); + return (ENOTDIR); + } + if (strcmp(nd.ni_vp->v_mount->mnt_vfc->vfc_name, "nfs") != 0) { + vput(nd.ni_vp); + NFSD_DEBUG(4, "dspath not an NFS mount\n"); + return (ENXIO); + } + + /* + * Allocate a DS server structure with the NFS mounted directory + * vnode reference counted, so that a non-forced dismount will + * fail with EBUSY. + */ + *dsp = ds = malloc(sizeof(*ds) + nfsrv_dsdirsize * sizeof(vnode_t), + M_NFSDSTATE, M_WAITOK | M_ZERO); + ds->nfsdev_dvp = nd.ni_vp; + ds->nfsdev_nmp = VFSTONFS(nd.ni_vp->v_mount); + NFSVOPUNLOCK(nd.ni_vp, 0); + + dsdirsize = strlen(dspathp) + 16; + dsdirpath = malloc(dsdirsize, M_TEMP, M_WAITOK); + /* Now, create the DS directory structures. 
*/ + for (i = 0; i < nfsrv_dsdirsize; i++) { + snprintf(dsdirpath, dsdirsize, "%s/ds%d", dspathp, i); + NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF, + UIO_SYSSPACE, dsdirpath, p); + error = namei(&nd); + NFSD_DEBUG(4, "dsdirpath=%s lookup=%d\n", dsdirpath, error); + if (error != 0) + break; + if (nd.ni_vp->v_type != VDIR) { + vput(nd.ni_vp); + error = ENOTDIR; + NFSD_DEBUG(4, "dsdirpath not a VDIR\n"); + break; + } + if (strcmp(nd.ni_vp->v_mount->mnt_vfc->vfc_name, "nfs") != 0) { + vput(nd.ni_vp); + error = ENXIO; + NFSD_DEBUG(4, "dsdirpath not an NFS mount\n"); + break; + } + ds->nfsdev_dsdir[i] = nd.ni_vp; + NFSVOPUNLOCK(nd.ni_vp, 0); + } + free(dsdirpath, M_TEMP); + + TAILQ_INSERT_TAIL(&nfsrv_devidhead, ds, nfsdev_list); + atomic_add_int(&nfsrv_devidcnt, 1); + return (error); +} + +/* + * Look up the mount path for the DS server and delete it. + */ +int +nfsrv_deldsserver(char *dspathp, NFSPROC_T *p) +{ + struct mount *mp; + struct nfsmount *nmp; + struct nfsdevice *ds; + int error; + + NFSD_DEBUG(4, "deldssrv path=%s\n", dspathp); + /* + * Search for the path in the mount list. Avoid looking the path + * up, since this mount point may be hung, with associated locked + * vnodes, etc. + * Set NFSMNTP_CANCELRPCS so that any forced dismount will be blocked + * until this completes. + * As noted in the man page, this should be done before any forced + * dismount on the mount point, but at least the handshake on + * NFSMNTP_CANCELRPCS should make it safe. 
+ */ + error = 0; + ds = NULL; + nmp = NULL; + mtx_lock(&mountlist_mtx); + TAILQ_FOREACH(mp, &mountlist, mnt_list) { + if (strcmp(mp->mnt_stat.f_mntonname, dspathp) == 0 && + strcmp(mp->mnt_stat.f_fstypename, "nfs") == 0 && + mp->mnt_data != NULL) { + nmp = VFSTONFS(mp); + NFSLOCKMNT(nmp); + if ((nmp->nm_privflag & (NFSMNTP_FORCEDISM | + NFSMNTP_CANCELRPCS)) == 0) { + nmp->nm_privflag |= NFSMNTP_CANCELRPCS; + NFSUNLOCKMNT(nmp); + } else { + NFSUNLOCKMNT(nmp); + nmp = NULL; + } + break; + } + } + mtx_unlock(&mountlist_mtx); + + if (nmp != NULL) { + ds = nfsrv_deldsnmp(nmp, p); + NFSD_DEBUG(4, "deldsnmp=%p\n", ds); + if (ds != NULL) { + nfsrv_killrpcs(nmp); + NFSD_DEBUG(4, "aft killrpcs\n"); + } else + error = ENXIO; + NFSLOCKMNT(nmp); + nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; + wakeup(nmp); + NFSUNLOCKMNT(nmp); + } else + error = EINVAL; + return (error); +} + +/* + * Search for and remove a DS entry which matches the "nmp" argument. + * The nfsdevice structure pointer is returned so that the caller can + * free it via nfsrv_freeonedevid(). + */ +struct nfsdevice * +nfsrv_deldsnmp(struct nfsmount *nmp, NFSPROC_T *p) +{ + struct nfsdevice *fndds; + + NFSD_DEBUG(4, "deldsdvp\n"); + NFSDDSLOCK(); + fndds = nfsv4_findmirror(nmp); + if (fndds != NULL) + nfsrv_deleteds(fndds); + NFSDDSUNLOCK(); + if (fndds != NULL) { + nfsrv_flexmirrordel(fndds->nfsdev_deviceid, p); + printf("pNFS server: mirror %s failed\n", fndds->nfsdev_host); + } + return (fndds); +} + +/* + * Similar to nfsrv_deldsnmp(), except that the DS is indicated by deviceid. + * This function also calls nfsrv_killrpcs() to unblock RPCs on the mount + * point. + * Also, returns an error instead of the nfsdevice found. + */ +static int +nfsrv_delds(char *devid, NFSPROC_T *p) +{ + struct nfsdevice *ds, *fndds; + struct nfsmount *nmp; + int fndmirror; + + NFSD_DEBUG(4, "delds\n"); + /* + * Search the DS server list for a match with devid. + * Remove the DS entry if found and there is a mirror. 
+ */ + fndds = NULL; + nmp = NULL; + fndmirror = 0; + NFSDDSLOCK(); + TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) { + if (NFSBCMP(ds->nfsdev_deviceid, devid, NFSX_V4DEVICEID) == 0 && + ds->nfsdev_nmp != NULL) { + NFSD_DEBUG(4, "fnd main ds\n"); + fndds = ds; + } else if (ds->nfsdev_nmp != NULL) + fndmirror = 1; + if (fndds != NULL && fndmirror != 0) + break; + } + if (fndds != NULL && fndmirror != 0) { + nmp = fndds->nfsdev_nmp; + NFSLOCKMNT(nmp); + if ((nmp->nm_privflag & (NFSMNTP_FORCEDISM | + NFSMNTP_CANCELRPCS)) == 0) { + nmp->nm_privflag |= NFSMNTP_CANCELRPCS; + NFSUNLOCKMNT(nmp); + nfsrv_deleteds(fndds); + } else { + NFSUNLOCKMNT(nmp); + nmp = NULL; + } + } + NFSDDSUNLOCK(); + if (fndds != NULL && nmp != NULL) { + nfsrv_flexmirrordel(fndds->nfsdev_deviceid, p); + printf("pNFS server: mirror %s failed\n", fndds->nfsdev_host); + nfsrv_killrpcs(nmp); + NFSLOCKMNT(nmp); + nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS; + wakeup(nmp); + NFSUNLOCKMNT(nmp); + return (0); + } + return (ENXIO); +} + +/* + * Mark a DS as disabled by setting nfsdev_nmp = NULL. + */ +static void +nfsrv_deleteds(struct nfsdevice *fndds) +{ + + NFSD_DEBUG(4, "deleteds: deleting a mirror\n"); + fndds->nfsdev_nmp = NULL; +} + +/* + * Fill in the addr structures for the File and Flex File layouts. + */ +static void +nfsrv_allocdevid(struct nfsdevice *ds, char *addr, char *dnshost) +{ + uint32_t *tl; + char *netprot; + int addrlen; + static uint64_t new_devid = 0; + + if (strchr(addr, ':') != NULL) + netprot = "tcp6"; + else + netprot = "tcp"; + + /* Fill in the device id. */ + NFSBCOPY(&nfsdev_time, ds->nfsdev_deviceid, sizeof(nfsdev_time)); + new_devid++; + NFSBCOPY(&new_devid, &ds->nfsdev_deviceid[sizeof(nfsdev_time)], + sizeof(new_devid)); + + /* + * Fill in the file addr (actually the nfsv4_file_layout_ds_addr4 + * as defined in RFC5661) in XDR. 
+ */ + addrlen = NFSM_RNDUP(strlen(addr)) + NFSM_RNDUP(strlen(netprot)) + + 6 * NFSX_UNSIGNED; + NFSD_DEBUG(4, "hn=%s addr=%s netprot=%s\n", dnshost, addr, netprot); + ds->nfsdev_fileaddrlen = addrlen; + tl = malloc(addrlen, M_NFSDSTATE, M_WAITOK | M_ZERO); + ds->nfsdev_fileaddr = (char *)tl; + *tl++ = txdr_unsigned(1); /* One stripe with index 0. */ + *tl++ = 0; + *tl++ = txdr_unsigned(1); /* One multipath list */ + *tl++ = txdr_unsigned(1); /* with one entry in it. */ + /* The netaddr for this one entry. */ + *tl++ = txdr_unsigned(strlen(netprot)); + NFSBCOPY(netprot, tl, strlen(netprot)); + tl += (NFSM_RNDUP(strlen(netprot)) / NFSX_UNSIGNED); + *tl++ = txdr_unsigned(strlen(addr)); + NFSBCOPY(addr, tl, strlen(addr)); + + /* + * Fill in the flex file addr (actually the ff_device_addr4 + * as defined for Flexible File Layout) in XDR. + */ + addrlen = NFSM_RNDUP(strlen(addr)) + NFSM_RNDUP(strlen(netprot)) + + 9 * NFSX_UNSIGNED; + ds->nfsdev_flexaddrlen = addrlen; + tl = malloc(addrlen, M_NFSDSTATE, M_WAITOK | M_ZERO); + ds->nfsdev_flexaddr = (char *)tl; + *tl++ = txdr_unsigned(1); /* One multipath entry. */ + /* The netaddr for this one entry. */ + *tl++ = txdr_unsigned(strlen(netprot)); + NFSBCOPY(netprot, tl, strlen(netprot)); + tl += (NFSM_RNDUP(strlen(netprot)) / NFSX_UNSIGNED); + *tl++ = txdr_unsigned(strlen(addr)); + NFSBCOPY(addr, tl, strlen(addr)); + tl += (NFSM_RNDUP(strlen(addr)) / NFSX_UNSIGNED); + *tl++ = txdr_unsigned(1); /* One NFS Version. */ + *tl++ = txdr_unsigned(NFS_VER4); /* NFSv4. */ + *tl++ = txdr_unsigned(NFSV41_MINORVERSION); /* Minor version 1. */ + *tl++ = txdr_unsigned(NFS_SRVMAXIO); /* DS max rsize. */ + *tl++ = txdr_unsigned(NFS_SRVMAXIO); /* DS max wsize. */ + *tl = newnfs_true; /* Tightly coupled. */ + + ds->nfsdev_hostnamelen = strlen(dnshost); + ds->nfsdev_host = malloc(ds->nfsdev_hostnamelen + 1, M_NFSDSTATE, + M_WAITOK); + NFSBCOPY(dnshost, ds->nfsdev_host, ds->nfsdev_hostnamelen + 1); +} + + +/* + * Create the device id list. 
+ * Return 0 if the nfsd threads are to run and ENXIO if the "-p" argument + * is misconfigured. + */ +int +nfsrv_createdevids(struct nfsd_nfsd_args *args, NFSPROC_T *p) +{ + struct nfsdevice *ds; + char *addrp, *dnshostp, *dspathp; + int error, i; + + addrp = args->addr; + dnshostp = args->dnshost; + dspathp = args->dspath; + nfsrv_maxpnfsmirror = args->mirrorcnt; + if (addrp == NULL || dnshostp == NULL || dspathp == NULL) + return (0); + + /* + * Loop around for each nul-terminated string in args->addr, + * args->dnshost and args->dspath. + */ + while (addrp < (args->addr + args->addrlen) && + dnshostp < (args->dnshost + args->dnshostlen) && + dspathp < (args->dspath + args->dspathlen)) { + error = nfsrv_setdsserver(dspathp, p, &ds); + if (error != 0) { + /* Free all DS servers. */ + nfsrv_freealldevids(); + nfsrv_devidcnt = 0; + return (ENXIO); + } + nfsrv_allocdevid(ds, addrp, dnshostp); + addrp += (strlen(addrp) + 1); + dnshostp += (strlen(dnshostp) + 1); + dspathp += (strlen(dspathp) + 1); + } + if (nfsrv_devidcnt < nfsrv_maxpnfsmirror) { + /* Free all DS servers. */ + nfsrv_freealldevids(); + nfsrv_devidcnt = 0; + nfsrv_maxpnfsmirror = 1; + return (ENXIO); + } + + /* + * Allocate the nfslayout hash table now, since this is a pNFS server. + * Make it 1% of the high water mark and at least 100. + */ + if (nfslayouthash == NULL) { + nfsrv_layouthashsize = nfsrv_layouthighwater / 100; + if (nfsrv_layouthashsize < 100) + nfsrv_layouthashsize = 100; + nfslayouthash = mallocarray(nfsrv_layouthashsize, + sizeof(struct nfslayouthash), M_NFSDSESSION, M_WAITOK | + M_ZERO); + for (i = 0; i < nfsrv_layouthashsize; i++) { + mtx_init(&nfslayouthash[i].mtx, "nfslm", NULL, MTX_DEF); + TAILQ_INIT(&nfslayouthash[i].list); + } + } + return (0); +} + +/* + * Free all device ids. 
+ */ +static void +nfsrv_freealldevids(void) +{ + struct nfsdevice *ds, *nds; + + TAILQ_FOREACH_SAFE(ds, &nfsrv_devidhead, nfsdev_list, nds) + nfsrv_freedevid(ds); +} + +/* + * Check to see if there is a Read/Write Layout plus either: + * - A Write Delegation + * or + * - An Open with Write_access. + * Return 1 if this is the case and 0 otherwise. + * This function is used by nfsrv_proxyds() to decide if doing a Proxy + * Getattr RPC to the Data Server (DS) is necessary. + */ +#define NFSCLIDVECSIZE 6 +APPLESTATIC int +nfsrv_checkdsattr(struct nfsrv_descript *nd, vnode_t vp, NFSPROC_T *p) +{ + fhandle_t fh, *tfhp; + struct nfsstate *stp; + struct nfslayout *lyp; + struct nfslayouthash *lhyp; + struct nfslockhashhead *hp; + struct nfslockfile *lfp; + nfsquad_t clid[NFSCLIDVECSIZE]; + int clidcnt, ret; + + ret = nfsvno_getfh(vp, &fh, p); + if (ret != 0) + return (0); + + /* First check for a Read/Write Layout. */ + clidcnt = 0; + lhyp = NFSLAYOUTHASH(&fh); + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH(lyp, &lhyp->list, lay_list) { + if (NFSBCMP(&lyp->lay_fh, &fh, sizeof(fh)) == 0 && + ((lyp->lay_flags & NFSLAY_RW) != 0 || + ((lyp->lay_flags & NFSLAY_READ) != 0 && + nfsrv_pnfsatime != 0))) { + if (clidcnt < NFSCLIDVECSIZE) + clid[clidcnt].qval = lyp->lay_clientid.qval; + clidcnt++; + } + } + NFSUNLOCKLAYOUT(lhyp); + if (clidcnt == 0) { + /* None found, so return 0. */ + return (0); + } + + /* Get the nfslockfile for this fh. */ + NFSLOCKSTATE(); + hp = NFSLOCKHASH(&fh); + LIST_FOREACH(lfp, hp, lf_hash) { + tfhp = &lfp->lf_fh; + if (NFSVNO_CMPFH(&fh, tfhp)) + break; + } + if (lfp == NULL) { + /* None found, so return 0. */ + NFSUNLOCKSTATE(); + return (0); + } + + /* Now, look for a Write delegation for this clientid. */ + LIST_FOREACH(stp, &lfp->lf_deleg, ls_file) { + if ((stp->ls_flags & NFSLCK_DELEGWRITE) != 0 && + nfsrv_fndclid(clid, stp->ls_clp->lc_clientid, clidcnt) != 0) + break; + } + if (stp != NULL) { + /* Found one, so return 1. 
*/ + NFSUNLOCKSTATE(); + return (1); + } + + /* No Write delegation, so look for an Open with Write_access. */ + LIST_FOREACH(stp, &lfp->lf_open, ls_file) { + KASSERT((stp->ls_flags & NFSLCK_OPEN) != 0, + ("nfsrv_checkdsattr: Non-open in Open list\n")); + if ((stp->ls_flags & NFSLCK_WRITEACCESS) != 0 && + nfsrv_fndclid(clid, stp->ls_clp->lc_clientid, clidcnt) != 0) + break; + } + NFSUNLOCKSTATE(); + if (stp != NULL) + return (1); + return (0); +} + +/* + * Look for a matching clientid in the vector. Return 1 if one might match. + */ +static int +nfsrv_fndclid(nfsquad_t *clidvec, nfsquad_t clid, int clidcnt) +{ + int i; + + /* If too many for the vector, return 1 since there might be a match. */ + if (clidcnt > NFSCLIDVECSIZE) + return (1); + + for (i = 0; i < clidcnt; i++) + if (clidvec[i].qval == clid.qval) + return (1); + return (0); +} + +/* + * Check the don't list for "vp" and see if issuing an rw layout is allowed. + * Return 1 if issuing an rw layout isn't allowed, 0 otherwise. + */ +static int +nfsrv_dontlayout(fhandle_t *fhp) +{ + struct nfsdontlist *mrp; + int ret; + + if (nfsrv_dontlistlen == 0) + return (0); + ret = 0; + NFSDDONTLISTLOCK(); + LIST_FOREACH(mrp, &nfsrv_dontlisthead, nfsmr_list) { + if (NFSBCMP(fhp, &mrp->nfsmr_fh, sizeof(*fhp)) == 0 && + (mrp->nfsmr_flags & NFSMR_DONTLAYOUT) != 0) { + ret = 1; + break; + } + } + NFSDDONTLISTUNLOCK(); + return (ret); +} + +#define PNFSDS_COPYSIZ 65536 +/* + * Create a new file on a DS and copy the contents of an extant DS file to it. + * This can be used for recovery of a DS file onto a recovered DS. + * The steps are: + * - When called, the MDS file's vnode is locked, blocking LayoutGet operations. + * - Disable issuing of read/write layouts for the file via the nfsdontlist, + * so that they will be disabled after the MDS file's vnode is unlocked. + * - Set up the nfsrv_recalllist so that recall of read/write layouts can + * be done. 
+ * - Unlock the MDS file's vnode, so that the client(s) can perform proxied + * writes, LayoutCommits and LayoutReturns for the file when completing the + * LayoutReturn requested by the LayoutRecall callback. + * - Issue a LayoutRecall callback for all read/write layouts and wait for + * them to be returned. (If the LayoutRecall callback replies + * NFSERR_NOMATCHLAYOUT, they are gone and no LayoutReturn is needed.) + * - Exclusively lock the MDS file's vnode. This ensures that no proxied + * writes are in progress or can occur during the DS file copy. + * It also blocks Setattr operations. + * - Create the file on the recovered mirror. + * - Copy the file from the operational DS. + * - Copy any ACL from the MDS file to the new DS file. + * - Set the modify time of the new DS file to that of the MDS file. + * - Update the extended attribute for the MDS file. + * - Enable issuing of rw layouts by deleting the nfsdontlist entry. + * - The caller will unlock the MDS file's vnode allowing operations + * to continue normally, since it is now on the mirror again. + */ +int +nfsrv_copymr(vnode_t vp, vnode_t fvp, vnode_t dvp, struct nfsdevice *ds, + struct pnfsdsfile *pf, struct pnfsdsfile *wpf, int mirrorcnt, + struct ucred *cred, NFSPROC_T *p) +{ + struct nfsdontlist *mrp, *nmrp; + struct nfslayouthash *lhyp; + struct nfslayout *lyp, *nlyp; + struct nfslayouthead thl; + struct mount *mp; + struct acl *aclp; + struct vattr va; + struct timespec mtime; + fhandle_t fh; + vnode_t tvp; + off_t rdpos, wrpos; + ssize_t aresid; + char *dat; + int didprintf, ret, retacl, xfer; + + ASSERT_VOP_LOCKED(fvp, "nfsrv_copymr fvp"); + ASSERT_VOP_LOCKED(vp, "nfsrv_copymr vp"); + /* + * Allocate a nfsdontlist entry and set the NFSMR_DONTLAYOUT flag + * so that no more RW layouts will get issued. 
+ */ + ret = nfsvno_getfh(vp, &fh, p); + if (ret != 0) { + NFSD_DEBUG(4, "nfsrv_copymr: getfh=%d\n", ret); + return (ret); + } + nmrp = malloc(sizeof(*nmrp), M_NFSDSTATE, M_WAITOK); + nmrp->nfsmr_flags = NFSMR_DONTLAYOUT; + NFSBCOPY(&fh, &nmrp->nfsmr_fh, sizeof(fh)); + NFSDDONTLISTLOCK(); + LIST_FOREACH(mrp, &nfsrv_dontlisthead, nfsmr_list) { + if (NFSBCMP(&fh, &mrp->nfsmr_fh, sizeof(fh)) == 0) + break; + } + if (mrp == NULL) { + LIST_INSERT_HEAD(&nfsrv_dontlisthead, nmrp, nfsmr_list); + mrp = nmrp; + nmrp = NULL; + nfsrv_dontlistlen++; + NFSD_DEBUG(4, "nfsrv_copymr: in dontlist\n"); + } else { + NFSDDONTLISTUNLOCK(); + free(nmrp, M_NFSDSTATE); + NFSD_DEBUG(4, "nfsrv_copymr: dup dontlist\n"); + return (ENXIO); + } + NFSDDONTLISTUNLOCK(); + + /* + * Search for all RW layouts for this file. Move them to the + * recall list, so they can be recalled and their return noted. + */ + lhyp = NFSLAYOUTHASH(&fh); + NFSDRECALLLOCK(); + NFSLOCKLAYOUT(lhyp); + TAILQ_FOREACH_SAFE(lyp, &lhyp->list, lay_list, nlyp) { + if (NFSBCMP(&lyp->lay_fh, &fh, sizeof(fh)) == 0 && + (lyp->lay_flags & NFSLAY_RW) != 0) { + TAILQ_REMOVE(&lhyp->list, lyp, lay_list); + TAILQ_INSERT_HEAD(&nfsrv_recalllisthead, lyp, lay_list); + lyp->lay_trycnt = 0; + } + } + NFSUNLOCKLAYOUT(lhyp); + NFSDRECALLUNLOCK(); + + ret = 0; + didprintf = 0; + TAILQ_INIT(&thl); + /* Unlock the MDS vp, so that a LayoutReturn can be done on it. */ + NFSVOPUNLOCK(vp, 0); + /* Now, do a recall for all layouts not yet recalled. */ +tryagain: + NFSDRECALLLOCK(); + TAILQ_FOREACH(lyp, &nfsrv_recalllisthead, lay_list) { + if (NFSBCMP(&lyp->lay_fh, &fh, sizeof(fh)) == 0 && + (lyp->lay_flags & NFSLAY_RECALL) == 0) { + lyp->lay_flags |= NFSLAY_RECALL; + /* + * The layout stateid.seqid needs to be incremented + * before doing a LAYOUT_RECALL callback. 
+ */ + if (++lyp->lay_stateid.seqid == 0) + lyp->lay_stateid.seqid = 1; + NFSDRECALLUNLOCK(); + nfsrv_recalllayout(lyp->lay_clientid, &lyp->lay_stateid, + &lyp->lay_fh, lyp, &nfsrv_recalllisthead, + lyp->lay_type, p); + NFSD_DEBUG(4, "nfsrv_copymr: recalled layout\n"); + goto tryagain; + } + } + + /* Now wait for them to be returned. */ +tryagain2: + TAILQ_FOREACH(lyp, &nfsrv_recalllisthead, lay_list) { + if (NFSBCMP(&lyp->lay_fh, &fh, sizeof(fh)) == 0) { + if ((lyp->lay_flags & NFSLAY_RETURNED) != 0) { + TAILQ_REMOVE(&nfsrv_recalllisthead, lyp, + lay_list); + TAILQ_INSERT_HEAD(&thl, lyp, lay_list); + NFSD_DEBUG(4, + "nfsrv_copymr: layout returned\n"); + } else { + ret = mtx_sleep(lyp, NFSDRECALLMUTEXPTR, + PVFS | PCATCH, "nfsmrl", hz); + NFSD_DEBUG(4, "nfsrv_copymr: aft sleep=%d\n", + ret); + if (ret == EINTR || ret == ERESTART) + break; + if ((lyp->lay_flags & NFSLAY_RETURNED) == 0 && + didprintf == 0) { + printf("nfsrv_copymr: layout not " + "returned\n"); + didprintf = 1; + } + } + goto tryagain2; + } + } + NFSDRECALLUNLOCK(); + /* We can now get rid of the layouts that have been returned. */ + TAILQ_FOREACH_SAFE(lyp, &thl, lay_list, nlyp) + nfsrv_freelayout(&thl, lyp); + + /* + * LK_EXCLUSIVE lock the MDS vnode, so that any + * proxied writes through the MDS will be blocked until we have + * completed the copy and update of the extended attributes. + * This will also ensure that any attributes and ACL will not be + * changed until the copy is complete. + */ + NFSVOPLOCK(vp, LK_EXCLUSIVE | LK_RETRY); + if ((vp->v_iflag & VI_DOOMED) != 0) { + NFSD_DEBUG(4, "nfsrv_copymr: lk_exclusive doomed\n"); + ret = ESTALE; + } + + /* Create the data file on the recovered DS. */ + if (ret == 0) + ret = nfsrv_createdsfile(vp, &fh, pf, dvp, ds, cred, p, &tvp); + + /* Copy the DS file, if created successfully. */ + if (ret == 0) { + /* + * Get any NFSv4 ACL on the MDS file, so that it can be set + * on the new DS file. 
+ */ + aclp = acl_alloc(M_WAITOK | M_ZERO); + retacl = VOP_GETACL(vp, ACL_TYPE_NFS4, aclp, cred, p); + if (retacl != 0 && retacl != ENOATTR) + NFSD_DEBUG(1, "nfsrv_copymr: vop_getacl=%d\n", retacl); + dat = malloc(PNFSDS_COPYSIZ, M_TEMP, M_WAITOK); + rdpos = wrpos = 0; + mp = NULL; + ret = vn_start_write(tvp, &mp, V_WAIT | PCATCH); + aresid = 0; + while (ret == 0 && aresid == 0) { + ret = vn_rdwr(UIO_READ, fvp, dat, PNFSDS_COPYSIZ, + rdpos, UIO_SYSSPACE, IO_NODELOCKED, cred, NULL, + &aresid, p); + xfer = PNFSDS_COPYSIZ - aresid; + if (ret == 0 && xfer > 0) { + rdpos += xfer; + ret = vn_rdwr(UIO_WRITE, tvp, dat, xfer, + wrpos, UIO_SYSSPACE, IO_NODELOCKED, + cred, NULL, NULL, p); + if (ret == 0) + wrpos += xfer; + } + } + + /* If there is an ACL and the copy succeeded, set the ACL. */ + if (ret == 0 && retacl == 0) { + ret = VOP_SETACL(tvp, ACL_TYPE_NFS4, aclp, cred, p); + /* + * Don't consider these as errors, since VOP_GETACL() + * can return an ACL when ACLs are not actually + * supported. For example, for UFS, VOP_GETACL() + * will return a trivial ACL based on the uid/gid/mode + * when there is no ACL on the file. + * This case should be recognized as a trivial ACL + * by UFS's VOP_SETACL() and succeed, but... + */ + if (ret == ENOATTR || ret == EOPNOTSUPP || ret == EPERM) + ret = 0; + } + + if (mp != NULL) + vn_finished_write(mp); + if (ret == 0) + ret = VOP_FSYNC(tvp, MNT_WAIT, p); + + /* Set the DS data file's modify time to that of the MDS file. */ + if (ret == 0) + ret = VOP_GETATTR(vp, &va, cred); + if (ret == 0) { + mtime = va.va_mtime; + VATTR_NULL(&va); + va.va_mtime = mtime; + ret = VOP_SETATTR(tvp, &va, cred); + } + + vput(tvp); + acl_free(aclp); + free(dat, M_TEMP); + } + + /* Update the extended attributes for the newly created DS file. 
*/ + if (ret == 0) { + mp = NULL; + ret = vn_start_write(vp, &mp, V_WAIT | PCATCH); + if (ret == 0) + ret = vn_extattr_set(vp, IO_NODELOCKED, + EXTATTR_NAMESPACE_SYSTEM, "pnfsd.dsfile", + sizeof(*wpf) * mirrorcnt, (char *)wpf, p); + if (mp != NULL) + vn_finished_write(mp); + } + + /* Get rid of the dontlist entry, so that Layouts can be issued. */ + NFSDDONTLISTLOCK(); + LIST_REMOVE(mrp, nfsmr_list); + NFSDDONTLISTUNLOCK(); + free(mrp, M_NFSDSTATE); + return (ret); +} + +/* + * Create a data storage file on the recovered DS. + */ +static int +nfsrv_createdsfile(vnode_t vp, fhandle_t *fhp, struct pnfsdsfile *pf, + vnode_t dvp, struct nfsdevice *ds, struct ucred *cred, NFSPROC_T *p, + vnode_t *tvpp) +{ + struct vattr va, nva; + int error; + + /* Make data file name based on FH. */ + error = VOP_GETATTR(vp, &va, cred); + if (error == 0) { + /* Set the attributes for "vp" to Setattr the DS vp. */ + VATTR_NULL(&nva); + nva.va_uid = va.va_uid; + nva.va_gid = va.va_gid; + nva.va_mode = va.va_mode; + nva.va_size = 0; + VATTR_NULL(&va); + va.va_type = VREG; + va.va_mode = nva.va_mode; + NFSD_DEBUG(4, "nfsrv_dscreatefile: dvp=%p pf=%p\n", dvp, pf); + error = nfsrv_dscreate(dvp, &va, &nva, fhp, pf, NULL, + pf->dsf_filename, cred, p, tvpp); + } + return (error); +} + +/* + * Look up the MDS file shared locked, and then get the extended attribute + * to find the extant DS file to be copied to the new mirror. + * If successful, *vpp is set to the MDS file's vp and *nvpp is + * set to a DS data file for the MDS file, both exclusively locked. + * The "buf" argument has the pnfsdsfile structure from the MDS file + * in it and buflen is set to its length. 
+ */ +int +nfsrv_mdscopymr(char *mdspathp, char *dspathp, char *curdspathp, char *buf, + int *buflenp, char *fname, NFSPROC_T *p, struct vnode **vpp, + struct vnode **nvpp, struct pnfsdsfile **pfp, struct nfsdevice **dsp, + struct nfsdevice **fdsp) +{ + struct nameidata nd; + struct vnode *vp, *curvp; + struct pnfsdsfile *pf; + struct nfsmount *nmp, *curnmp; + int dsdir, error, mirrorcnt, ippos; + + vp = NULL; + curvp = NULL; + curnmp = NULL; + *dsp = NULL; + *fdsp = NULL; + if (dspathp == NULL && curdspathp != NULL) + return (EPERM); + + /* + * Look up the MDS file shared locked. The lock will be upgraded + * to an exclusive lock after any rw layouts have been returned. + */ + NFSD_DEBUG(4, "mdsopen path=%s\n", mdspathp); + NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF, UIO_SYSSPACE, + mdspathp, p); + error = namei(&nd); + NFSD_DEBUG(4, "lookup=%d\n", error); + if (error != 0) + return (error); + if (nd.ni_vp->v_type != VREG) { + vput(nd.ni_vp); + NFSD_DEBUG(4, "mdspath not reg\n"); + return (EISDIR); + } + vp = nd.ni_vp; + + if (curdspathp != NULL) { + /* + * Look up the current DS path and find the nfsdev structure for + * it. + */ + NFSD_DEBUG(4, "curmdsdev path=%s\n", curdspathp); + NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF, + UIO_SYSSPACE, curdspathp, p); + error = namei(&nd); + NFSD_DEBUG(4, "ds lookup=%d\n", error); + if (error != 0) { + vput(vp); + return (error); + } + if (nd.ni_vp->v_type != VDIR) { + vput(nd.ni_vp); + vput(vp); + NFSD_DEBUG(4, "curdspath not dir\n"); + return (ENOTDIR); + } + if (strcmp(nd.ni_vp->v_mount->mnt_vfc->vfc_name, "nfs") != 0) { + vput(nd.ni_vp); + vput(vp); + NFSD_DEBUG(4, "curdspath not an NFS mount\n"); + return (ENXIO); + } + curnmp = VFSTONFS(nd.ni_vp->v_mount); + + /* Search the nfsdev list for a match. 
*/ + NFSDDSLOCK(); + *fdsp = nfsv4_findmirror(curnmp); + NFSDDSUNLOCK(); + if (*fdsp == NULL) + curnmp = NULL; + if (curnmp == NULL) { + vput(nd.ni_vp); + vput(vp); + NFSD_DEBUG(4, "mdscopymr: no current ds\n"); + return (ENXIO); + } + curvp = nd.ni_vp; + } + + if (dspathp != NULL) { + /* Look up the nfsdev path and find the nfsdev structure. */ + NFSD_DEBUG(4, "mdsdev path=%s\n", dspathp); + NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF, + UIO_SYSSPACE, dspathp, p); + error = namei(&nd); + NFSD_DEBUG(4, "ds lookup=%d\n", error); + if (error != 0) { + vput(vp); + if (curvp != NULL) + vput(curvp); + return (error); + } + if (nd.ni_vp->v_type != VDIR || nd.ni_vp == curvp) { + vput(nd.ni_vp); + vput(vp); + if (curvp != NULL) + vput(curvp); + NFSD_DEBUG(4, "dspath not dir\n"); + if (nd.ni_vp == curvp) + return (EPERM); + return (ENOTDIR); + } + if (strcmp(nd.ni_vp->v_mount->mnt_vfc->vfc_name, "nfs") != 0) { + vput(nd.ni_vp); + vput(vp); + if (curvp != NULL) + vput(curvp); + NFSD_DEBUG(4, "dspath not an NFS mount\n"); + return (ENXIO); + } + nmp = VFSTONFS(nd.ni_vp->v_mount); + + /* Search the nfsdev list for a match. */ + NFSDDSLOCK(); + *dsp = nfsv4_findmirror(nmp); + NFSDDSUNLOCK(); + if (*dsp == NULL) { + vput(nd.ni_vp); + vput(vp); + if (curvp != NULL) + vput(curvp); + NFSD_DEBUG(4, "mdscopymr: no ds\n"); + return (ENXIO); + } + } else { + nd.ni_vp = NULL; + nmp = NULL; + } + + /* + * Get a vp for an available DS data file using the extended + * attribute on the MDS file. + * If there is a valid entry for the new DS in the extended attribute + * on the MDS file (as checked via the nmp argument), + * nfsrv_dsgetsockmnt() returns EEXIST, so no copying will occur. + */ + error = nfsrv_dsgetsockmnt(vp, 0, buf, buflenp, &mirrorcnt, p, + NULL, NULL, NULL, fname, nvpp, &nmp, curnmp, &ippos, &dsdir); + if (curvp != NULL) + vput(curvp); + if (nd.ni_vp == NULL) { + if (error == 0 && nmp != NULL) { + /* Search the nfsdev list for a match. 
*/ + NFSDDSLOCK(); + *dsp = nfsv4_findmirror(nmp); + NFSDDSUNLOCK(); + } + if (error == 0 && (nmp == NULL || *dsp == NULL)) { + if (nvpp != NULL && *nvpp != NULL) { + vput(*nvpp); + *nvpp = NULL; + } + error = ENXIO; + } + } else + vput(nd.ni_vp); + + /* + * When dspathp != NULL and curdspathp == NULL, this is a recovery + * and is only allowed if there is a 0.0.0.0 IP address entry. + * When curdspathp != NULL, the ippos will be set to that entry. + */ + if (error == 0 && dspathp != NULL && ippos == -1) { + if (nvpp != NULL && *nvpp != NULL) { + vput(*nvpp); + *nvpp = NULL; + } + error = ENXIO; + } + if (error == 0) { + *vpp = vp; + + pf = (struct pnfsdsfile *)buf; + if (ippos == -1) { + /* If no zeroip pnfsdsfile, add one. */ + ippos = *buflenp / sizeof(*pf); + *buflenp += sizeof(*pf); + pf += ippos; + pf->dsf_dir = dsdir; + strlcpy(pf->dsf_filename, fname, + sizeof(pf->dsf_filename)); + } else + pf += ippos; + *pfp = pf; + } else + vput(vp); + return (error); +} + diff --git a/sys/fs/nfsserver/nfs_nfsdsubs.c b/sys/fs/nfsserver/nfs_nfsdsubs.c index 6e07c5455904..f7df2793ceea 100644 --- a/sys/fs/nfsserver/nfs_nfsdsubs.c +++ b/sys/fs/nfsserver/nfs_nfsdsubs.c @@ -57,6 +57,8 @@ extern uid_t nfsrv_defaultuid; extern gid_t nfsrv_defaultgid; char nfs_v2pubfh[NFSX_V2FH]; +struct nfsdontlisthead nfsrv_dontlisthead; +struct nfslayouthead nfsrv_recalllisthead; static nfstype newnfsv2_type[9] = { NFNON, NFREG, NFDIR, NFBLK, NFCHR, NFLNK, NFNON, NFCHR, NFNON }; extern nfstype nfsv34_type[9]; @@ -1443,7 +1445,14 @@ nfsrv_mtofh(struct nfsrv_descript *nd, struct nfsrvfh *fhp) nd->nd_flag |= ND_PUBLOOKUP; goto nfsmout; } - if (len < NFSRV_MINFH || len > NFSRV_MAXFH) { + copylen = len; + + /* If len == NFSX_V4PNFSFH the RPC is a pNFS DS one. 
*/ + if (len == NFSX_V4PNFSFH && (nd->nd_flag & ND_NFSV41) != 0) { + copylen = NFSX_MYFH; + len = NFSM_RNDUP(len); + nd->nd_flag |= ND_DSSERVER; + } else if (len < NFSRV_MINFH || len > NFSRV_MAXFH) { if (nd->nd_flag & ND_NFSV4) { if (len > 0 && len <= NFSX_V4FHMAX) { error = nfsm_advance(nd, NFSM_RNDUP(len), -1); @@ -1460,7 +1469,6 @@ nfsrv_mtofh(struct nfsrv_descript *nd, struct nfsrvfh *fhp) goto nfsmout; } } - copylen = len; } else { /* * For NFSv2, the file handle is always 32 bytes on the @@ -2054,6 +2062,8 @@ nfsd_init(void) mtx_init(&nfssessionhash[i].mtx, "nfssm", NULL, MTX_DEF); LIST_INIT(&nfssessionhash[i].list); } + LIST_INIT(&nfsrv_dontlisthead); + TAILQ_INIT(&nfsrv_recalllisthead); /* and the v2 pubfh should be all zeros */ NFSBZERO(nfs_v2pubfh, NFSX_V2FH); diff --git a/sys/nfs/nfs_nfssvc.c b/sys/nfs/nfs_nfssvc.c index 8f3ef1410692..19ac16a933f5 100644 --- a/sys/nfs/nfs_nfssvc.c +++ b/sys/nfs/nfs_nfssvc.c @@ -106,7 +106,7 @@ sys_nfssvc(struct thread *td, struct nfssvc_args *uap) NFSSVC_PUBLICFH | NFSSVC_V4ROOTEXPORT | NFSSVC_NOPUBLICFH | NFSSVC_STABLERESTART | NFSSVC_ADMINREVOKE | NFSSVC_DUMPCLIENTS | NFSSVC_DUMPLOCKS | NFSSVC_BACKUPSTABLE | - NFSSVC_SUSPENDNFSD | NFSSVC_RESUMENFSD)) && + NFSSVC_SUSPENDNFSD | NFSSVC_RESUMENFSD | NFSSVC_PNFSDS)) && nfsd_call_nfsd != NULL) error = (*nfsd_call_nfsd)(td, uap); if (error == EINTR || error == ERESTART) diff --git a/sys/nfs/nfssvc.h b/sys/nfs/nfssvc.h index 99b9b78c17bc..4d0ef56b93a0 100644 --- a/sys/nfs/nfssvc.h +++ b/sys/nfs/nfssvc.h @@ -73,6 +73,7 @@ #define NFSSVC_DUMPMNTOPTS 0x10000000 #define NFSSVC_NEWSTRUCT 0x20000000 #define NFSSVC_FORCEDISM 0x40000000 +#define NFSSVC_PNFSDS 0x80000000 /* Argument structure for NFSSVC_DUMPMNTOPTS. */ struct nfscl_dumpmntopts {