diff --git a/usr.sbin/nfsd/Makefile b/usr.sbin/nfsd/Makefile
index 290506773d76..147a056cc0d7 100644
--- a/usr.sbin/nfsd/Makefile
+++ b/usr.sbin/nfsd/Makefile
@@ -2,6 +2,6 @@
 # $FreeBSD$
 
 PROG= nfsd
-MAN= nfsd.8 nfsv4.4 stablerestart.5
+MAN= nfsd.8 nfsv4.4 stablerestart.5 pnfs.4
 
 .include
diff --git a/usr.sbin/nfsd/pnfs.4 b/usr.sbin/nfsd/pnfs.4
new file mode 100644
index 000000000000..84baab5e0ffa
--- /dev/null
+++ b/usr.sbin/nfsd/pnfs.4
@@ -0,0 +1,187 @@
+.\" Copyright (c) 2017 Rick Macklem
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd March 26, 2018
+.Dt PNFS 4
+.Os
+.Sh NAME
+.Nm pNFS
+.Nd NFS Version 4.1 Parallel NFS Protocol
+.Sh DESCRIPTION
+The NFSv4.1 client and server provide support for the
+.Tn pNFS
+specification; see
+.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" .
+A pNFS service separates Read/Write operations from all other NFSv4.1
+operations, which are referred to as Metadata operations.
+The Read/Write operations are performed directly on the Data Server (DS)
+where the file's data resides, bypassing the NFS server.
+All other file operations are performed on the NFS server, which is referred to
+as a Metadata Server (MDS).
+NFS clients that do not support
+.Tn pNFS
+perform Read/Write operations on the MDS, which acts as a proxy for the
+appropriate DS(s).
+.Pp
+The NFSv4.1 protocol provides two pieces of information to pNFS aware
+clients that allow them to perform Read/Write operations directly on
+the DS.
+.Pp
+The first is DeviceInfo, which is static information describing a DS.
+For the layout types supported by FreeBSD, the critical piece of information
+in DeviceInfo is the IP address used to perform RPCs on the DS.
+It also indicates which version of NFS the DS supports, the I/O size, and
+other layout-specific information.
+The DeviceInfo includes a DeviceID which, for the FreeBSD server, is unique
+to the DS configuration and changes whenever the
+.Xr nfsd 8
+daemon is restarted or the server is rebooted.
+.Pp
+The second is the layout, which is per file and references the DeviceInfo
+to use via the DeviceID.
+A layout covers a byte range of a file and has an I/O mode of either Read
+or Read/Write.
+For the FreeBSD server, a layout covers all bytes of a file.
+A layout may be recalled by the MDS using a LayoutRecall callback.
+When a client returns a layout via the LayoutReturn operation, it can
+indicate that error(s) were encountered while doing I/O on the DS.
+.Pp
+The FreeBSD client and server support two layout types.
+.Pp
+The File Layout is described in RFC 5661 and uses the NFSv4.1 protocol
+to perform I/O on the DS.
+It does not support client-aware DS mirroring and, as such,
+the FreeBSD server only provides File Layout support for non-mirrored
+configurations.
+.Pp
+The Flexible File Layout allows the use of the NFSv3, NFSv4.0 or NFSv4.1
+protocol to perform I/O on the DS and does support client-aware mirroring.
+As such, the FreeBSD server uses Flexible File Layout layouts for
+mirrored DS configurations.
+The FreeBSD server supports the
+.Dq tightly coupled
+variant and all DSs use the
+NFSv4.1 protocol for I/O operations.
+Clients that support the Flexible File Layout will perform writes and commits
+to all DS mirrors in the mirror set.
+.Pp
+A FreeBSD pNFS service consists of a single MDS plus one or more
+DS servers, all of which are FreeBSD systems.
+For a non-mirrored configuration, the FreeBSD server issues File Layout
+layouts by default.
+However, that default can be changed to the Flexible File Layout by setting
+the
+.Xr sysctl 8
+variable
+.Va vfs.nfsd.default_flexfile
+to one.
+Mirrored server configurations only issue Flexible File Layout layouts.
+.Tn pNFS
+clients mount the MDS as they would a single NFS server.
+.Pp
+A FreeBSD
+.Tn pNFS
+client must be running the
+.Xr nfscbd 8
+daemon and use the mount options
+.Dq nfsv4,minorversion=1,pnfs .
+.Pp
+When files are created, the MDS creates a file tree identical to what a
+single NFS server creates, except that all the regular (VREG) files are
+empty.
+As such, if the exported tree is examined directly on the MDS
+(not via an NFS mount), all of its regular files will be of size zero.
+Each of these files also has two extended attributes in the system
+attribute namespace:
+.Bd -literal -offset indent
+pnfsd.dsfile - This extended attribute stores the information that the
+ MDS needs to find the data file on a DS for this file.
+pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
+ ModifyTime and Change attributes for the file.
+.Ed
+.Pp
+For each regular (VREG) file, the MDS creates a data file on one
+of the DSs (or on N of them for the mirrored case, where N is the
+mirror_level), where the file's data will be stored.
+The name of this data file is the file handle of the file on the MDS,
+in hexadecimal, at the time the file was created.
+The data file has the same ownership, mode and NFSv4 ACL
+(if ACLs are enabled for the file system) as the file on the MDS, so that
+permission checking can be done on the DS.
+This is referred to as
+.Dq tightly coupled
+for the Flexible File Layout.
+.Pp
+For
+.Tn pNFS
+aware clients, the service generates File Layout
+or Flexible File Layout
+layouts and associated DeviceInfo.
+For non-pNFS aware NFS clients, the pNFS service appears just like a normal
+NFS service.
+For such clients, the MDS performs I/O operations on the appropriate DS(s),
+acting as a proxy for the client.
+This is also true for NFSv3 and NFSv4.0 mounts, since these are always
+non-pNFS aware.
+.Pp
+See
+.Bd -literal -offset indent
+http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
+.Ed
+.sp
+for information on how to set up a FreeBSD pNFS service.
+.Sh SEE ALSO
+.Xr nfsv4 4 ,
+.Xr exports 5 ,
+.Xr fstab 5 ,
+.Xr rc.conf 5 ,
+.Xr nfscbd 8 ,
+.Xr nfsd 8 ,
+.Xr nfsuserd 8 ,
+.Xr pnfsdscopymr 8 ,
+.Xr pnfsdsfile 8 ,
+.Xr pnfsdskill 8
+.Sh BUGS
+Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
+and, since the FreeBSD DSs use NFSv4.1, will do all I/O through the MDS.
+Linux 4.12 kernels added client support for NFSv4.1 DSs, but client crashes
+were observed when testing this client.
+With Linux 4.17-rc2 kernels, no client crashes were seen during testing,
+but the client only supports the
+.Dq loosely coupled
+variant.
+To make such a client work correctly when mounting the FreeBSD server,
+either patch its Flexible File Layout client driver with a patch such as:
+.Bd -literal -offset indent
+http://people.freebsd.org/~rmacklem/flexfile.patch
+.Ed
+.sp
+or set the sysctl
+.Va vfs.nfsd.flexlinuxhack
+to one on the MDS, so that the server works around
+the Linux client driver's limitations.
+.Pp
+Since the MDS cannot be mirrored, it is a single point of failure, just
+as a non-pNFS server is.
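+.Pp
+For example, the
+.Va vfs.nfsd.flexlinuxhack
+work-around described above might be enabled on the MDS with
+.Xr sysctl 8
+and kept across reboots by adding it to
+.Xr sysctl.conf 5 ;
+this is only a sketch and assumes the commands are run as root on the MDS:
+.Bd -literal -offset indent
+# Enable the Linux Flexible File Layout work-around now.
+sysctl vfs.nfsd.flexlinuxhack=1
+# Re-apply the same setting at boot.
+echo 'vfs.nfsd.flexlinuxhack=1' >> /etc/sysctl.conf
+.Ed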