.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 26, 2018
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" .
A pNFS service separates Read/Write operations from all other NFSv4.1
operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 protocol provides two pieces of information to pNFS aware
clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS.
The critical piece of information in DeviceInfo for the layout types
supported by FreeBSD is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and other
layout specific information.
In the DeviceInfo, there is a DeviceID which, for the FreeBSD server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the FreeBSD server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS.
.Pp
The FreeBSD client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 protocol
to perform I/O on the DS.
It does not support client aware DS mirroring and, as such,
the FreeBSD server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0 or NFSv4.1
protocol to perform I/O on the DS and does support client aware mirroring.
As such, the FreeBSD server uses Flexible File Layout layouts for the
mirrored DS configurations.
The FreeBSD server supports the
.Dq tightly coupled
variant and all DSs use the
NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A FreeBSD pNFS service consists of a single MDS server plus one or more
DS servers, all of which are FreeBSD systems.
For a non-mirrored configuration, the FreeBSD server will issue File Layout
layouts by default.
However, that default can be changed to the Flexible File Layout by setting the
.Xr sysctl 8
variable
.Va vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
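.Pp
As a minimal sketch of the setting described above, the default layout type
for a non-mirrored configuration could be switched on the MDS at run time as
shown here.
This assumes the NFS server is already running, so that the variable exists,
and the change shown does not persist across a reboot:
.Bd -literal -offset indent
# Run as root on the MDS.
sysctl vfs.nfsd.default_flexfile=1
.Ed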
.Pp
A FreeBSD
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=1,pnfs .
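.Pp
A client-side setup along these lines might look as follows.
The host name
.Dq mds.example.com
and both paths are placeholders chosen for illustration, not values taken
from this manual page:
.Bd -literal -offset indent
# Start the callback daemon once (or set nfscbd_enable="YES" in
# /etc/rc.conf to start it at boot).
service nfscbd onestart
# Mount the MDS with the pNFS options; host and paths are hypothetical.
mount -t nfs -o nfsv4,minorversion=1,pnfs mds.example.com:/export /mnt
.Ed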
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly
on the MDS server (not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on a DS for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime and Change attributes for the file.
.Ed
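.Pp
As an illustrative sketch, the zero size and the two attributes can be
inspected locally on the MDS with
.Xr lsextattr 8
and
.Xr getextattr 8 .
The path
.Pa /export/somefile
is a placeholder, and reading the system name space is assumed to require
root privileges:
.Bd -literal -offset indent
# Run as root on the MDS, not via an NFS mount.
# /export/somefile is a hypothetical path.
ls -l /export/somefile
lsextattr system /export/somefile
getextattr -x system pnfsd.dsattr /export/somefile
.Ed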
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DSs where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS in hexadecimal at the time of file
creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for that client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
See
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.sp
for information on how to set up a FreeBSD pNFS service.
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but the client only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the FreeBSD server, you must either
patch the Flexible File Layout client driver with a patch like:
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/flexfile.patch
.Ed
.sp
or set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that it works around
the Linux client driver's limitations.
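.Pp
A sketch of the sysctl workaround, run on the FreeBSD MDS rather than on the
Linux client.
It assumes the NFS server is already running so that the variable exists,
and the setting shown does not persist across a reboot:
.Bd -literal -offset indent
# Run as root on the FreeBSD MDS.
sysctl vfs.nfsd.flexlinuxhack=1
.Ed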
.Pp
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.