4c87085d35
- consider using OS macro: Fx Reviewed by: bcr MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D31145
231 lines
8.3 KiB
Groff
231 lines
8.3 KiB
Groff
.\" Copyright (c) 2017 Rick Macklem
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
.\" SUCH DAMAGE.
|
|
.\"
|
|
.\" $FreeBSD$
|
|
.\"
|
|
.Dd December 20, 2019
|
|
.Dt PNFS 4
|
|
.Os
|
|
.Sh NAME
|
|
.Nm pNFS
|
|
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
|
|
.Sh DESCRIPTION
|
|
The NFSv4.1 and NFSv4.2 client and server provides support for the
|
|
.Tn pNFS
|
|
specification; see
|
|
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
|
|
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and
|
|
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
|
|
A pNFS service separates Read/Write operations from all other NFSv4.1 and
|
|
NFSv4.2 operations, which are referred to as Metadata operations.
|
|
The Read/Write operations are performed directly on the Data Server (DS)
|
|
where the file's data resides, bypassing the NFS server.
|
|
All other file operations are performed on the NFS server, which is referred to
|
|
as a Metadata Server (MDS).
|
|
NFS clients that do not support
|
|
.Tn pNFS
|
|
perform Read/Write operations on the MDS, which acts as a proxy for the
|
|
appropriate DS(s).
|
|
.Pp
|
|
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
|
|
aware clients that allow them to perform Read/Write operations directly on
|
|
the DS.
|
|
.Pp
|
|
The first is DeviceInfo, which is static information defining the DS
|
|
server.
|
|
The critical piece of information in DeviceInfo for the layout types
|
|
supported by
|
|
.Fx
|
|
is the IP address that is used to perform RPCs on the DS.
|
|
It also indicates which version of NFS the DS supports, I/O size and other
|
|
layout specific information.
|
|
In the DeviceInfo, there is a DeviceID which, for the
|
|
.Fx
|
|
server
|
|
is unique to the DS configuration
|
|
and changes whenever the
|
|
.Xr nfsd
|
|
daemon is restarted or the server is rebooted.
|
|
.Pp
|
|
The second is the layout, which is per file and references the DeviceInfo
|
|
to use via the DeviceID.
|
|
It is for a byte range of a file and is either Read or Read/Write.
|
|
For the
|
|
.Fx
|
|
server, a layout covers all bytes of a file.
|
|
A layout may be recalled by the MDS using a LayoutRecall callback.
|
|
When a client returns a layout via the LayoutReturn operation it can
|
|
indicate that error(s) were encountered while doing I/O on the DS,
|
|
at least for certain layout types such as the Flexible File Layout.
|
|
.Pp
|
|
The
|
|
.Fx
|
|
client and server supports two layout types.
|
|
.Pp
|
|
The File Layout is described in RFC5661 and uses the NFSv4.1 or NFSv4.2 protocol
|
|
to perform I/O on the DS.
|
|
It does not support client aware DS mirroring and, as such,
|
|
the
|
|
.Fx
|
|
server only provides File Layout support for non-mirrored
|
|
configurations.
|
|
.Pp
|
|
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
|
|
NFSv4.2 protocol to perform I/O on the DS and does support client aware
|
|
mirroring.
|
|
As such, the
|
|
.Fx
|
|
server uses Flexible File Layout layouts for the
|
|
mirrored DS configurations.
|
|
The
|
|
.Fx
|
|
server supports the
|
|
.Dq tightly coupled
|
|
variant and all DSs allow use of the
|
|
NFSv4.2 or NFSv4.1 protocol for I/O operations.
|
|
Clients that support the Flexible File Layout will do writes and commits
|
|
to all DS mirrors in the mirror set.
|
|
.Pp
|
|
A
|
|
.Fx
|
|
pNFS service consists of a single MDS server plus one or more
|
|
DS servers, all of which are
|
|
.Fx
|
|
systems.
|
|
For a non-mirrored configuration, the
|
|
.Fx
|
|
server will issue File Layout
|
|
layouts by default.
|
|
However that default can be set to the Flexible File Layout by setting the
|
|
.Xr sysctl 1
|
|
sysctl
|
|
.Dq vfs.nfsd.default_flexfile
|
|
to one.
|
|
Mirrored server configurations will only issue Flexible File Layouts.
|
|
.Tn pNFS
|
|
clients mount the MDS as they would a single NFS server.
|
|
.Pp
|
|
A
|
|
.Fx
|
|
.Tn pNFS
|
|
client must be running the
|
|
.Xr nfscbd 8
|
|
daemon and use the mount options
|
|
.Dq nfsv4,minorversion=2,pnfs or
|
|
.Dq nfsv4,minorversion=1,pnfs .
|
|
.Pp
|
|
When files are created, the MDS creates a file tree identical to what a
|
|
single NFS server creates, except that all the regular (VREG) files will
|
|
be empty.
|
|
As such, if you look at the exported tree on the MDS directly
|
|
on the MDS server (not via an NFS mount), the files will all be of size zero.
|
|
Each of these files will also have two extended attributes in the system
|
|
attribute name space:
|
|
.Bd -literal -offset indent
|
|
pnfsd.dsfile - This extended attrbute stores the information that the
|
|
MDS needs to find the data file on a DS(s) for this file.
|
|
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
|
|
ModifyTime, Change and SpaceUsed attributes for the file.
|
|
.Ed
|
|
.Pp
|
|
For each regular (VREG) file, the MDS creates a data file on one
|
|
(or on N of them for the mirrored case, where N is the mirror_level)
|
|
of the DS(s) where the file's data will be stored.
|
|
The name of this file is
|
|
the file handle of the file on the MDS in hexadecimal at time of file creation.
|
|
The data file will have the same file ownership, mode and NFSv4 ACL
|
|
(if ACLs are enabled for the file system) as the file on the MDS, so that
|
|
permission checking can be done on the DS.
|
|
This is referred to as
|
|
.Dq tightly coupled
|
|
for the Flexible File Layout.
|
|
.Pp
|
|
For
|
|
.Tn pNFS
|
|
aware clients, the service generates File Layout
|
|
or Flexible File Layout
|
|
layouts and associated DeviceInfo.
|
|
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
|
|
NFS service.
|
|
For the non-pNFS aware client, the MDS will perform I/O operations on the
|
|
appropriate DS(s), acting as
|
|
a proxy for the non-pNFS aware client.
|
|
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
|
|
aware.
|
|
.Pp
|
|
It is possible to assign a DS to an MDS exported file system so that it will
|
|
store data for files on the MDS exported file system.
|
|
If a DS is not assigned to an MDS exported file system, it will store data
|
|
for files on all exported file systems on the MDS.
|
|
.Pp
|
|
If mirroring is enabled, the pNFS service will continue to function when
|
|
DS(s) have failed, so long is there is at least one DS still operational
|
|
that stores data for files on all of the MDS exported file systems.
|
|
After a disabled mirrored DS is repaired, it is possible to recover the DS
|
|
as a mirror while the pNFS service continues to function.
|
|
.Pp
|
|
See
|
|
.Xr pnfsserver 4
|
|
for information on how to set up a
|
|
.Fx
|
|
pNFS service.
|
|
.Sh SEE ALSO
|
|
.Xr nfsv4 4 ,
|
|
.Xr pnfsserver 4 ,
|
|
.Xr exports 5 ,
|
|
.Xr fstab 5 ,
|
|
.Xr rc.conf 5 ,
|
|
.Xr nfscbd 8 ,
|
|
.Xr nfsd 8 ,
|
|
.Xr nfsuserd 8 ,
|
|
.Xr pnfsdscopymr 8 ,
|
|
.Xr pnfsdsfile 8 ,
|
|
.Xr pnfsdskill 8
|
|
.Sh BUGS
|
|
Linux kernel versions prior to 4.12 only supports NFSv3 DSs in its client
|
|
and will do all I/O through the MDS.
|
|
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
|
|
Linux client crashes when testing this client.
|
|
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
|
|
but it only supports the
|
|
.Dq loosely coupled
|
|
variant.
|
|
To make it work correctly when mounting the
|
|
.Fx
|
|
server, you must
|
|
set the sysctl
|
|
.Dq vfs.nfsd.flexlinuxhack
|
|
to one so that it works around
|
|
the Linux client driver's limitations.
|
|
Wihout this sysctl being set, there will be access errors, since the Linux
|
|
client will use the authenticator in the layout (uid=999, gid=999) and not
|
|
the authenticator specified in the RPC header.
|
|
.Pp
|
|
Linux 5.n kernels appear to be patched so that it uses the authenticator
|
|
in the RPC header and, as such, the above sysctl should not need to be set.
|
|
.Pp
|
|
Since the MDS cannot be mirrored, it is a single point of failure just
|
|
as a non
|
|
.Tn pNFS
|
|
server is.
|