freebsd-nq

Author	SHA1	Message	Date
Rick Macklem	fdab4d3b29	Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr(). When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr() done while the vnodes were locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes the LORs by moving the vn_start_write() calls up to before where the vnodes are locked. For "tvp", the vn_start_write() probably isn't necessary, because NFS mounts can't be suspended. However, I think doing so is harmless. Thanks go to kib@ for letting me know that I had introduced these LORs. This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8) is used to recover a mirrored DS.	2018-08-18 19:14:06 +00:00
Rick Macklem	3e5ba2e187	Fix LORs between vn_start_write() and vn_lock() in the pNFS server. When coding the pNFS server, I added several vn_start_write() calls done while the vnode was locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes this by removing the added vn_start_write() calls and modifying the code so that the extant vn_start_write() call before the NFS RPC/operation is done when needed by the pNFS server. Flags are changed so that LayoutCommit and LayoutReturn now get a vn_start_write() done for them. When the pNFS server is enabled, the code now also changes the flags for Getattr, so that the vn_start_write() is done for Getattr, since it may need to do a vn_set_extattr(). The nfs_writerpc flag array was made global to the NFS server and renamed nfsrv_writerpc, which is consistent naming for globals in the NFS server. Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is locked results in a LOR. This patch only affects the behaviour of the pNFS server.	2018-08-17 21:12:16 +00:00
Rick Macklem	9fbb0faf4f	Don't set a file's size for the MDS file of a pNFS service. When a pNFS service is running, the size of the files created on the MDS are normally 0, since the data is written to the data files on the DS(s). However, without this patch, if a Setattr with a non-zero size was done by a client, the MDS file was set to that size. This was thought to be benign, but it turns out that files with a non-zero size plus extended attributes can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this panic() has not been isolated, this patch avoids the panic() and leaves the MDS files in a consistent state of always having a size == 0. Note that these MDS files never store data. The patch also includes an unnecessary initialization of savsize in case some compiler or static analyser complains it might not be initialized. This patch only affects the NFS server when pNFS is enabled via the "-p" command line option on nfsd.	2018-08-17 12:32:38 +00:00
Jamie Gritton	284001a222	Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating jails since FreeBSD 7. Along with the system call, put the various security.jail.allow_foo and security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or BURN_BRIDGES). These sysctls had two disparate uses: on the system side, they were global permissions for jails created via jail(2) which lacked fine-grained permission controls; inside a jail, they're read-only descriptions of what the current jail is allowed to do. The first use is obsolete along with jail(2), but keep them for the second-read-only use. Differential Revision: D14791	2018-08-16 18:40:16 +00:00
Conrad Meyer	5cb27f0813	FUSE: Document global sysctl knobs So that I don't have to keep grepping around the codebase to remember what each one does. And maybe it saves someone else some time. Fix a trivial whitespace issue while here. No functional change. Sponsored by: Dell EMC Isilon	2018-08-15 17:41:19 +00:00
Toomas Soome	527d337fdb	cd9660 pointer sign issues and missing __packed attribute The isonum_* functions are defined to take unsigned char* as an argument, but the structure fields are defined as char. Change to u_char where needed. Probably the full structure should be changed, but I'm not sure about the side effects. While there, add __packed attribute. Differential Revision: https://reviews.freebsd.org/D16564	2018-08-15 06:42:31 +00:00
Rick Macklem	41df1b5b47	Assorted fixes to handling of LayoutRecall callbacks, mostly error handling. After a re-read of the appropriate section of RFC5661, I decided that a few things should be changed related to LayoutRecall callback handling. Here are the things fixed by this patch. - For two of the three cases that LayoutRecall is done, I now think setting the clora_changed argument false is correct. - All errors other than NFSERR_DELAY returned by LayoutRecall appear permanent, so don't retry for any of them. (NFSERR_DELAY is retried by newnfs_request(), so it is not affected by this patch.) - Instead of waiting "forever" (actually until the process is SIGTERM'd) for Layouts to be returned during a mirror copy, fail and return ENXIO after about 1 minute. Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself, but did not make sense when done via find(1). This patch only affects the pNFS server.	2018-08-08 20:21:45 +00:00
Pedro F. Giffuni	c820acbf0a	msdosfs: fixes for Undefined Behavior. These were found by the Undefined Behaviour GSoC project at NetBSD: Do not change signedness bit with left shift. While there avoid signed integer overflow. Address both issues by using an unsigned type. msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' Detected with micro-UBSan in the user mode. Hinted from: NetBSD (CVS 1.33) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16615	2018-08-08 15:08:22 +00:00
Fedor Uporov	53288b712d	Split the dir_index and dir_nlink features. Do not allow creating more than EXT4_LINK_MAX links to a directory if dir_nlink is not set, as is done in recent e2fsprogs updates. MFC after: 3 months	2018-08-08 12:08:46 +00:00
Fedor Uporov	17c7b27f55	Fix directory blocks checksum updating logic. The checksum updating functions were not called in case of dir index inode splitting and in case of dir entry removing, when the entry was first in the block. Fix and move the dir entry adding logic when i_count == 0 to new function. MFC after: 3 months	2018-08-08 12:07:45 +00:00
Conrad Meyer	3dc1c7d6bc	FUSE: Remove some set-but-not-used variables No functional change.	2018-08-08 04:46:03 +00:00
Rick Macklem	93df87f208	Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply. The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY, but exempts certain NFSv4 operations. However, for callback RPCs, there should not be any exemptions at this time. The code would have erroneously exempted the CBRECALL callback, since it has the same operation number as the CLOSE operation. This patch fixes this by checking for a callback RPC (indicated by clp != NULL) and not checking for exempt operations for callbacks. This would have only affected the NFSv4 server when delegations are enabled (they are not enabled by default) and the client replies to CBRECALL with NFSERR_DELAY. This may never actually happen. Spotted during code inspection. MFC after: 2 weeks	2018-08-07 21:29:14 +00:00
Rick Macklem	25705dd5d0	Copy all bits of a file handle in case there is padding in the structure. At least on x86, fhandle_t is a packed structure, so I believe an assignment will copy all the bits. However, for some current/future architectures, there might be padding in the structure that doesn't get copied via an assignment. Since NFS assumes a file handle is an opaque blob of bits that can be compared via memcmp()/bcmp(), all the bits including any padding must be copied. This patch replaces the assignments with a call to a byte copy function. Spotted during code inspection.	2018-08-05 19:21:50 +00:00
Rick Macklem	ac0d649588	Silence newer gcc warnings. Newer versions of gcc generate "might not be initialized" warnings for several variables in nfsrpc_doiods(). I have checked and all of these variables are assigned values before they are used. In the one case of "tdrpc", it could have passed garbage as an argument to nfscl_dofflayoutio() when mirrorcnt is one. However nfscl_dofflayoutio() only uses the argument when mirrorcnt > 1, so it wasn't actually broken. This patch initializes "tdrpc" to avoid confusion and initializes the rest to make the compiler happy. Requested by: mmacy	2018-08-02 20:10:59 +00:00
Conrad Meyer	dab6195cd3	FUSE: Bump maximum IO size to enable more performant operation Various components restrict size of IO passed up to the userspace filesystem based on the mount's f_iosize value. The previous default of PAGE_SIZE is anemic, even for normal filesystems, but especially considering every FUSE operation involves a kernel <-> userspace IPC upcall. Bump to DFLTPHYS (currently 64kB) to match other FUSE implementations. Anecdotally, Jakub reports IO read performance increased from 600 MB/s -> 2700 MB/s with a basic RAM-backed FUSE filesystem. PR: 230260 Reported by: Peter (MooseFS) <freebsd AT moosefs.com> Tested by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> MFC after: 3 days	2018-08-02 19:25:43 +00:00
Ed Maste	195e6c50d3	msdosfs: trim EOL whitespace	2018-07-31 12:44:28 +00:00
Ed Maste	a6274b81d5	cd9660: replace bcopy/bzero with C standard equivalents To reduce diffs against NetBSD.	2018-07-31 12:36:46 +00:00
Ed Maste	22e56aea3f	msdosfs: use same max filesize #define as NetBSD and move to header For use by makefs msdosfs support. Obtained from: NetBSD denode.h 1.6 Sponsored by: The FreeBSD Foundation	2018-07-30 20:36:51 +00:00
Rick Macklem	743d528198	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings. Add __unused macros to silence these warnings. Although the variables are not being used, they are values parsed from arguments to callback RPCs that might be needed in the future. Requested by: mmacy	2018-07-30 20:25:32 +00:00
Rick Macklem	8014c97147	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings in the NFS server. Add __unused macros to silence these warnings. Requested by: mmacy	2018-07-29 21:51:17 +00:00
Rick Macklem	a3e709cd33	Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7. I believe that a ReclaimComplete with rca_one_fs == TRUE is only to be used after a file system has been transferred to a different file server. However, RFC5661 is somewhat vague w.r.t. this and the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE and one with rca_one_fs == FALSE. Therefore, just ignore the rca_one_fs == TRUE operation and return NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP. This allows the ESXi 6.7 NFSv4.1 client to do a mount. After discussion on the NFSv4 IETF working group mailing list, doing this along with setting a flag to note that a ReclaimComplete with rca_one_fs == TRUE was done was considered an appropriate way to handle this. The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was done may be used to disable replies of NFS4ERR_GRACE for non-reclaim state operations in a future commit. This patch along with r332790, r334492 and r336357 allows ESXi 6.7 NFSv4.1 mounts to work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are violations of RFC-5661 and should not be used. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com, daniel@ftml.net (earlier version) MFC after: 2 weeks Relnotes: yes	2018-07-28 20:21:04 +00:00
Eitan Adler	33f4bccaa6	Use https over http for FreeBSD pages	2018-07-27 10:40:48 +00:00
Ed Maste	6ae00e306f	Revert msdosfs MAKEFS #ifdef changes from r319870 These changes are not needed for current msdosfs makefs WIP. Submitted by: Siva Mahadevan Sponsored by: The FreeBSD Foundation	2018-07-24 21:10:17 +00:00
Rick Macklem	cecf6c6e9c	Set CLSET_TIMEOUT on TCP connections to pNFS DSs. Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of specifying a timeout on each RPC. This is done so that SO_SNDTIMEO is set on the TCP socket as well as specifying a time limit when waiting for an RPC reply. Useful if the send queue for the TCP connection has become constipated, due to a failed DS. The choice of lease_duration / 4 is fairly arbitrary, but seems to work ok, with a lower bound of 10sec. For client connections to a DS, set the retry limit to vfs.nfsd.dsretries, which is 2 by default. This patch should only affect pNFS connections to DSs. This patch requires r336542. MFC after: 2 weeks	2018-07-21 01:33:07 +00:00
Alan Somers	5717aa2d2a	Allow mounting FUSE filesystems in jails Reviewed by: jamie MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16371	2018-07-20 21:35:31 +00:00
Rick Macklem	5d54f186bb	Modify the reasons for not issuing a delegation in the NFSv4.1 server. The ESXi NFSv4.1 client will generate warning messages when the reason for not issuing a delegation is two. Two refers to a resource limit and I do not see why it would be considered invalid. However it probably was not the best choice of reason for not issuing a delegation. This patch changes the reasons used to ones that the ESXi client doesn't complain about. This change does not affect the FreeBSD client and does not appear to affect behaviour of the Linux NFSv4.1 client. RFC5661 defines these "reasons" but does not give any guidance w.r.t. which ones are more appropriate to return to a client. Tested by: andreas.nagy@frequentis.com PR: 226650 MFC after: 2 weeks	2018-07-16 21:32:50 +00:00
Rick Macklem	5da3882447	Shut down the TCP connection to a DS in the pNFS client when Renew fails. When a NFSv4.1 client mount using pNFS detects a failure trying to do a Renew (actually just a Sequence operation), the code would simply try again and again and again every 30sec. This would tie up the "nfscl" thread, which should also be doing other things like Renews on other DSs and the MDS. This patch adds code which closes down the TCP connection and marks it defunct when Renew detects a failure to communicate with the DS, so further Renews will not be attempted until a new working TCP connection to the DS is established. It also makes the call to nfscl_cancelreqs() unconditional, since nfscl_cancelreqs() checks the NFSCLDS_SAMECONN flag and does so while holding the lock. This fix only applies to the NFSv4.1 client when using pNFS and without it the only effect would have been an "nfscl" thread busy doing Renew attempts on an unresponsive DS. MFC after: 2 weeks	2018-07-15 18:54:44 +00:00
Rick Macklem	89c64a3a4f	Fix the pNFS client when mirrors aren't on the same machine. Without this patch, the client side NFSv4.1 pNFS code erroneously did writes and commits to both DS mirrors using the TCP connection of the first one. For my test setup this worked, since I have both DSs running on the same machine, but it would have failed when the DSs are on separate machines. This patch fixes the code to use the correct TCP connection for each DS. This patch should only affect the NFSv4.1 client when using "pnfs" mounts to mirrored DSs. MFC after: 2 weeks	2018-07-14 19:51:44 +00:00
Rick Macklem	0e7bd20bb2	Close down the TCP connection to a pNFS DS when it is disabled. So long as the TCP connection to a pNFS DS isn't shared with other DSs, it can be closed down when the DS is being disabled in the pNFS client. This causes any RPCs in progress to fail. This patch only affects the NFSv4.1 pNFS client when errors occur while doing I/O on a DS. MFC after: 2 weeks	2018-07-13 20:03:05 +00:00
Rick Macklem	83f526de6a	Change the pNFS client so that it does not report an NFSERR_STALE from an I/O attempt on a DS to the server via LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This patch only affects behaviour of the pNFS client and only when using Flexible File layouts. MFC after: 2 weeks	2018-07-13 12:39:27 +00:00
Rick Macklem	a6fed5f514	Modify the NFSv4.1 pNFS client to use separate TCP connections for DSs. Without this patch, the NFSv4.1 pNFS client shared a single TCP connection for all DSs that resided on the same machine. This made disabling one of the DSs impossible. Although unlikely, it is possible that the storage subsystem has failed in such a way that the storage for one DS on a machine is no longer functioning correctly, but the storage used by another DS on the same machine is still ok. For this case, it would be nice if a system can fail one of the DSs without failing them all. This patch changes the default behaviour to use separate TCP connections for each DS even if they reside on the same machine. I do not believe that this will be a problem for extant pNFS servers, but a sysctl can be set to restore the old behaviour if this change causes a problem for an extant pNFS server. This patch only affects the NFSv4.1 pNFS client. MFC after: 2 weeks	2018-07-12 20:46:22 +00:00
Rick Macklem	8361de2544	Ignore the cookie verifier for NFSv4.1 when the cookie is 0. RFC5661 states that the cookie verifier should be 0 when the cookie is 0. However, the wording is somewhat unclear and a recent discussion on the nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore the cookie verifier's value when the directory offset cookie is 0. This patch deletes the check for this that would return NFSERR_BAD_COOKIE when the verifier was not 0. This was found during testing of the ESXi client against the NFSv4.1 server. Reported by: daniel@ftml.net (via packet trace) MFC after: 2 weeks	2018-07-11 23:23:29 +00:00
Rick Macklem	de9a1a70ab	Add support for a "forced" pnfsdskill to the pNFS server kernel code. The pnfsdskill(8) command will normally fail if there is no valid mirror for the DS to be disabled. However, a system administrator may need to disable a DS which does not have a valid mirror so that the nfsd threads can be terminated. This patch adds the kernel code needed by pnfsdskill(8) to implement this "forced" case of disabling a DS. This patch only affects the pNFS server.	2018-07-09 19:58:01 +00:00
Rick Macklem	acc6e58def	Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied. If a mirrored DS is being recovered that has a lot of large sparse files, pnfsdscopymr(8) would use a lot of space on the recovered mirror since it would write the "holes" in the file being mirrored. This patch adds code to check for a "hole" and skip doing the write. The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K. I think that most file server file systems will be using a blocksize at least this large. If the file server is using a smaller blocksize and smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased. The block of 0s is malloc()d, since pnfsdscopymr(8) should be an infrequent occurrence.	2018-07-08 18:15:55 +00:00
Rick Macklem	ed66a76bca	Fix handling of the hybrid DS case for a pNFS server. After the addition of the "#mds_path" suffix for a DS specification on the "-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS file system and DSs that store files for all DSs. This is what I referred to as "hybrid" above. At first, I didn't think this hybrid case would be useful, but I now believe that some system administrators may find it useful. This patch modifies the file storage assignment algorithm so that it makes the "#mds_path" DSs take priority and the all file systems DSs are now only used for MDS file systems with no "#mds_path" DS servers. This only affects the pNFS server for this "hybrid" case.	2018-07-07 19:27:49 +00:00
Rick Macklem	5b500ea949	Change the pNFS server so that it does not disable a mirrored DS for an NFSERR_STALE error reported via a LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This change only affects the pNFS server and only when a client does a LayoutReturn with the NFSERR_STALE error report.	2018-07-06 19:18:45 +00:00
Rick Macklem	ff3b992f38	Fix the pNFS server so that it handles the "#mds_path" check for mirrors. The recently added feature of the pNFS server will set an fsid for the MDS file system to define the file system a DS should store files for. For a case where a DS handling all file systems has failed, it was possible for the code to check for a mirror with a specified fs, even though nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror. This patch adds a check for nfsdev_mdsisset != 0 to avoid this. It only affects the pNFS server for a rare case. Found via code inspection.	2018-07-04 19:46:26 +00:00
Rick Macklem	2f32675c83	Add an optional feature to the pNFS server. Without this patch, the pNFS server distributes the data storage files across all of the specified DSs. A tester noted that it would be nice if a system administrator could control which DSs are used to store the file data for a given exported MDS file system. This patch adds the kernel support to do this. It also makes a slight semantic change to nfsv4_findmirror(), since some uses of it no longer require that the DS being searched for have a current mirror. A patch that will be committed in a few minutes will modify the nfsd daemon to support this feature. The patch should only affect sites using the pNFS server (specified via the "-p" command line option for nfsd). Suggested by: james.rose@framestore.com	2018-07-02 19:21:33 +00:00
Rick Macklem	1aabf3fd5e	Fix the pNFS server for a case where mirror level equals number of DSs. If a pNFS service was set up where the number of DSs equals the mirror level and then a DS was disabled, the service would create files with duplicate entries for the same DS. This bug occurred because I didn't realize that TAILQ_FOREACH_FROM() would start at the beginning of the list when the initial value of the variable was NULL. This patch also changes the pNFS server DS file creation code so that it creates one or more entries with a 0.0.0.0 IP address when it cannot create mirror level files due to lack of DSs. The patch only affects the pNFS service and only when it was created with a number of DSs equal to the mirror level and mirroring is enabled.	2018-06-29 12:41:36 +00:00
Rick Macklem	9f4c522e6b	Set the slotid and ND_HASSLOTID flag for NFSv4.1 sequenced operations. Most NFSv4.1 compound RPCs start with a Sequence operation. For these cases, save the slotid and note that it is saved by setting ND_HASSLOTID. This is used by r335568 to free up the session slot and disable it. MFC after: 2 weeks	2018-06-23 00:48:45 +00:00
Rick Macklem	b18130d330	Define ND_HASSLOTID needed by r335568. r335568 uses a flag called ND_HASSLOTID to indicate that the slotid is set, so it can free and invalidate it. This flag needs to be set, which will be done in a subsequent commit. MFC after: 2 weeks	2018-06-23 00:37:15 +00:00
Rick Macklem	ba6cce3aea	Fix the handling of NFSv4.1 sessions for "soft" mounts. When a "soft" mount is used for NFSv4.1, an RPC that fails without completing will leave a slot in the NFSv4.1 session in an indeterminate state. As such, all that can be done is free up the slot while making it no longer usable. A "soft" NFSv4.1 mount is not recommended in general, since it will leave Open/Lock state in an indeterminate state. An exception is a pNFS mount of a DS, since there are no Opens/Locks done for them except file creates where loss of the Open state does not matter. The patch also makes connections to DSs soft, so that they will fail when a DS is non-functional or network partitioned, allowing the pNFS MDS to disable the DS for a mirrored configuration. This patch should not affect normal "hard" NFSv4.1 mounts. MFC after: 2 weeks	2018-06-22 21:37:20 +00:00
Rick Macklem	2e35b8fe24	Change the NFSv4.1 pNFS client so that it returns the DS error in layoutreturn. When the NFSv4.1 pNFS client gets an error for a DS I/O operation using a Flexible File layout, it returns the layout with an error. This patch changes the code slightly, so that it returns the layout for all errors except EACCES and lets the MDS decide what to do based on the error. It also makes a couple of changes to nfscl_layoutrecall() to ensure that the first layoutreturn(s) will have the error in the reply. Plus, the patch adds a wakeup() so that the "nfscl" thread won't wait 1sec before doing the LayoutReturn. Tested against the pNFS service. This patch should not affect non-pNFS use of the client. The unused "dsp" argument will be used by a future patch that disables the connection to the DS when possible. MFC after: 2 weeks	2018-06-22 21:25:27 +00:00
Rick Macklem	c16f407e31	Add a counter to limit the number of disabled DSs for a mirrored pNFS MDS. This patch adds a counter that limits the number of disabled mirrored DSs to mirror level - 1. It also makes a small change that keeps a Write that has failed with EACCES when attempted by a client to a DS from disabling the DS. This patch only affects the pNFS server.	2018-06-22 00:55:39 +00:00
Rick Macklem	755e4b7936	Revert r335263, since it can cause crashes in unusual circumstances. This needs to be fixed in a different way.	2018-06-17 23:08:54 +00:00
Rick Macklem	2bad64241c	Make the pNFS NFSv4.1 client return a Flexible File layout upon error. The Flexible File layout LayoutReturn operation has argument fields where an I/O error encountered when attempting I/O on a DS can be reported back to the MDS. This patch adds code to the client to do this for the Flexible File layout mirrored case. This patch should only affect mounts using the "pnfs" option against servers that support the Flexible File layout. MFC after: 2 weeks	2018-06-17 16:30:06 +00:00
Rick Macklem	46d30d3d9c	Fix NFSv4.1 client side handling of "soft,retrans=2" mounts. Normally "soft,retrans=2" cannot be safely used on NFSv4 mounts, since the RPC can fail and leave the open/lock state in an undefined state. Doing I/O on a pNFS DS is an exception to this, since no open/lock state is maintained on the DS server. It is useful to do "soft,retrans=2" connections to a DS when it is mirrored, so that the client can detect failure of the DS. As such, mounts from the MDS to the DSs should use these mount options when mirroring is enabled. However, the NFSv4.1 client still leaves the session in an undefined state when this happens. This patch fixes the problem by setting the session defunct, so it will no longer be used. The patch also sets "retries=2" on the connections done by the client to a DS, which is the internal equivalent of "soft,retrans=2". The client does not know if the server implements mirroring at connection time, but always doing this should be safe, since it will fall back on doing I/O via the MDS as a proxy when there is a failure doing an I/O RPC to the DS. This patch should not affect non-pNFS client mounts. MFC after: 2 weeks	2018-06-16 19:45:06 +00:00
Rick Macklem	c338c94d20	Move four functions in nfscl.ko to nfscommon.ko. Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom() and nfsmnt_mdssession() are now called from within the nfsd. As such, they needed to be moved from nfscl.ko to nfscommon.ko so that nfsd.ko would load when nfscl.ko wasn't loaded. Reported by: herbert@gojira.at	2018-06-14 10:00:19 +00:00
Bruce Evans	ab35e1c71b	Fix the encoding of major and minor numbers in 64-bit dev_t by restoring the old encodings for the lower 16 and 32 bits and only using the higher 32 bits for unusually large major and minor numbers. This change breaks compatibility with the previous encoding (which was only used in -current). Fix truncation to (essentially) 16-bit dev_t in newnfs v3. Any encoding of device numbers gives an ABI, so it can't be changed without translations for compatibility. Extra bits give the much larger complication that the translations need to compress into fewer bits. Fortunately, more than 32 bits are rarely needed, so compression is rarely needed except for 16-bit linux dev_t where it was always needed but never done. The previous encoding moved the major number into the top 32 bits. Almost no translation code handled this, so the major number was blindly truncated away in most 32-bit encodings. E.g., for ffs, mknod(8) with major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent this and blindly truncated it to 2. But if this mknod was run on any released version of FreeBSD, it gives dev_t = 0x102. ffs can represent this, but in the previous encoding it was not decoded, giving major = 0, minor = 0x102. The presence of bugs was most obvious for exporting dev_t's from an old system to -current, since bugs in newnfs augment them. I fixed oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then further in -current. E.g., old ad0 with major = 234, minor = 0x10002 had the correct (major, minor) number on the wire, but newnfs truncated this to (234, 2) and then the previous encoding shifted the major number into oblivion as seen by ffs or old applications. I first tried to fix this by translating on every ABI/API boundary, but there are too many boundaries and too many sloppy translations by blind truncation. 
So use the old encoding for the low 32 bits so that sloppy translations work no worse than before provided the high 32 bits are not set. Add some error checking for when bits are lost. Keep not doing any error checking for translations for almost everything in compat/linux. compat/freebsd32/freebsd32_misc.c: Optionally check for losing bits after possibly-truncating assignments as before. compat/linux/linux_stats.c: Depend on the representation being compatible with Linux's (or just with itself for local use) and spell some of the translations as assignments in a macro that hides the details. fs/nfsclient/nfs_clcomsubs.c: Essentially the same fix as in 1996, except there is now no possible truncation in makedev() itself. Also fix nearby style bugs. kern/vfs_syscalls.c: As for freebsd32. Also update the sysctl description to include file numbers, and change it to describe device ids as device numbers. sys/types.h: Use inline functions (wrapped by macros) since the expressions are now a bit too complicated for plain macros. Describe the encoding and some of the reasons for it. 16-bit compatibility didn't leave many reasonable choices for the 32-bit encoding, and 32-bit compatibility doesn't leave many reasonable choices for the 64-bit encoding. My choice is to put the 8 new minor bits in the low 8 bits of the top 32 bits. This minimizes discontiguities. Reviewed by: kib (except for rewrite of the comment in linux_stats.c)	2018-06-13 12:22:00 +00:00
Rick Macklem	90d2dfab19	Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write operations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. 
As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allows it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server are the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, called File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations. 
The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Write and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes	2018-06-12 19:36:32 +00:00
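The configuration rules stated in the commit above can be summarised in a short sketch. This is only an illustration of the rules as the message words them, not the server code: the function names are hypothetical, and clamping an over-large mirror level (rather than rejecting it) is an assumption made here.

```python
MAX_MIRROR_LEVEL = 4  # "a mirroring level of up to four"


def effective_mirror_level(requested: int, num_ds: int) -> int:
    """Limit the mirror level to four or the number of DSs, whichever is less.

    Clamping is an assumption; the message only states the upper bound.
    """
    if requested < 1 or num_ds < 1:
        raise ValueError("mirror level and DS count must be at least 1")
    return min(requested, MAX_MIRROR_LEVEL, num_ds)


def layout_type(mirror_level: int, default_flexfile: int) -> str:
    """Pick the layout type handed to a pNFS-capable NFSv4.1 client.

    default_flexfile stands for the value of sysctl vfs.nfsd.default_flexfile.
    """
    if mirror_level > 1:
        # The mirrored DS configuration always generates Flexible File layouts.
        return "flexfile"
    # Non-mirrored: File layout unless the sysctl is set non-zero.
    return "flexfile" if default_flexfile != 0 else "file"


def ds_file_name(mds_file_handle: bytes) -> str:
    """Name of the data storage file: the MDS file handle in hexadecimal."""
    return mds_file_handle.hex()
```

How the MDS chooses among the "ds0" to "dsN" subdirectories is not stated in the message, so the sketch leaves that part out.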


3824 Commits