freebsd-dev

Author	SHA1	Message	Date
Fedor Uporov	4ff6603ab3	Do not panic if inode bitmap is corrupted. admbug: 804 Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19325	2019-03-04 11:12:19 +00:00
Fedor Uporov	80a4a9716b	Validate block bitmaps. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19324	2019-03-04 11:01:23 +00:00
Fedor Uporov	daa2d62da2	Add additional on-disk inode checks. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19323	2019-03-04 10:55:01 +00:00
Fedor Uporov	6e38bf94e5	Make superblock reading logic more strict. Add more on-disk superblock consistency checks to ext2_compute_sb_data() function. It should decrease the probability of mounting filesystems with corrupted superblock data. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19322	2019-03-04 10:42:25 +00:00
Alan Somers	c02ccc7e44	Fix typos from r344664 Sponsored by: The FreeBSD Foundation	2019-03-01 15:49:11 +00:00
Alan Somers	cf16949867	fuse(4): convert debug printfs into dtrace probes fuse(4) was heavily instrumented with debug printf statements that could only be enabled with compile-time flags. They fell into three basic groups: 1) Totally redundant with dtrace FBT probes. These I deleted. 2) Print textual information, usually error messages. These I converted to SDT probes of the form fuse:fuse:FILE:trace. They work just like the old printf statements except they can be enabled at runtime with dtrace. They can be filtered by FILE and/or by priority. 3) More complicated probes that print detailed information. These I converted into ad-hoc SDT probes. Sponsored by: The FreeBSD Foundation	2019-02-28 19:27:54 +00:00
Conrad Meyer	f6ebb68395	fuse: Fix a regression introduced in r337165 On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to use a buf cache block size in excess of permitted size. This did not affect most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB. The issue was discovered and reported using a custom kernel with a DFLTPHYS of 512kB. PR: 230260 (comment #9) Reported by: ken@ MFC after: π/𝑒 weeks	2019-02-21 02:41:57 +00:00
Matt Macy	81167243b4	PFS: Bump NAMELEN and don't require clients to be sleepable - debugfs consumers expect to be able to export names more than 48 characters - debugfs consumers expect to be able to hold locks across calls and are able to handle allocation failures Reviewed by: hps@ MFC after: 1 week Sponsored by: iX Systems Differential Revision: https://reviews.freebsd.org/D19256	2019-02-20 20:55:02 +00:00
Conrad Meyer	02295caf43	Fuse: whitespace and style(9) cleanup Take a pass through fixing some of the most egregious whitespace issues in fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but somewhat less painful to look at and edit. No functional change.	2019-02-20 02:49:26 +00:00
Conrad Meyer	bd4cb2a46d	fuse: add descriptions for remaining sysctls (Except reclaim revoked; I don't know what the goal of that one is.)	2019-02-20 02:48:59 +00:00
Edward Tomasz Napierala	c9172fb4f1	Work around the "nfscl: bad open cnt on server" assertion that can happen when rerooting into NFSv4 rootfs with kernel built with INVARIANTS. I've talked to rmacklem@ (back in 2017), and while the root cause is still unknown, the case guarded by assertion (nfscl_doclose() being called from VOP_INACTIVE) is believed to be safe, and the whole thing seems to run just fine. Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-02-19 12:45:37 +00:00
Conrad Meyer	3c324b9465	FUSE: Refresh cached file size when it changes (lookup) The cached fvdat->filesize is independent of the (mostly unused) cached_attrs, and we failed to update it when a cached (but perhaps inactive) vnode was found during VOP_LOOKUP to have a different size than cached. As noted in the code comment, this can occur in distributed filesystems or with other kinds of irregular file behavior (anything is possible in FUSE). We do something similar in fuse_vnop_getattr already. PR: 230258 (as reported in description; other issues explored in comments are not all resolved) Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com> Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)	2019-02-15 22:55:13 +00:00
Conrad Meyer	c4af8b173a	FUSE: The FUSE design expects writethrough caching At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol specifies only clean data to be cached. Prior to this change, we implement and default to writeback caching. This is ok enough for local only filesystems without hardlinks, but violates the general design contract with FUSE and breaks distributed filesystems or concurrent access to hardlinks of the same inode. In this change, add cache mode as an extension of cache enable/disable. The new modes are UC (was: cache disabled), WT (default), and WB (was: cache enabled). For now, WT caching is implemented as write-around, which meets the goal of only caching clean data. WT can be better than WA for workloads that frequently read data that was recently written, but WA is trivial to implement. Note that this has no effect on O_WRONLY-opened files, which were already coerced to write-around. Refs: * https://sourceforge.net/p/fuse/mailman/message/8902254/ * https://github.com/vgough/encfs/issues/315 PR: 230258 (inspired by)	2019-02-15 22:52:49 +00:00
Conrad Meyer	194e691aaf	FUSE: Only "dirty" cached file size when data is dirty Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update the buf cache bounds as a result of either a read from the underlying FUSE filesystem, or as part of a write-through type operation (like truncate => VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which indicates that an inode's data is dirty (in particular, that the local buf cache and fvdat->filesize have dirty extended data). PR: 230258 (related)	2019-02-15 22:51:09 +00:00
Conrad Meyer	09176f096b	FUSE: Respect userspace FS "do-not-cache" of path components The FUSE protocol demands that kernel implementations cache user filesystem path components (lookup/cnp data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup if the user filesystem did not set a zero second TTL. PR: 230258 (inspired by; does not fix)	2019-02-15 22:50:31 +00:00
Conrad Meyer	78a7722fbc	FUSE: Respect userspace FS "do-not-cache" of file attributes The FUSE protocol demands that kernel implementations cache user filesystem file attributes (vattr data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. In the future, as an optimization, we should implement notify_inval_entry, etc, which provide userspace filesystems a way of evicting the kernel cache. One potentially bogus access to invalid cached attribute data was left in fuse_io_strategy. It is restricted behind the undocumented and non-default "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are deadcode and can be eliminated? Some minor APIs changed to facilitate this: 1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data"). 2. cache_attrs() respects the provided TTL and only caches in the FUSE inode if TTL > 0. It also grows an "out" argument, which, if non-NULL, stores the translated fuse_attr (even if not suitable for caching). 3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help avoid programming mistakes. 4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE link op was weakened (only performed when we have a valid attr cache). The check is racy in a multi-writer network filesystem anyway -- classic TOCTOU. We have to trust any userspace filesystem that rejects local caching to account for it correctly. PR: 230258 (inspired by; does not fix)	2019-02-15 22:49:15 +00:00
Konstantin Belousov	b9662886ef	In null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call. The vp vnode is unlocked during the execution of the VOP method and can be reclaimed, zeroing vp->v_data. Caching allows the correct mount point to be used. Reported and tested by: pho PR: 235549 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:20:18 +00:00
Konstantin Belousov	25728e8411	Before using VTONULL(), check that the covered vnode belongs to nullfs. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:17:31 +00:00
Konstantin Belousov	930cc2dbef	Some style for nullfs_mount(). Also use bool type for isvnunlocked. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:15:29 +00:00
Pedro F. Giffuni	771ec59bb7	ext2fs: Add some extra consistency checks for the superblock. Maliciously formed, or badly corrupted, filesystems can cause kernel panics. In general, such acts of foot-shooting can only be accomplished by root, but in a world with VM images that is moving towards automated mounts it is important to have some form of prevention. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE. Incidentally this should also fix a memory corruption issue reported by Dr Silvio Cesare of InfoSect. Huge thanks to all researchers for making us aware of the issue. admbug: 872, 891 Reviewed by: fsu Obtained from: NetBSD (with minor changes) MFC after: 3 days	2019-01-25 22:22:29 +00:00
Mark Johnston	d9463dd4f3	nfs: Zero the buffers exported by NFSSVC_DUMPCLIENTS and DUMPLOCKS. Note that these interfaces are available only to root. admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: rmacklem MFC after: 1 day Security: Kernel memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 23:54:33 +00:00
Oleksandr Tymoshenko	52b2c8e242	[smbfs] Allow semicolon in mounts that support long names Semicolon is a legal character in long names but not in 8.3 format. Move it to respective character set. PR: 140068 Submitted by: tom@uffner.com MFC after: 3 weeks	2019-01-20 05:52:16 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Kirk McKusick	c0029546f8	When loading an inode from disk, verify that its mode is valid. If invalid, return EINVAL. Note that inode check-hashes greatly reduce the chance that these errors will go undetected. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read) Reviewed by: kib MFC after: 1 week Sponsored by: Netflix M sys/fs/ext2fs/ext2_vnops.c M sys/kern/vfs_subr.c M sys/ufs/ffs/ffs_snapshot.c M sys/ufs/ufs/ufs_vnops.c	2018-12-27 07:18:53 +00:00
Bruce Evans	416e232cc6	Fix clobbering of the fatchain cache for clustered i/o's when full clustering is not done. The bug caused extreme slowness for large files in some cases. There is no way to tell VOP_BMAP() how many blocks are wanted, so for all file systems it has to waste time in some cases by searching for more contiguous blocks than will be accessed. For msdosfs, it also clobbered the fatchain cache in these cases by advancing the cache to point to the chain entry for block that won't be read. This makes the cache useless for the next sequential i/o (or VOP_BMAP()), so the fat chain is searched from the beginning. The cache only has 1 relevant entry, so it is similarly useless for random i/o. Fix this by only advancing the cache to point to the chain entry for the first block that will be read. Clustering uses results from VOP_BMAP(), so when more than 1 block is read by clustering, the cache is not advanced as optimally as before, but it is at most 1 cluster size behind and searching the chain through the blocks for this cluster doesn't take too long.	2018-12-21 21:17:45 +00:00
Bruce Evans	8ec22c4d65	Quick fix for initialization of mnt_iosize_max. (This limit controls mainly clustering and read-ahead.) Copy the initialization from ffs, and also copy a couple of lines of ffs's nearby style for initialization order and whitespace. A correct fix would de-duplicate the initialization and fix bitrot in it instead of adding another instance of the duplication. Complications to use the size preferred by the device have been reduced to hard-coding slightly pessimal and/or inconsistent defaults, using large code that was almost needed to support the complications. For msdosfs, the result was that mnt_iosize_max was DFLTPHYS (64K) but is now MAXPHYS (128K).	2018-12-21 20:12:43 +00:00
Rick Macklem	23114c6c2a	Fix the NFSv4 server to obey vfs.nfsd.nfs_privport. When the NFSv4 server was coded, I believed that the specification authors did not want NFSv4 servers to require a client to use a reserved port#. However, recently it has been noted that the Linux NFSv4 server does support a check for a reserved port#. Since both the FreeBSD and Linux NFSv4 clients use a reserved port# by default, enabling vfs.nfsd.nfs_privport to require a reserved port# for NFSv4 the same as it does for NFSv2, 3 seems reasonable. The only case where this could cause a POLA violation is a FreeBSD NFSv4 server with vfs.nfsd.nfs_privport set, but with NFSv4 clients doing mounts without using a reserved port# (< 1024). Tested by: chaz.newton58@gmail.com PR: 234106 MFC after: 1 week	2018-12-20 22:21:41 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mark Johnston	352aaa5122	Plug memory disclosures via ptrace(2). On some architectures, the structures returned by PT_GET*REGS were not fully populated and could contain uninitialized stack memory. The same issue existed with the register files in procfs. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Security: kernel stack memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18421	2018-12-03 20:54:17 +00:00
Mark Johnston	fee65dfc37	Ensure the dirent remains initialized when dirent.d_fileno is unset. Reported by: rmacklem MFC with: r340856 Sponsored by: The FreeBSD Foundation	2018-11-23 23:07:49 +00:00
Mark Johnston	6d2e2df764	Ensure that directory entry padding bytes are zeroed. Directory entries must be padded to maintain alignment; in many filesystems the padding was not initialized, resulting in stack memory being copied out to userspace. With the ino64 work there are also some explicit pad fields in struct dirent. Add a subroutine to clear these bytes and use it in the in-tree filesystems. The NFS client is omitted for now as it was fixed separately in r340787. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-23 22:24:59 +00:00
Rick Macklem	f86bce1770	Make sure the NFS readdir client fills in all "struct dirent" data. The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't filling in parts of the readdir reply, such as d_pad[01] and the bytes at the end of d_name within d_reclen. As such, data left in a buffer cache block could be leaked to userland in the readdir reply. This patch makes sure all of the data is filled in. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib, markj MFC after: 2 weeks	2018-11-23 00:17:47 +00:00
Mateusz Guzik	53011553fa	proc: convert pfind & friends to use pidhash locks and other cleanup pfind_locked is retired as it relied on allproc which unnecessarily restricts locking of the hash. Sponsored by: The FreeBSD Foundation	2018-11-21 20:15:56 +00:00
Mateusz Guzik	30e0cf499f	tmpfs: use unr64 for inode numbers Sponsored by: The FreeBSD Foundation	2018-11-20 15:14:30 +00:00
Rick Macklem	75772b69f2	Improve sanity checking for the dircount hint argument to NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code checked for a zero argument, but did not check for a very large value. This patch clips dircount at the server's maximum data size. MFC after: 1 week	2018-11-20 01:59:57 +00:00
Rick Macklem	778f29833b	nfsm_advance() would panic() when the offs argument was negative. The code assumed that this would indicate a corrupted mbuf chain, but it could simply be caused by bogus RPC message data. This patch replaces the panic() with a printf() plus error return. MFC after: 1 week	2018-11-20 01:56:34 +00:00
Rick Macklem	1d171e7971	r304026 added code that started statistics gathering for an operation before the operation number (the variable called "op") was sanity checked. This patch moves the code down to below the range sanity check for "op".	2018-11-20 01:52:45 +00:00
Mark Johnston	3d2a0fe762	Remove comments made obsolete by the ino64 work. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-19 17:33:44 +00:00
Konstantin Belousov	1c4ca77890	Add d_off support for multiple filesystems. The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917	2018-11-14 14:18:35 +00:00
Rick Macklem	6ad8a6eaa4	Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end. Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error); in many places. This patch replaces these code sequences with a "goto out;" and does the NFSVOPUNLOCK(); return (error); at the end of the function in order to make the vnode locking simpler. This patch does not change the semantics of nfs_advlock(). Suggested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D17853	2018-11-06 22:50:50 +00:00
Brooks Davis	318f0d7720	Use declared types for caddr_t arguments. Leave ptrace(2) alone for the moment as it's defined to take a caddr_t. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17852	2018-11-06 18:46:38 +00:00
Brooks Davis	1493c2ee62	Make vop_symlink take a const target path. This will enable callers to take const paths as part of syscall declaration improvements. Where doing so is easy and non-disruptive carry the const through implementations. In UFS the value is passed to an interface that must take non-const values. In ZFS, const poisoning would touch code shared with upstream and it's not worth adding diffs. Bump __FreeBSD_version for external API consumers. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17805	2018-11-02 14:42:36 +00:00
Rick Macklem	881a9516a2	Fix NFS client vnode locking to avoid a crash during forced dismount. A crash was reported where the crash occurred in nfs_advlock() when the NFS_ISV4(vp) macro was being executed. This was caused by the vnode being VI_DOOMED due to a forced dismount in progress. This patch fixes the problem by locking the vnode before executing the NFS_ISV4() macro. Tested by: rlibby PR: 232673 Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D17757	2018-11-01 15:27:22 +00:00
Brooks Davis	ed34a7fcf2	Move 32-bit compat support for FIODGNAME to the right place. ioctl(2) commands only have meaning in the context of a file descriptor so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Unlike r339174 this change supports both places FIODGNAME is handled. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17475	2018-10-26 17:59:25 +00:00
Konstantin Belousov	8ff7fad1d7	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not want to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658	2018-10-23 21:43:41 +00:00
Andriy Gapon	ca8f3d1ca2	nfsrvd_readdirplus: for some errors, do not fail the entire request Instead, a failing entry is skipped. This change consists of two logical changes. A failure to vget or lookup an entry is considered to be a result of a concurrent removal, which is the only reasonable explanation given that the filesystem is busied. So, the entry would be silently skipped. In the case of a failure to get attributes of an entry for an NFSv3 request, the entry would be silently skipped. There can be legitimate reasons for the failure, but NFSv3 does not provide any means to report the error, so we have two options: either fail the whole request or ignore the failed entry. Traditionally, the old NFS server used the latter option, so the code is reverted to it. Making the whole directory unreadable because of a single entry seems to be impractical. Additionally, some bits of code are slightly re-arranged to account for the new control flow and to honor style(9). Reviewed by: rmacklem Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D15424	2018-10-22 15:33:05 +00:00
Rick Macklem	910ccc7727	Fix the pNFS server's reporting of disk space usage for the "#<path>" case. The pNFS server would report the total disk space used and free for all of the DSs, even when certain DSs are assigned to the file system via the "#<path>" suffix used in the "nfsd -p" option argument. This patch fixes this case. It only reports usage for the file system that the argument vnode resides on. This is consistent with the non-pNFS NFSv4 server. In NFSv4 it is possible to have subtrees on other file systems, but these are not included in the usage information for NFSv4. Approved by: re (gjb)	2018-10-09 01:10:50 +00:00
Brooks Davis	9bc603bd20	Revert r339174: Move 32-bit compat support for FIODGNAME to the right place. A case was missed in this commit which breaks sshing into a 32-bit sshd on a 64-bit system. Approved by: re (gjb)	2018-10-04 23:55:03 +00:00
Brooks Davis	23f2e22802	Move 32-bit compat support for FIODGNAME to the right place. ioctl(2) commands only have meaning in the context of a file descriptor so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Reviewed by: kib Approved by: re (rgrimes, gjb) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Review: https://reviews.freebsd.org/D17388	2018-10-03 20:39:48 +00:00
Mark Murray	19fa89e938	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
Fedor Uporov	28f4f62303	FUSE extattrs: fix issue when neither uio nor size was passed to VOP_* (cosmetic only). Reviewed by: cem, pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13737	2018-08-21 18:50:29 +00:00
Fedor Uporov	493b4a8ccd	FUSE extattrs: fix issue when neither uio nor size was passed to VOP_*. The requested size was returned incorrectly in case uio == NULL from listextattr because the nameprefix/name conversion was not applied. Also, make a_size/uio returning logic more unified with other filesystems. Reviewed by: cem, pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13528	2018-08-21 18:39:47 +00:00
Fedor Uporov	4c1e1d2bcc	Change the behavior of the unused inode counters in the cylinder groups. Make it closer to the native ext4 implementation to avoid fsck errors.	2018-08-21 18:39:29 +00:00
Fedor Uporov	e49d64a7a7	Fix directory blocks checksum updating logic. Count dirent tail in the searchslot logic in case of directory block search. Add htree root csum update function call in case of rename.	2018-08-21 18:39:02 +00:00
Rick Macklem	fdab4d3b29	Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr(). When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr() done while the vnodes were locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes the LORs by moving the vn_start_write() calls up to before where the vnodes are locked. For "tvp", the vn_start_write() probably isn't necessary, because NFS mounts can't be suspended. However, I think doing so is harmless. Thanks go to kib@ for letting me know that I had introduced these LORs. This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8) is used to recover a mirrored DS.	2018-08-18 19:14:06 +00:00
Rick Macklem	3e5ba2e187	Fix LORs between vn_start_write() and vn_lock() in the pNFS server. When coding the pNFS server, I added several vn_start_write() calls done while the vnode was locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes this by removing the added vn_start_write() calls and modifying the code so that the extant vn_start_write() call before the NFS RPC/operation is done when needed by the pNFS server. Flags are changed so that LayoutCommit and LayoutReturn now get a vn_start_write() done for them. When the pNFS server is enabled, the code now also changes the flags for Getattr, so that the vn_start_write() is done for Getattr, since it may need to do a vn_set_extattr(). The nfs_writerpc flag array was made global to the NFS server and renamed nfsrv_writerpc, which is consistent naming for globals in the NFS server. Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is locked results in a LOR. This patch only affects the behaviour of the pNFS server.	2018-08-17 21:12:16 +00:00
Rick Macklem	9fbb0faf4f	Don't set a file's size for the MDS file of a pNFS service. When a pNFS service is running, the size of the files created on the MDS are normally 0, since the data is written to the data files on the DS(s). However, without this patch, if a Setattr with a non-zero size was done by a client, the MDS file was set to that size. This was thought to be benign, but it turns out that files with a non-zero size plus extended attributes can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this panic() has not been isolated, this patch avoids the panic() and leaves the MDS files in a consistent state of always having a size == 0. Note that these MDS files never store data. The patch also includes an unnecessary initialization of savsize in case some compiler or static analyser complains it might not be initialized. This patch only affects the NFS server when pNFS is enabled via the "-p" command line option on nfsd.	2018-08-17 12:32:38 +00:00
Jamie Gritton	284001a222	Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating jails since FreeBSD 7. Along with the system call, put the various security.jail.allow_foo and security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or BURN_BRIDGES). These sysctls had two disparate uses: on the system side, they were global permissions for jails created via jail(2) which lacked fine-grained permission controls; inside a jail, they're read-only descriptions of what the current jail is allowed to do. The first use is obsolete along with jail(2), but keep them for the second-read-only use. Differential Revision: D14791	2018-08-16 18:40:16 +00:00
Conrad Meyer	5cb27f0813	FUSE: Document global sysctl knobs So that I don't have to keep grepping around the codebase to remember what each one does. And maybe it saves someone else some time. Fix a trivial whitespace issue while here. No functional change. Sponsored by: Dell EMC Isilon	2018-08-15 17:41:19 +00:00
Toomas Soome	527d337fdb	cd9660 pointer sign issues and missing __packed attribute The isonum_* functions are defined to take unsigned char* as an argument, but the structure fields are defined as char. Change to u_char where needed. Probably the full structure should be changed, but I'm not sure about the side effects. While there, add __packed attribute. Differential Revision: https://reviews.freebsd.org/D16564	2018-08-15 06:42:31 +00:00
Rick Macklem	41df1b5b47	Assorted fixes to handling of LayoutRecall callbacks, mostly error handling. After a re-read of the appropriate section of RFC5661, I decided that a few things should be changed related to LayoutRecall callback handling. Here are the things fixed by this patch. - For two of the three cases that LayoutRecall is done, I now think setting the clora_changed argument false is correct. - All errors other than NFSERR_DELAY returned by LayoutRecall appear permanent, so don't retry for any of them. (NFSERR_DELAY is retried by newnfs_request(), so it is not affected by this patch.) - Instead of waiting "forever" (actually until the process is SIGTERM'd) for Layouts to be returned during a mirror copy, fail and return ENXIO after about 1minute. Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself, but did not make sense when done via find(1). This patch only affects the pNFS server.	2018-08-08 20:21:45 +00:00
Pedro F. Giffuni	c820acbf0a	msdosfs: fixes for Undefined Behavior. These were found by the Undefined Behaviour GSoC project at NetBSD: Do not change signedness bit with left shift. While there avoid signed integer overflow. Address both issues by using an unsigned type. msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' Detected with micro-UBSan in the user mode. Hinted from: NetBSD (CVS 1.33) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16615	2018-08-08 15:08:22 +00:00
Fedor Uporov	53288b712d	Split the dir_index and dir_nlink features. Do not allow to create more than EXT4_LINK_MAX links to directory in case if the dir_nlink is not set, like it is done in the fresh e2fsprogs updates. MFC after: 3 months	2018-08-08 12:08:46 +00:00
Fedor Uporov	17c7b27f55	Fix directory blocks checksum updating logic. The checksum updating functions were not called in case of dir index inode splitting and in case of dir entry removing, when the entry was first in the block. Fix and move the dir entry adding logic when i_count == 0 to new function. MFC after: 3 months	2018-08-08 12:07:45 +00:00
Conrad Meyer	3dc1c7d6bc	FUSE: Remove some set-but-not-used variables No functional change.	2018-08-08 04:46:03 +00:00
Rick Macklem	93df87f208	Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply. The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY, but exempts certain NFSv4 operations. However, for callback RPCs, there should not be any exemptions at this time. The code would have erroneously exempted the CBRECALL callback, since it has the same operation number as the CLOSE operation. This patch fixes this by checking for a callback RPC (indicated by clp != NULL) and not checking for exempt operations for callbacks. This would have only affected the NFSv4 server when delegations are enabled (they are not enabled by default) and the client replies to CBRECALL with NFSERR_DELAY. This may never actually happen. Spotted during code inspection. MFC after: 2 weeks	2018-08-07 21:29:14 +00:00
Rick Macklem	25705dd5d0	Copy all bits of a file handle in case there is padding in the structure. At least on x86, fhandle_t is a packed structure, so I believe an assignment will copy all the bits. However, for some current/future architectures, there might be padding in the structure that doesn't get copied via an assignment. Since NFS assumes a file handle is an opaque blob of bits that can be compared via memcmp()/bcmp(), all the bits including any padding must be copied. This patch replaces the assignments with a call to a byte copy function. Spotted during code inspection.	2018-08-05 19:21:50 +00:00
Rick Macklem	ac0d649588	Silence newer gcc warnings. Newer versions of gcc generate "might not be initialized" warnings for several variables in nfsrpc_doiods(). I have checked and all of these variables are assigned values before they are used. In the one case of "tdrpc", it could have passed garbage as an argument to nfscl_dofflayoutio() when mirrorcnt is one. However nfscl_dofflayoutio() only uses the argument when mirrorcnt > 1, so it wasn't actually broken. This patch initializes "tdrpc" to avoid confusion and initializes the rest to make the compiler happy. Requested by: mmacy	2018-08-02 20:10:59 +00:00
Conrad Meyer	dab6195cd3	FUSE: Bump maximum IO size to enable more performant operation Various components restrict size of IO passed up to the userspace filesystem based on the mount's f_iosize value. The previous default of PAGE_SIZE is anemic, even for normal filesystems, but especially considering every FUSE operation involves a kernel <-> userspace IPC upcall. Bump to DFLTPHYS (currently 64kB) to match other FUSE implementations. Anecdotally, Jakub reports IO read performance increased from 600 MB/s -> 2700 MB/s with a basic RAM-backed FUSE filesystem. PR: 230260 Reported by: Peter (MooseFS) <freebsd AT moosefs.com> Tested by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> MFC after: 3 days	2018-08-02 19:25:43 +00:00
Ed Maste	195e6c50d3	msdosfs: trim EOL whitespace	2018-07-31 12:44:28 +00:00
Ed Maste	a6274b81d5	cd9660: replace bcopy/bzero with C standard equivalents To reduce diffs against NetBSD.	2018-07-31 12:36:46 +00:00
Ed Maste	22e56aea3f	msdosfs: use same max filesize #define as NetBSD and move to header For use by makefs msdosfs support. Obtained from: NetBSD denode.h 1.6 Sponsored by: The FreeBSD Foundation	2018-07-30 20:36:51 +00:00
Rick Macklem	743d528198	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings. Add __unused macros to silence these warnings. Although the variables are not being used, they are values parsed from arguments to callback RPCs that might be needed in the future. Requested by: mmacy	2018-07-30 20:25:32 +00:00
Rick Macklem	8014c97147	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings in the NFS server. Add __unused macros to silence these warnings. Requested by: mmacy	2018-07-29 21:51:17 +00:00
Rick Macklem	a3e709cd33	Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7. I believe that a ReclaimComplete with rca_one_fs == TRUE is only to be used after a file system has been transferred to a different file server. However, RFC5661 is somewhat vague w.r.t. this and the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE and one with rca_one_fs == FALSE. Therefore, just ignore the rca_one_fs == TRUE operation and return NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP. This allows the ESXi 6.7 NFSv4.1 client to do a mount. After discussion on the NFSv4 IETF working group mailing list, doing this along with setting a flag to note that a ReclaimComplete with rca_one_fs == TRUE was done was an appropriate way to handle this. The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was done may be used to disable replies of NFS4ERR_GRACE for non-reclaim state operations in a future commit. This patch along with r332790, r334492 and r336357 allows ESXi 6.7 NFSv4.1 mounts to work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are violations of RFC-5661 and should not be used. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com, daniel@ftml.net (earlier version) MFC after: 2 weeks Relnotes: yes	2018-07-28 20:21:04 +00:00
Eitan Adler	33f4bccaa6	Use https over http for FreeBSD pages	2018-07-27 10:40:48 +00:00
Ed Maste	6ae00e306f	Revert msdosfs MAKEFS #ifdef changes from r319870 These changes are not needed for current msdosfs makefs WIP. Submitted by: Siva Mahadevan Sponsored by: The FreeBSD Foundation	2018-07-24 21:10:17 +00:00
Rick Macklem	cecf6c6e9c	Set CLSET_TIMEOUT on TCP connections to pNFS DSs. Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of specifying a timeout on each RPC. This is done so that SO_SNDTIMEO is set on the TCP socket as well as specifying a time limit when waiting for an RPC reply. Useful if the send queue for the TCP connection has become constipated, due to a failed DS. The choice of lease_duration / 4 is fairly arbitrary, but seems to work ok, with a lower bound of 10sec. For client connections to a DS, set the retry limit to vfs.nfsd.dsretries, which is 2 by default. This patch should only affect pNFS connections to DSs. This patch requires r336542. MFC after: 2 weeks	2018-07-21 01:33:07 +00:00
Alan Somers	5717aa2d2a	Allow mounting FUSE filesystems in jails Reviewed by: jamie MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16371	2018-07-20 21:35:31 +00:00
Rick Macklem	5d54f186bb	Modify the reasons for not issuing a delegation in the NFSv4.1 server. The ESXi NFSv4.1 client will generate warning messages when the reason for not issuing a delegation is two. Two refers to a resource limit and I do not see why it would be considered invalid. However it probably was not the best choice of reason for not issuing a delegation. This patch changes the reasons used to ones that the ESXi client doesn't complain about. This change does not affect the FreeBSD client and does not appear to affect behaviour of the Linux NFSv4.1 client. RFC5661 defines these "reasons" but does not give any guidance w.r.t. which ones are more appropriate to return to a client. Tested by: andreas.nagy@frequentis.com PR: 226650 MFC after: 2 weeks	2018-07-16 21:32:50 +00:00
Rick Macklem	5da3882447	Shut down the TCP connection to a DS in the pNFS client when Renew fails. When a NFSv4.1 client mount using pNFS detects a failure trying to do a Renew (actually just a Sequence operation), the code would simply try again and again and again every 30sec. This would tie up the "nfscl" thread, which should also be doing other things like Renews on other DSs and the MDS. This patch adds code which closes down the TCP connection and marks it defunct when Renew detects a failure to communicate with the DS, so further Renews will not be attempted until a new working TCP connection to the DS is established. It also makes the call to nfscl_cancelreqs() unconditional, since nfscl_cancelreqs() checks the NFSCLDS_SAMECONN flag and does so while holding the lock. This fix only applies to the NFSv4.1 client when using pNFS and without it the only effect would have been an "nfscl" thread busy doing Renew attempts on an unresponsive DS. MFC after: 2 weeks	2018-07-15 18:54:44 +00:00
Rick Macklem	89c64a3a4f	Fix the pNFS client when mirrors aren't on the same machine. Without this patch, the client side NFSv4.1 pNFS code erroneously did writes and commits to both DS mirrors using the TCP connection of the first one. For my test setup this worked, since I have both DSs running on the same machine, but it would have failed when the DSs are on separate machines. This patch fixes the code to use the correct TCP connection for each DS. This patch should only affect the NFSv4.1 client when using "pnfs" mounts to mirrored DSs. MFC after: 2 weeks	2018-07-14 19:51:44 +00:00
Rick Macklem	0e7bd20bb2	Close down the TCP connection to a pNFS DS when it is disabled. So long as the TCP connection to a pNFS DS isn't shared with other DSs, it can be closed down when the DS is being disabled in the pNFS client. This causes any RPCs in progress to fail. This patch only affects the NFSv4.1 pNFS client when errors occur while doing I/O on a DS. MFC after: 2 weeks	2018-07-13 20:03:05 +00:00
Rick Macklem	83f526de6a	Change the pNFS client so that it does not report an NFSERR_STALE from an I/O attempt on a DS to the server via LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This patch only affects behaviour of the pNFS client and only when using Flexible File layouts. MFC after: 2 weeks	2018-07-13 12:39:27 +00:00
Rick Macklem	a6fed5f514	Modify the NFSv4.1 pNFS client to use separate TCP connections for DSs. Without this patch, the NFSv4.1 pNFS client shared a single TCP connection for all DSs that resided on the same machine. This made disabling one of the DSs impossible. Although unlikely, it is possible that the storage subsystem has failed in such a way that the storage for one DS on a machine is no longer functioning correctly, but the storage used by another DS on the same machine is still ok. For this case, it would be nice if a system can fail one of the DSs without failing them all. This patch changes the default behaviour to use separate TCP connections for each DS even if they reside on the same machine. I do not believe that this will be a problem for extant pNFS servers, but a sysctl can be set to restore the old behaviour if this change causes a problem for an extant pNFS server. This patch only affects the NFSv4.1 pNFS client. MFC after: 2 weeks	2018-07-12 20:46:22 +00:00
Rick Macklem	8361de2544	Ignore the cookie verifier for NFSv4.1 when the cookie is 0. RFC5661 states that the cookie verifier should be 0 when the cookie is 0. However, the wording is somewhat unclear and a recent discussion on the nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore the cookie verifier's value when the directory offset cookie is 0. This patch deletes the check for this that would return NFSERR_BAD_COOKIE when the verifier was not 0. This was found during testing of the ESXi client against the NFSv4.1 server. Reported by: daniel@ftml.net (via packet trace) MFC after: 2 weeks	2018-07-11 23:23:29 +00:00
Rick Macklem	de9a1a70ab	Add support for a "forced" pnfsdskill to the pNFS server kernel code. The pnfsdskill(8) command will normally fail if there is no valid mirror for the DS to be disabled. However, a system administrator may need to disable a DS which does not have a valid mirror so that the nfsd threads can be terminated. This patch adds the kernel code needed by pnfsdskill(8) to implement this "forced" case of disabling a DS. This patch only affects the pNFS server.	2018-07-09 19:58:01 +00:00
Rick Macklem	acc6e58def	Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied. If a mirrored DS is being recovered that has a lot of large sparse files, pnfsdscopymr(8) would use a lot of space on the recovered mirror since it would write the "holes" in the file being mirrored. This patch adds code to check for a "hole" and skip doing the write. The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K. I think that most file server file systems will be using a blocksize at least this large. If the file server is using a smaller blocksize and smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased. The block of 0s is malloc()d, since pnfsdscopymr(8) should be an infrequent occurrence.	2018-07-08 18:15:55 +00:00
Rick Macklem	ed66a76bca	Fix handling of the hybrid DS case for a pNFS server. After the addition of the "#mds_path" suffix for a DS specification on the "-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS file system and DSs that store files for all DSs. This is what I referred to as "hybrid" above. At first, I didn't think this hybrid case would be useful, but I now believe that some system administrators may find it useful. This patch modifies the file storage assignment algorithm so that it makes the "#mds_path" DSs take priority and the all file systems DSs are now only used for MDS file systems with no "#mds_path" DS servers. This only affects the pNFS server for this "hybrid" case.	2018-07-07 19:27:49 +00:00
Rick Macklem	5b500ea949	Change the pNFS server so that it does not disable a mirrored DS for an NFSERR_STALE error reported via a LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This change only affects the pNFS server and only when a client does a LayoutReturn with the NFSERR_STALE error report.	2018-07-06 19:18:45 +00:00
Rick Macklem	ff3b992f38	Fix the pNFS server so that it handles the "#mds_path" check for mirrors. The recently added feature of the pNFS server will set an fsid for the MDS file system to define the file system a DS should store files for. For a case where a DS handling all file systems has failed, it was possible for the code to check for a mirror with a specified fs, even though nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror. This patch adds a check for nfsdev_mdsisset != 0 to avoid this. It only affects the pNFS server for a rare case. Found via code inspection.	2018-07-04 19:46:26 +00:00
Rick Macklem	2f32675c83	Add an optional feature to the pNFS server. Without this patch, the pNFS server distributes the data storage files across all of the specified DSs. A tester noted that it would be nice if a system administrator could control which DSs are used to store the file data for a given exported MDS file system. This patch adds the kernel support to do this. It also makes a slight semantic change to nfsv4_findmirror(), since some uses of it no longer require that the DS being searched for have a current mirror. A patch that will be committed in a few minutes will modify the nfsd daemon to support this feature. The patch should only affect sites using the pNFS server (specified via the "-p" command line option for nfsd). Suggested by: james.rose@framestore.com	2018-07-02 19:21:33 +00:00
Rick Macklem	1aabf3fd5e	Fix the pNFS server for a case where mirror level equals number of DSs. If a pNFS service was set up where the number of DSs equals the mirror level and then a DS was disabled, the service would create files with duplicate entries for the same DS. This bug occurred because I didn't realize that TAILQ_FOREACH_FROM() would start at the beginning of the list when the initial value of the variable was NULL. This patch also changes the pNFS server DS file creation code so that it creates entrie(s) with 0.0.0.0 IP address when it cannot create mirror level files due to lack of DSs. The patch only affects the pNFS service and only when it was created with a number of DSs equal to the mirror level and mirroring is enabled.	2018-06-29 12:41:36 +00:00
Rick Macklem	9f4c522e6b	Set the slotid and ND_HASSLOTID flag for NFSv4.1 sequenced operations. Most NFSv4.1 compound RPCs start with a Sequence operation. For these cases, save the slotid and note that it is saved by setting ND_HASSLOTID. This is used by r335568 to free up the session slot and disable it. MFC after: 2 weeks	2018-06-23 00:48:45 +00:00
Rick Macklem	b18130d330	Define ND_HASSLOTID needed by r335568. r335568 uses a flag called ND_HASSLOTID to indicate that the slotid is set, so it can free and invalidate it. This flag needs to be set, which will be done in a subsequent commit. MFC after: 2 weeks	2018-06-23 00:37:15 +00:00
Rick Macklem	ba6cce3aea	Fix the handling of NFSv4.1 sessions for "soft" mounts. When a "soft" mount is used for NFSv4.1, an RPC that fails without completing will leave a slot in the NFSv4.1 session in an indeterminate state. As such, all that can be done is free up the slot while making it no longer usable. A "soft" NFSv4.1 mount is not recommended in general, since it will leave Open/Lock state in an indeterminate state. An exception is a pNFS mount of a DS, since there are no Opens/Locks done for them except file creates where loss of the Open state does not matter. The patch also makes connections to DSs soft, so that they will fail when a DS is non-functional or network partitioned, allowing the pNFS MDS to disable the DS for a mirrored configuration. This patch should not affect normal "hard" NFSv4.1 mounts. MFC after: 2 weeks	2018-06-22 21:37:20 +00:00
Rick Macklem	2e35b8fe24	Change the NFSv4.1 pNFS client so that it returns the DS error in layoutreturn. When the NFSv4.1 pNFS client gets an error for a DS I/O operation using a Flexible File layout, it returns the layout with an error. This patch changes the code slightly, so that it returns the layout for all errors except EACCES and lets the MDS decide what to do based on the error. It also makes a couple of changes to nfscl_layoutrecall() to ensure that the first layoutreturn(s) will have the error in the reply. Plus, the patch adds a wakeup() so that the "nfscl" thread won't wait 1sec before doing the LayoutReturn. Tested against the pNFS service. This patch should not affect non-pNFS use of the client. The unused "dsp" argument will be used by a future patch that disables the connection to the DS when possible. MFC after: 2 weeks	2018-06-22 21:25:27 +00:00
Rick Macklem	c16f407e31	Add a counter to limit the number of disabled DSs for a mirrored pNFS MDS. This patch adds a counter that limits the number of disabled mirrored DSs to mirror level - 1. It also makes a small change that keeps a Write that has failed with EACCES when attempted by a client to a DS from disabling the DS. This patch only affects the pNFS server.	2018-06-22 00:55:39 +00:00
Rick Macklem	755e4b7936	Revert r335263, since it can cause crashes in unusual circumstances. This needs to be fixed in a different way.	2018-06-17 23:08:54 +00:00
Rick Macklem	2bad64241c	Make the pNFS NFSv4.1 client return a Flexible File layout upon error. The Flexible File layout LayoutReturn operation has argument fields where an I/O error encountered when attempting I/O on a DS can be reported back to the MDS. This patch adds code to the client to do this for the Flexible File layout mirrored case. This patch should only affect mounts using the "pnfs" option against servers that support the Flexible File layout. MFC after: 2 weeks	2018-06-17 16:30:06 +00:00
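
The msdosfs entry above (c820acbf0a) describes replacing `1 << 31` on a signed int with an unsigned type. A minimal sketch of that idea follows; it is not the code from msdosfs_fat.c, and the helper names `bit32` and `mask_below` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from msdosfs_fat.c): return a word with only
 * bit n set.  The operand is unsigned, so n == 31 is well defined,
 * whereas (int)1 << 31 cannot be represented in a 32-bit int.
 */
static inline uint32_t
bit32(unsigned n)
{
	assert(n < 32);		/* a shift count >= the width is undefined */
	return ((uint32_t)1 << n);
}

/*
 * Hypothetical helper: mask of all bits below bit n.  The subtraction is
 * done in unsigned arithmetic, so there is no signed overflow of the kind
 * "-2147483648 - 1" reported in the commit message.
 */
static inline uint32_t
mask_below(unsigned n)
{
	return (bit32(n) - 1);
}
```

With these definitions, `bit32(31)` is 0x80000000 and `mask_below(31)` is 0x7fffffff.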


3928 Commits
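
The file handle entry above (25705dd5d0) replaces structure assignment with a byte copy so that any padding is copied too. The sketch below illustrates the reasoning under stated assumptions: `struct demo_handle` is a made-up type that usually contains padding, standing in for fhandle_t, and `handle_copy` and `handle_equal` are hypothetical names rather than the functions used in the NFS code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for an opaque handle.  On most ABIs there are
 * padding bytes between 'tag' and 'id'.
 */
struct demo_handle {
	char	tag;
	int	id;
};

/*
 * Copy every byte of the object representation, padding included.
 * A plain "*dst = *src" is not required to preserve padding bytes.
 */
static void
handle_copy(struct demo_handle *dst, const struct demo_handle *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, sizeof(*dst));
}

/* Compare two handles as opaque blobs of bits, the way NFS treats them. */
static int
handle_equal(const struct demo_handle *a, const struct demo_handle *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

After `handle_copy(&b, &a)`, `handle_equal(&a, &b)` holds regardless of what the padding bytes in `a` contain, which is the property a memcmp()/bcmp() based comparison relies on.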