freebsd-nq

Author	SHA1	Message	Date
Alan Somers	4a6d5507f7	fusefs: fix an inverted error check in my last commit This should be merged alongside 345766 Sponsored by: The FreeBSD Foundation	2019-04-01 16:15:29 +00:00
Alan Somers	5ec10aa527	fusefs: replace obsolete array idioms r345742 replaced fusefs's fufh array with a fufh list. But it left a few array idioms in place. This commit replaces those idioms with more efficient list idioms. One location is in fuse_filehandle_close, which now takes a pointer argument. Three other locations are places that had to loop over all of a vnode's fuse filehandles. Sponsored by: The FreeBSD Foundation	2019-04-01 14:23:43 +00:00
Alan Somers	1cedd6dfac	fusefs: replace the fufh table with a linked list The FUSE protocol allows each open file descriptor to have a unique file handle. On FreeBSD, these file handles must all be stored in the vnode. The old method (also used by OSX and OpenBSD) is to store them all in a small array. But that limits the total number that can be stored. This commit replaces the array with a linked list (a technique also used by Illumos). There is not yet any change in functionality, but this is the first step to fixing several bugs. PR: 236329, 236340, 236381, 236560, 236844 Discussed with: cem Sponsored by: The FreeBSD Foundation	2019-03-31 03:19:10 +00:00
Alan Somers	5fccbf313a	fusefs: don't force direct io for files opened O_WRONLY Previously fusefs would treat any file opened O_WRONLY as though the FOPEN_DIRECT_IO flag were set, in an attempt to avoid issuing reads as part of a RMW write operation on a cached part of the file. However, the FUSE protocol explicitly allows reads of write-only files for precisely that reason. Sponsored by: The FreeBSD Foundation	2019-03-30 00:57:07 +00:00
Alan Somers	f220ef0b35	fix the GENERIC-NODEBUG build after r345675 Submitted by: cy Reported by: cy, Michael Butler <imb@protected-networks.net> MFC after: 2 weeks X-MFC-With: 345675	2019-03-29 14:07:30 +00:00
Alan Somers	415e34c4d5	MFHead@r345677	2019-03-29 03:25:20 +00:00
Alan Somers	080518d810	fusefs: convert debug printfs into dtrace probes fuse(4) was heavily instrumented with debug printf statements that could only be enabled with compile-time flags. They fell into three basic groups: 1. Totally redundant with dtrace FBT probes. These I deleted. 2. Print textual information, usually error messages. These I converted to SDT probes of the form fuse:fuse:FILE:trace. They work just like the old printf statements except they can be enabled at runtime with dtrace. They can be filtered by FILE and/or by priority. 3. More complicated probes that print detailed information. These I converted into ad-hoc SDT probes. Also, de-inline fuse_internal_cache_attrs. It's big enough to be a regular function, and this way it gets a dtrace FBT probe. This commit is a merge of r345304, r344914, r344703, and r344664 from projects/fuse2. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19667	2019-03-29 02:13:06 +00:00
Alan Somers	98852a32af	fusefs: fix error handling in fuse_vnop_strategy Reported by: cem Sponsored by: The FreeBSD Foundation	2019-03-28 21:57:42 +00:00
Alan Somers	f203d1734d	fusefs: don't ignore errors in fuse_vnode_refreshsize Reported by: Coverity Coverity CID: 1368622 Sponsored by: The FreeBSD Foundation	2019-03-27 16:45:30 +00:00
Alan Somers	019dca0199	fusefs: delete dead code in fuse_vnop_setattr The dead code in question was a broken and incomplete attempt to support the default_permissions mount option during VOP_SETATTR. There wasn't anything there worth saving; I'll have to rewrite it later. Reported by: Coverity Coverity CID: 1008668 Sponsored by: The FreeBSD Foundation	2019-03-27 16:19:02 +00:00
Alan Somers	3885d4091d	fusefs: fix a dereference-after-null-check Reported by: Coverity Coverity CID: 1017940 Sponsored by: The FreeBSD Foundation	2019-03-27 14:15:35 +00:00
Alan Somers	e0bec057db	fusefs: correctly set fuse_release_in.flags in an error path fuse_vnop_create must close the newly created file if it can't allocate a vnode. When it does so, it must use the same file flags for FUSE_RELEASE as it used for FUSE_OPEN or FUSE_CREATE. Reported by: Coverity Coverity CID: 1066204 Sponsored by: The FreeBSD Foundation	2019-03-27 02:57:59 +00:00
Alan Somers	4a4282cb06	FUSEFS: during FUSE_READDIR, set the read size correctly. The old formula was unnecessarily restrictive. Sponsored by: The FreeBSD Foundation	2019-03-27 02:01:34 +00:00
Alan Somers	3ba6a4d473	fusefs: set fuse_init_in->max_readahead correctly The old value was correct only by coincidence. Sponsored by: The FreeBSD Foundation	2019-03-27 01:49:35 +00:00
Alan Somers	fd2749f25d	fusefs: delete dead code This change also inlines several previously #define'd symbols that didn't really have the meanings indicated by the comments. Sponsored by: The FreeBSD Foundation	2019-03-26 03:02:45 +00:00
Maxim Sobolev	4f20706113	Refine r345425: get rid of superfluous helper macro that I have added. MFC after: 2 weeks	2019-03-26 01:28:10 +00:00
Allan Jude	b4b3e3498b	Make TMPFS_PAGES_MINRESERVED a kernel option TMPFS_PAGES_MINRESERVED controls how much memory is reserved for the system and not used by tmpfs. On very small memory systems, the default value may be too high and this prevents these small memory systems from using reroot, which is required for them to install firmware updates. Submitted by: Hiroki Mori <yamori813@yahoo.co.jp> Reviewed by: mizhka Differential Revision: https://reviews.freebsd.org/D13583	2019-03-25 07:46:20 +00:00
Alan Somers	19ef317d62	fusefs: fall back to MKNOD/OPEN if a filesystem doesn't support CREATE If a FUSE filesystem returns ENOSYS for FUSE_CREATE, then fall back to FUSE_MKNOD/FUSE_OPEN. Also, fix a memory leak in the error path of fuse_vnop_create. And do a little cleanup in fuse_vnop_open. PR: 199934 Reported by: samm@os2.kiev.ua Sponsored by: The FreeBSD Foundation	2019-03-23 00:22:29 +00:00
Maxim Sobolev	ac1a10efad	Make it possible to update TMPFS mount point from read-only to read-write and vice versa. Reviewed by: delphij Approved by: delphij MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19682	2019-03-22 21:31:21 +00:00
Alan Somers	bf4d70841f	fusefs: support VOP_MKNOD PR: 236236 Sponsored by: The FreeBSD Foundation	2019-03-22 19:08:48 +00:00
Alan Somers	8ba190efeb	fusefs: fix a panic on mount Don't page fault if the file descriptor provided with "-o fd" is invalid. Sponsored by: The FreeBSD Foundation	2019-03-22 17:53:13 +00:00
Alan Somers	6248288e97	fusefs: correctly handle cacheable negative LOOKUP responses The FUSE protocol allows for LOOKUP to return a cacheable negative response, which means that the file doesn't exist and the kernel can cache its nonexistence. As of this commit fusefs doesn't cache the nonexistence, but it does correctly handle such responses. Prior to this commit, attempting to create a file, even with O_CREAT, would fail with ENOENT if the daemon returned a cacheable negative response. PR: 236231 Sponsored by: The FreeBSD Foundation	2019-03-21 23:31:10 +00:00
Alan Somers	915012e0d0	fusefs: Don't treat fsync the same as fdatasync For an unknown reason, fusefs was _always_ sending the fdatasync operation instead of fsync. Now it correctly sends one or the other. Also, remove the Fsync.fsync_metadata_only test, along with the recently removed Fsync.nop. They should never have been added. The kernel shouldn't keep track of which files have dirty data; that's the daemon's job. PR: 236473 Sponsored by: The FreeBSD Foundation	2019-03-21 23:01:56 +00:00
Alan Somers	90612f3c38	fusefs: VOP_FSYNC should be synchronous -- sometimes I committed too hastily in r345390. There are cases, not directly reachable from userland, where VOP_FSYNC ought to be asynchronous. This commit fixes fusefs to handle VOP_FSYNC synchronously if and only if the VFS requests it. PR: 236474 X-MFC-With: 345390 Sponsored by: The FreeBSD Foundation	2019-03-21 22:17:10 +00:00
Alan Somers	cc34f2f66a	fusefs: VOP_FSYNC should be synchronous returning asynchronously pretty much defeats the point of fsync PR: 236474 Sponsored by: The FreeBSD Foundation	2019-03-21 21:53:55 +00:00
Konstantin Belousov	7ae3486e6d	nullfs: fix unmounts when filesystem is active. If vflush() did not completely flush the mount vnodes queue, either retry for forced unmounts, or give up for non-forced. This situation can occur when new vnodes are instantiated while vflush() is working. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-03-21 13:30:48 +00:00
Alan Somers	f9856d0813	MFHead @345353	2019-03-20 23:32:37 +00:00
Alan Somers	123af6ec70	Rename fuse(4) to fusefs(4) This makes it more consistent with other filesystems, which all end in "fs", and more consistent with its mount helper, which is already named "mount_fusefs". Reviewed by: cem, rgrimes MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19649	2019-03-20 21:48:43 +00:00
Alan Somers	7e4844f7d9	fuse(4): remove more debugging printfs I missed these in r344664. They're basically useless because they can only be controlled at compile-time. Also, de-inline fuse_internal_cache_attrs. It's big enough to be a regular function, and this way it gets a dtrace FBT probe. Sponsored by: The FreeBSD Foundation	2019-03-19 17:49:15 +00:00
Alan Somers	2aaf9152a8	MFHead@r345275	2019-03-18 19:21:53 +00:00
Fedor Uporov	0204d1c793	Remove unneeded mount point unlock function calls. The ext2_nodealloccg() function unlocks the mount point in case of successful node allocation. The additional unlocks are not required and should be removed. PR: 236452 Reported by: pho MFC after: 3 days	2019-03-15 11:49:46 +00:00
Edward Tomasz Napierala	2df8bd90c8	Drop unused 'p' argument to nfsv4_strtogid(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-12 15:07:47 +00:00
Edward Tomasz Napierala	c703cba811	Drop unused 'p' argument to nfsv4_gidtostr(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-12 15:05:11 +00:00
Edward Tomasz Napierala	0658ac3943	Drop unused 'p' argument to nfsv4_strtouid(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-12 15:02:52 +00:00
Edward Tomasz Napierala	0f86b94a56	Drop unused 'p' argument to nfsv4_uidtostr(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-12 14:59:08 +00:00
Edward Tomasz Napierala	f32bf2922f	Drop unused 'p' argument to nfsrv_getuser(). Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19455	2019-03-12 14:53:53 +00:00
Simon J. Gerraty	f5fdf82d82	Add _PC_ACL_* to vop_stdpathconf This avoids EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512	2019-03-11 20:40:56 +00:00
Alan Somers	84c4fd1f48	fuse(4): add dtrace probe for illegal short writes Sponsored by: The FreeBSD Foundation	2019-03-08 02:00:49 +00:00
Conrad Meyer	9a6a45d850	fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf On GENERIC kernels with empty loader.conf, there is no functional change. DFLTPHYS and MAXBSIZE are both 64kB at the moment. This change allows larger bufcache block sizes to be used when either MAXBSIZE (custom kernel) or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher than the default. Suggested by: ken@	2019-03-07 00:55:49 +00:00
Conrad Meyer	e7df98863b	FUSE: Prevent trivial panic When open(2) was invoked against a FUSE filesystem with an unexpected flags value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic. For now, prevent the panic by rejecting such VOP_OPENs with EINVAL. This is not considered the correct long term fix, but does prevent an unprivileged denial-of-service. PR: 236329 Reported by: asomers Reviewed by: asomers Sponsored by: Dell EMC Isilon	2019-03-06 22:56:49 +00:00
Alan Somers	4cbb4f8886	fuse(4): add tests related to FUSE_MKNOD PR: 236236 Sponsored by: The FreeBSD Foundation	2019-03-05 00:27:54 +00:00
Edward Tomasz Napierala	01c27978f5	Don't pass td to nfsvno_open(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-04 14:50:00 +00:00
Edward Tomasz Napierala	127152fe56	Don't pass td to nfsvno_createsub(). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-03-04 14:30:53 +00:00
Edward Tomasz Napierala	5edc9102dc	Don't pass td to nfsd_fhtovp(), it's unused. Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19421	2019-03-04 13:18:04 +00:00
Edward Tomasz Napierala	af444b18ed	Push down the thread argument in NFS server code, using curthread instead of passing it explicitly. No functional changes Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19419	2019-03-04 13:12:23 +00:00
Edward Tomasz Napierala	113aa93390	Push down td in nfsrvd_dorpc() - make it use curthread instead of it being explicitly passed as an argument. No functional changes. The big picture here is that I want to get rid of the 'td' argument being passed everywhere, and this is the first piece that affects the NFS server. Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19417	2019-03-04 13:02:36 +00:00
Fedor Uporov	9441309ae0	Fix double free in case of mount error. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-9-EXT3-2: Denial Of Service in nmount-5 (vm_fault_hold) Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19385	2019-03-04 11:33:49 +00:00
Fedor Uporov	3eed9f20d4	Do not read the on-disk inode in case of vnode allocation. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-6-EXT2-4: Denial Of Service in mkdir-0 (ext2_mkdir/vn_rdwr) Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19327	2019-03-04 11:27:47 +00:00
Fedor Uporov	736da5176d	Fix integer overflow possibility. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-2-EXT2-1: Out-of-Bounds Write in nmount (ext2_vget) Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19326	2019-03-04 11:19:21 +00:00
Fedor Uporov	4ff6603ab3	Do not panic if inode bitmap is corrupted. admbug: 804 Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19325	2019-03-04 11:12:19 +00:00
Fedor Uporov	80a4a9716b	Validate block bitmaps. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19324	2019-03-04 11:01:23 +00:00
Fedor Uporov	daa2d62da2	Add additional on-disk inode checks. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19323	2019-03-04 10:55:01 +00:00
Fedor Uporov	6e38bf94e5	Make superblock reading logic more strict. Add more on-disk superblock consistency checks to ext2_compute_sb_data() function. It should decrease the probability of mounting filesystems with corrupted superblock data. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19322	2019-03-04 10:42:25 +00:00
Alan Somers	c02ccc7e44	Fix typos from r344664 Sponsored by: The FreeBSD Foundation	2019-03-01 15:49:11 +00:00
Alan Somers	cf16949867	fuse(4): convert debug printfs into dtrace probes fuse(4) was heavily instrumented with debug printf statements that could only be enabled with compile-time flags. They fell into three basic groups: 1) Totally redundant with dtrace FBT probes. These I deleted. 2) Print textual information, usually error messages. These I converted to SDT probes of the form fuse:fuse:FILE:trace. They work just like the old printf statements except they can be enabled at runtime with dtrace. They can be filtered by FILE and/or by priority. 3) More complicated probes that print detailed information. These I converted into ad-hoc SDT probes. Sponsored by: The FreeBSD Foundation	2019-02-28 19:27:54 +00:00
Conrad Meyer	f6ebb68395	fuse: Fix a regression introduced in r337165 On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to use a buf cache block size in excess of permitted size. This did not affect most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB. The issue was discovered and reported using a custom kernel with a DFLTPHYS of 512kB. PR: 230260 (comment #9) Reported by: ken@ MFC after: π/𝑒 weeks	2019-02-21 02:41:57 +00:00
Matt Macy	81167243b4	PFS: Bump NAMELEN and don't require clients to be sleepable - debugfs consumers expect to be able to export names more than 48 characters - debugfs consumers expect to be able to hold locks across calls and are able to handle allocation failures Reviewed by: hps@ MFC after: 1 week Sponsored by: iX Systems Differential Revision: https://reviews.freebsd.org/D19256	2019-02-20 20:55:02 +00:00
Conrad Meyer	02295caf43	Fuse: whitespace and style(9) cleanup Take a pass through fixing some of the most egregious whitespace issues in fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but somewhat less painful to look at and edit. No functional change.	2019-02-20 02:49:26 +00:00
Conrad Meyer	bd4cb2a46d	fuse: add descriptions for remaining sysctls (Except reclaim revoked; I don't know what the goal of that one is.)	2019-02-20 02:48:59 +00:00
Edward Tomasz Napierala	c9172fb4f1	Work around the "nfscl: bad open cnt on server" assertion that can happen when rerooting into NFSv4 rootfs with kernel built with INVARIANTS. I've talked to rmacklem@ (back in 2017), and while the root cause is still unknown, the case guarded by assertion (nfscl_doclose() being called from VOP_INACTIVE) is believed to be safe, and the whole thing seems to run just fine. Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-02-19 12:45:37 +00:00
Conrad Meyer	3c324b9465	FUSE: Refresh cached file size when it changes (lookup) The cached fvdat->filesize is independent of the (mostly unused) cached_attrs, and we failed to update it when a cached (but perhaps inactive) vnode was found during VOP_LOOKUP to have a different size than cached. As noted in the code comment, this can occur in distributed filesystems or with other kinds of irregular file behavior (anything is possible in FUSE). We do something similar in fuse_vnop_getattr already. PR: 230258 (as reported in description; other issues explored in comments are not all resolved) Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com> Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)	2019-02-15 22:55:13 +00:00
Conrad Meyer	c4af8b173a	FUSE: The FUSE design expects writethrough caching At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol specifies that only clean data be cached. Prior to this change, we implemented and defaulted to writeback caching. This is ok enough for local-only filesystems without hardlinks, but violates the general design contract with FUSE and breaks distributed filesystems or concurrent access to hardlinks of the same inode. In this change, add cache mode as an extension of cache enable/disable. The new modes are UC (was: cache disabled), WT (default), and WB (was: cache enabled). For now, WT caching is implemented as write-around, which meets the goal of only caching clean data. WT can be better than WA for workloads that frequently read data that was recently written, but WA is trivial to implement. Note that this has no effect on O_WRONLY-opened files, which were already coerced to write-around. Refs: * https://sourceforge.net/p/fuse/mailman/message/8902254/ * https://github.com/vgough/encfs/issues/315 PR: 230258 (inspired by)	2019-02-15 22:52:49 +00:00
Conrad Meyer	194e691aaf	FUSE: Only "dirty" cached file size when data is dirty Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update the buf cache bounds as a result of either a read from the underlying FUSE filesystem, or as part of a write-through type operation (like truncate => VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which indicates that an inode's data is dirty (in particular, that the local buf cache and fvdat->filesize have dirty extended data). PR: 230258 (related)	2019-02-15 22:51:09 +00:00
Conrad Meyer	09176f096b	FUSE: Respect userspace FS "do-not-cache" of path components The FUSE protocol demands that kernel implementations cache user filesystem path components (lookup/cnp data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup if the user filesystem did not set a zero second TTL. PR: 230258 (inspired by; does not fix)	2019-02-15 22:50:31 +00:00
Conrad Meyer	78a7722fbc	FUSE: Respect userspace FS "do-not-cache" of file attributes The FUSE protocol demands that kernel implementations cache user filesystem file attributes (vattr data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. In the future, as an optimization, we should implement notify_inval_entry, etc., which provide userspace filesystems a way of evicting the kernel cache. One potentially bogus access to invalid cached attribute data was left in fuse_io_strategy. It is restricted behind the undocumented and non-default "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are deadcode and can be eliminated? Some minor APIs changed to facilitate this: 1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data"). 2. cache_attrs() respects the provided TTL and only caches in the FUSE inode if TTL > 0. It also grows an "out" argument, which, if non-NULL, stores the translated fuse_attr (even if not suitable for caching). 3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help avoid programming mistakes. 4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE link op was weakened (only performed when we have a valid attr cache). The check is racy in a multi-writer network filesystem anyway -- classic TOCTOU. We have to trust any userspace filesystem that rejects local caching to account for it correctly. PR: 230258 (inspired by; does not fix)	2019-02-15 22:49:15 +00:00
Konstantin Belousov	b9662886ef	In null_vptocnp(), cache vp->v_mount and use it for the null_nodeget() call. The vp vnode is unlocked during the execution of the VOP method and can be reclaimed, zeroing vp->v_data. Caching it allows the correct mount point to be used. Reported and tested by: pho PR: 235549 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:20:18 +00:00
Konstantin Belousov	25728e8411	Before using VTONULL(), check that the covered vnode belongs to nullfs. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:17:31 +00:00
Konstantin Belousov	930cc2dbef	Some style for nullfs_mount(). Also use bool type for isvnunlocked. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:15:29 +00:00
Pedro F. Giffuni	771ec59bb7	ext2fs: Add some extra consistency checks for the superblock. Maliciously formed, or badly corrupted, filesystems can cause kernel panics. In general, such acts of foot-shooting can only be accomplished by root, but in a world with VM images that is moving towards automated mounts it is important to have some form of prevention. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE. Incidentally, this should also fix a memory corruption issue reported by Dr Silvio Cesare of InfoSect. Huge thanks to all researchers for making us aware of the issue. admbug: 872, 891 Reviewed by: fsu Obtained from: NetBSD (with minor changes) MFC after: 3 days	2019-01-25 22:22:29 +00:00
Mark Johnston	d9463dd4f3	nfs: Zero the buffers exported by NFSSVC_DUMPCLIENTS and DUMPLOCKS. Note that these interfaces are available only to root. admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: rmacklem MFC after: 1 day Security: Kernel memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 23:54:33 +00:00
Oleksandr Tymoshenko	52b2c8e242	[smbfs] Allow semicolon in mounts that support long names Semicolon is a legal character in long names but not in 8.3 format. Move it to respective character set. PR: 140068 Submitted by: tom@uffner.com MFC after: 3 weeks	2019-01-20 05:52:16 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs we are going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Kirk McKusick	c0029546f8	When loading an inode from disk, verify that its mode is valid. If invalid, return EINVAL. Note that inode check-hashes greatly reduce the chance that these errors will go undetected. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read) Reviewed by: kib MFC after: 1 week Sponsored by: Netflix M sys/fs/ext2fs/ext2_vnops.c M sys/kern/vfs_subr.c M sys/ufs/ffs/ffs_snapshot.c M sys/ufs/ufs/ufs_vnops.c	2018-12-27 07:18:53 +00:00
Bruce Evans	416e232cc6	Fix clobbering of the fatchain cache for clustered i/o's when full clustering is not done. The bug caused extreme slowness for large files in some cases. There is no way to tell VOP_BMAP() how many blocks are wanted, so for all file systems it has to waste time in some cases by searching for more contiguous blocks than will be accessed. For msdosfs, it also clobbered the fatchain cache in these cases by advancing the cache to point to the chain entry for a block that won't be read. This makes the cache useless for the next sequential i/o (or VOP_BMAP()), so the fat chain is searched from the beginning. The cache only has 1 relevant entry, so it is similarly useless for random i/o. Fix this by only advancing the cache to point to the chain entry for the first block that will be read. Clustering uses results from VOP_BMAP(), so when more than 1 block is read by clustering, the cache is not advanced as optimally as before, but it is at most 1 cluster size behind and searching the chain through the blocks for this cluster doesn't take too long.	2018-12-21 21:17:45 +00:00
Bruce Evans	8ec22c4d65	Quick fix for initialization of mnt_iosize_max. (This limit controls mainly clustering and read-ahead.) Copy the initialization from ffs, and also copy a couple of lines of ffs's nearby style for initialization order and whitespace. A correct fix would de-duplicate the initialization and fix bitrot in it instead of adding another instance of the duplication. Complications to use the size preferred by the device have been reduced to hard-coding slightly pessimal and/or inconsistent defaults, using large code that was almost needed to support the complications. For msdosfs, the result was that mnt_iosize_max was DFLTPHYS (64K) but is now MAXPHYS (128K).	2018-12-21 20:12:43 +00:00
Rick Macklem	23114c6c2a	Fix the NFSv4 server to obey vfs.nfsd.nfs_privport. When the NFSv4 server was coded, I believed that the specification authors did not want NFSv4 servers to require a client to use a reserved port#. However, recently it has been noted that the Linux NFSv4 server does support a check for a reserved port#. Since both the FreeBSD and Linux NFSv4 clients use a reserved port# by default, enabling vfs.nfsd.nfs_privport to require a reserved port# for NFSv4 the same as it does for NFSv2, 3 seems reasonable. The only case where this could cause a POLA violation is a FreeBSD NFSv4 server with vfs.nfsd.nfs_privport set, but with NFSv4 clients doing mounts without using a reserved port# (< 1024). Tested by: chaz.newton58@gmail.com PR: 234106 MFC after: 1 week	2018-12-20 22:21:41 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mark Johnston	352aaa5122	Plug memory disclosures via ptrace(2). On some architectures, the structures returned by PT_GET*REGS were not fully populated and could contain uninitialized stack memory. The same issue existed with the register files in procfs. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Security: kernel stack memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18421	2018-12-03 20:54:17 +00:00
Mark Johnston	fee65dfc37	Ensure the dirent remains initialized when dirent.d_fileno is unset. Reported by: rmacklem MFC with: r340856 Sponsored by: The FreeBSD Foundation	2018-11-23 23:07:49 +00:00
Mark Johnston	6d2e2df764	Ensure that directory entry padding bytes are zeroed. Directory entries must be padded to maintain alignment; in many filesystems the padding was not initialized, resulting in stack memory being copied out to userspace. With the ino64 work there are also some explicit pad fields in struct dirent. Add a subroutine to clear these bytes and use it in the in-tree filesystems. The NFS client is omitted for now as it was fixed separately in r340787. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-23 22:24:59 +00:00
Rick Macklem	f86bce1770	Make sure the NFS readdir client fills in all "struct dirent" data. The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't filling in parts of the readdir reply, such as d_pad[01] and the bytes at the end of d_name within d_reclen. As such, data left in a buffer cache block could be leaked to userland in the readdir reply. This patch makes sure all of the data is filled in. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib, markj MFC after: 2 weeks	2018-11-23 00:17:47 +00:00
Mateusz Guzik	53011553fa	proc: convert pfind & friends to use pidhash locks and other cleanup pfind_locked is retired as it relied on allproc which unnecessarily restricts locking of the hash. Sponsored by: The FreeBSD Foundation	2018-11-21 20:15:56 +00:00
Mateusz Guzik	30e0cf499f	tmpfs: use unr64 for inode numbers Sponsored by: The FreeBSD Foundation	2018-11-20 15:14:30 +00:00
Rick Macklem	75772b69f2	Improve sanity checking for the dircount hint argument to NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code checked for a zero argument, but did not check for a very large value. This patch clips dircount at the server's maximum data size. MFC after: 1 week	2018-11-20 01:59:57 +00:00
Rick Macklem	778f29833b	nfsm_advance() would panic() when the offs argument was negative. The code assumed that this would indicate a corrupted mbuf chain, but it could simply be caused by bogus RPC message data. This patch replaces the panic() with a printf() plus error return. MFC after: 1 week	2018-11-20 01:56:34 +00:00
Rick Macklem	1d171e7971	r304026 added code that started statistics gathering for an operation before the operation number (the variable called "op") was sanity checked. This patch moves the code down to below the range sanity check for "op".	2018-11-20 01:52:45 +00:00
Mark Johnston	3d2a0fe762	Remove comments made obsolete by the ino64 work. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-19 17:33:44 +00:00
Konstantin Belousov	1c4ca77890	Add d_off support for multiple filesystems. The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917	2018-11-14 14:18:35 +00:00
Rick Macklem	6ad8a6eaa4	Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end. Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error); in many places. This patch replaces these code sequences with a "goto out;" and does the NFSVOPUNLOCK(); return (error); at the end of the function in order to make the vnode locking simpler. This patch does not change the semantics of nfs_advlock(). Suggested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D17853	2018-11-06 22:50:50 +00:00
Brooks Davis	318f0d7720	Use declared types for caddr_t arguments. Leave ptrace(2) alone for the moment as it's defined to take a caddr_t. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17852	2018-11-06 18:46:38 +00:00
Brooks Davis	1493c2ee62	Make vop_symlink take a const target path. This will enable callers to take const paths as part of syscall declaration improvements. Where doing so is easy and non-disruptive, carry the const through implementations. In UFS the value is passed to an interface that must take non-const values. In ZFS, const poisoning would touch code shared with upstream and it's not worth adding diffs. Bump __FreeBSD_version for external API consumers. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17805	2018-11-02 14:42:36 +00:00
Rick Macklem	881a9516a2	Fix NFS client vnode locking to avoid a crash during forced dismount. A crash was reported where the crash occurred in nfs_advlock() when the NFS_ISV4(vp) macro was being executed. This was caused by the vnode being VI_DOOMED due to a forced dismount in progress. This patch fixes the problem by locking the vnode before executing the NFS_ISV4() macro. Tested by: rlibby PR: 232673 Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D17757	2018-11-01 15:27:22 +00:00
Brooks Davis	ed34a7fcf2	Move 32-bit compat support for FIODGNAME to the right place. ioctl(2) commands only have meaning in the context of a file descriptor, so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Unlike r339174 this change supports both places FIODGNAME is handled. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17475	2018-10-26 17:59:25 +00:00
Konstantin Belousov	8ff7fad1d7	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not want to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658	2018-10-23 21:43:41 +00:00
Andriy Gapon	ca8f3d1ca2	nfsrvd_readdirplus: for some errors, do not fail the entire request Instead, a failing entry is skipped. This change consists of two logical changes. A failure to vget or lookup an entry is considered to be a result of a concurrent removal, which is the only reasonable explanation given that the filesystem is busied. So, the entry would be silently skipped. In the case of a failure to get attributes of an entry for an NFSv3 request, the entry would be silently skipped. There can be legitimate reasons for the failure, but NFSv3 does not provide any means to report the error, so we have two options: either fail the whole request or ignore the failed entry. Traditionally, the old NFS server used the latter option, so the code is reverted to it. Making the whole directory unreadable because of a single entry seems to be impractical. Additionally, some bits of code are slightly re-arranged to account for the new control flow and to honor style(9). Reviewed by: rmacklem Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D15424	2018-10-22 15:33:05 +00:00
Rick Macklem	910ccc7727	Fix the pNFS server's reporting of disk space usage for the "#<path>" case. The pNFS server would report the total disk space used and free for all of the DSs, even when certain DSs are assigned to the file system via the "#<path>" suffix used in the "nfsd -p" option argument. This patch fixes this case. It only reports usage for the file system that the argument vnode resides on. This is consistent with the non-pNFS NFSv4 server. In NFSv4 it is possible to have subtrees on other file systems, but these are not included in the usage information for NFSv4. Approved by: re (gjb)	2018-10-09 01:10:50 +00:00
Brooks Davis	9bc603bd20	Revert r339174: Move 32-bit compat support for FIODGNAME to the right place. A case was missed in this commit which breaks sshing into a 32-bit sshd on a 64-bit system. Approved by: re (gjb)	2018-10-04 23:55:03 +00:00
Brooks Davis	23f2e22802	Move 32-bit compat support for FIODGNAME to the right place. ioctl(2) commands only have meaning in the context of a file descriptor, so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Reviewed by: kib Approved by: re (rgrimes, gjb) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Review: https://reviews.freebsd.org/D17388	2018-10-03 20:39:48 +00:00
Mark Murray	19fa89e938	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
Fedor Uporov	28f4f62303	FUSE extattrs: fix issue when neither uio nor size were not passed to VOP_* (cosmetic only). Reviewed by: cem, pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13737	2018-08-21 18:50:29 +00:00
Fedor Uporov	493b4a8ccd	FUSE extattrs: fix issue when neither uio nor size were not passed to VOP_*. The requested size was returned incorrectly in case uio == NULL from listextattr because the nameprefix/name conversion was not applied. Also, make a_size/uio returning logic more unified with other filesystems. Reviewed by: cem, pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13528	2018-08-21 18:39:47 +00:00
Fedor Uporov	4c1e1d2bcc	Change unused inodes counters behavior in the cylinder groups. Make it closer to the native ext4 implementation to avoid fsck errors.	2018-08-21 18:39:29 +00:00
Fedor Uporov	e49d64a7a7	Fix directory blocks checksum updating logic. Count dirent tail in the searchslot logic in case of directory block search. Add htree root csum update function call in case of rename.	2018-08-21 18:39:02 +00:00
Rick Macklem	fdab4d3b29	Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr(). When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr() done while the vnodes were locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes the LORs by moving the vn_start_write() calls up to before where the vnodes are locked. For "tvp", the vn_start_write() probably isn't necessary, because NFS mounts can't be suspended. However, I think doing so is harmless. Thanks go to kib@ for letting me know that I had introduced these LORs. This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8) is used to recover a mirrored DS.	2018-08-18 19:14:06 +00:00
Rick Macklem	3e5ba2e187	Fix LORs between vn_start_write() and vn_lock() in the pNFS server. When coding the pNFS server, I added several vn_start_write() calls done while the vnode was locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes this by removing the added vn_start_write() calls and modifying the code so that the extant vn_start_write() call before the NFS RPC/operation is done when needed by the pNFS server. Flags are changed so that LayoutCommit and LayoutReturn now get a vn_start_write() done for them. When the pNFS server is enabled, the code now also changes the flags for Getattr, so that the vn_start_write() is done for Getattr, since it may need to do a vn_set_extattr(). The nfs_writerpc flag array was made global to the NFS server and renamed nfsrv_writerpc, which is consistent naming for globals in the NFS server. Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is locked results in a LOR. This patch only affects the behaviour of the pNFS server.	2018-08-17 21:12:16 +00:00
Rick Macklem	9fbb0faf4f	Don't set a file's size for the MDS file of a pNFS service. When a pNFS service is running, the size of the files created on the MDS are normally 0, since the data is written to the data files on the DS(s). However, without this patch, if a Setattr with a non-zero size was done by a client, the MDS file was set to that size. This was thought to be benign, but it turns out that files with a non-zero size plus extended attributes can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this panic() has not been isolated, this patch avoids the panic() and leaves the MDS files in a consistent state of always having a size == 0. Note that these MDS files never store data. The patch also includes an unnecessary initialization of savsize in case some compiler or static analyser complains it might not be initialized. This patch only affects the NFS server when pNFS is enabled via the "-p" command line option on nfsd.	2018-08-17 12:32:38 +00:00
Jamie Gritton	284001a222	Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating jails since FreeBSD 7. Along with the system call, put the various security.jail.allow_foo and security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or BURN_BRIDGES). These sysctls had two disparate uses: on the system side, they were global permissions for jails created via jail(2) which lacked fine-grained permission controls; inside a jail, they're read-only descriptions of what the current jail is allowed to do. The first use is obsolete along with jail(2), but keep them for the second-read-only use. Differential Revision: D14791	2018-08-16 18:40:16 +00:00
Conrad Meyer	5cb27f0813	FUSE: Document global sysctl knobs So that I don't have to keep grepping around the codebase to remember what each one does. And maybe it saves someone else some time. Fix a trivial whitespace issue while here. No functional change. Sponsored by: Dell EMC Isilon	2018-08-15 17:41:19 +00:00
Toomas Soome	527d337fdb	cd9660 pointer sign issues and missing __packed attribute The isonum_* functions are defined to take unsigned char* as an argument, but the structure fields are defined as char. Change to u_char where needed. Probably the full structure should be changed, but I'm not sure about the side effects. While there, add __packed attribute. Differential Revision: https://reviews.freebsd.org/D16564	2018-08-15 06:42:31 +00:00
Rick Macklem	41df1b5b47	Assorted fixes to handling of LayoutRecall callbacks, mostly error handling. After a re-read of the appropriate section of RFC5661, I decided that a few things should be changed related to LayoutRecall callback handling. Here are the things fixed by this patch. - For two of the three cases that LayoutRecall is done, I now think setting the clora_changed argument false is correct. - All errors other than NFSERR_DELAY returned by LayoutRecall appear permanent, so don't retry for any of them. (NFSERR_DELAY is retried by newnfs_request(), so it is not affected by this patch.) - Instead of waiting "forever" (actually until the process is SIGTERM'd) for Layouts to be returned during a mirror copy, fail and return ENXIO after about 1 minute. Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself, but did not make sense when done via find(1). This patch only affects the pNFS server.	2018-08-08 20:21:45 +00:00
Pedro F. Giffuni	c820acbf0a	msdosfs: fixes for Undefined Behavior. These were found by the Undefined Behaviour GSoC project at NetBSD: Do not change signedness bit with left shift. While there, avoid signed integer overflow. Address both issues by using an unsigned type. msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented in type 'int' msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be represented in type 'int [20]' Detected with micro-UBSan in the user mode. Hinted from: NetBSD (CVS 1.33) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16615	2018-08-08 15:08:22 +00:00
Fedor Uporov	53288b712d	Split the dir_index and dir_nlink features. Do not allow creating more than EXT4_LINK_MAX links to a directory if dir_nlink is not set, as is done in recent e2fsprogs updates. MFC after: 3 months	2018-08-08 12:08:46 +00:00
Fedor Uporov	17c7b27f55	Fix directory blocks checksum updating logic. The checksum updating functions were not called in case of dir index inode splitting and in case of dir entry removing, when the entry was first in the block. Fix and move the dir entry adding logic when i_count == 0 to new function. MFC after: 3 months	2018-08-08 12:07:45 +00:00
Conrad Meyer	3dc1c7d6bc	FUSE: Remove some set-but-not-used variables No functional change.	2018-08-08 04:46:03 +00:00
Rick Macklem	93df87f208	Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply. The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY, but exempts certain NFSv4 operations. However, for callback RPCs, there should not be any exemptions at this time. The code would have erroneously exempted the CBRECALL callback, since it has the same operation number as the CLOSE operation. This patch fixes this by checking for a callback RPC (indicated by clp != NULL) and not checking for exempt operations for callbacks. This would have only affected the NFSv4 server when delegations are enabled (they are not enabled by default) and the client replies to CBRECALL with NFSERR_DELAY. This may never actually happen. Spotted during code inspection. MFC after: 2 weeks	2018-08-07 21:29:14 +00:00
Rick Macklem	25705dd5d0	Copy all bits of a file handle in case there is padding in the structure. At least on x86, fhandle_t is a packed structure, so I believe an assignment will copy all the bits. However, for some current/future architectures, there might be padding in the structure that doesn't get copied via an assignment. Since NFS assumes a file handle is an opaque blob of bits that can be compared via memcmp()/bcmp(), all the bits including any padding must be copied. This patch replaces the assignments with a call to a byte copy function. Spotted during code inspection.	2018-08-05 19:21:50 +00:00
Rick Macklem	ac0d649588	Silence newer gcc warnings. Newer versions of gcc generate "might not be initialized" warnings for several variables in nfsrpc_doiods(). I have checked and all of these variables are assigned values before they are used. In the one case of "tdrpc", it could have passed garbage as an argument to nfscl_dofflayoutio() when mirrorcnt is one. However nfscl_dofflayoutio() only uses the argument when mirrorcnt > 1, so it wasn't actually broken. This patch initializes "tdrpc" to avoid confusion and initializes the rest to make the compiler happy. Requested by: mmacy	2018-08-02 20:10:59 +00:00
Conrad Meyer	dab6195cd3	FUSE: Bump maximum IO size to enable more performant operation Various components restrict size of IO passed up to the userspace filesystem based on the mount's f_iosize value. The previous default of PAGE_SIZE is anemic, even for normal filesystems, but especially considering every FUSE operation involves a kernel <-> userspace IPC upcall. Bump to DFLTPHYS (currently 64kB) to match other FUSE implementations. Anecdotally, Jakub reports IO read performance increased from 600 MB/s -> 2700 MB/s with a basic RAM-backed FUSE filesystem. PR: 230260 Reported by: Peter (MooseFS) <freebsd AT moosefs.com> Tested by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> MFC after: 3 days	2018-08-02 19:25:43 +00:00
Ed Maste	195e6c50d3	msdosfs: trim EOL whitespace	2018-07-31 12:44:28 +00:00
Ed Maste	a6274b81d5	cd9660: replace bcopy/bzero with C standard equivalents To reduce diffs against NetBSD.	2018-07-31 12:36:46 +00:00
Ed Maste	22e56aea3f	msdosfs: use same max filesize #define as NetBSD and move to header For use by makefs msdosfs support. Obtained from: NetBSD denode.h 1.6 Sponsored by: The FreeBSD Foundation	2018-07-30 20:36:51 +00:00
Rick Macklem	743d528198	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings. Add __unused macros to silence these warnings. Although the variables are not being used, they are values parsed from arguments to callback RPCs that might be needed in the future. Requested by: mmacy	2018-07-30 20:25:32 +00:00
Rick Macklem	8014c97147	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings in the NFS server. Add __unused macros to silence these warnings. Requested by: mmacy	2018-07-29 21:51:17 +00:00
Rick Macklem	a3e709cd33	Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7. I believe that a ReclaimComplete with rca_one_fs == TRUE is only to be used after a file system has been transferred to a different file server. However, RFC5661 is somewhat vague w.r.t. this and the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE and one with ReclaimComplete with rca_one_fs == FALSE. Therefore, just ignore the rca_one_fs == TRUE operation and return NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP. This allows the ESXi 6.7 NFSv4.1 client to do a mount. After discussion on the NFSv4 IETF working group mailing list, doing this along with setting a flag to note that a ReclaimComplete with rca_one_fs == TRUE was done was considered an appropriate way to handle this. The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was done may be used to disable replies of NFS4ERR_GRACE for non-reclaim state operations in a future commit. This patch, along with r332790, r334492 and r336357, allows ESXi 6.7 NFSv4.1 mounts to work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are violations of RFC-5661, and should not be used. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com, daniel@ftml.net (earlier version) MFC after: 2 weeks Relnotes: yes	2018-07-28 20:21:04 +00:00
Eitan Adler	33f4bccaa6	Use https over http for FreeBSD pages	2018-07-27 10:40:48 +00:00
Ed Maste	6ae00e306f	Revert msdosfs MAKEFS #ifdef changes from r319870 These changes are not needed for current msdosfs makefs WIP. Submitted by: Siva Mahadevan Sponsored by: The FreeBSD Foundation	2018-07-24 21:10:17 +00:00
Rick Macklem	cecf6c6e9c	Set CLSET_TIMEOUT on TCP connections to pNFS DSs. Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of specifying a timeout on each RPC. This is done so that SO_SNDTIMEO is set on the TCP socket as well as specifying a time limit when waiting for an RPC reply. Useful if the send queue for the TCP connection has become constipated, due to a failed DS. The choice of lease_duration / 4 is fairly arbitrary, but seems to work ok, with a lower bound of 10sec. For client connections to a DS, set the retry limit to vfs.nfsd.dsretries, which is 2 by default. This patch should only affect pNFS connections to DSs. This patch requires r336542. MFC after: 2 weeks	2018-07-21 01:33:07 +00:00
Alan Somers	5717aa2d2a	Allow mounting FUSE filesystems in jails Reviewed by: jamie MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16371	2018-07-20 21:35:31 +00:00
Rick Macklem	5d54f186bb	Modify the reasons for not issuing a delegation in the NFSv4.1 server. The ESXi NFSv4.1 client will generate warning messages when the reason for not issuing a delegation is two. Two refers to a resource limit and I do not see why it would be considered invalid. However it probably was not the best choice of reason for not issuing a delegation. This patch changes the reasons used to ones that the ESXi client doesn't complain about. This change does not affect the FreeBSD client and does not appear to affect behaviour of the Linux NFSv4.1 client. RFC5661 defines these "reasons" but does not give any guidance w.r.t. which ones are more appropriate to return to a client. Tested by: andreas.nagy@frequentis.com PR: 226650 MFC after: 2 weeks	2018-07-16 21:32:50 +00:00
Rick Macklem	5da3882447	Shut down the TCP connection to a DS in the pNFS client when Renew fails. When an NFSv4.1 client mount using pNFS detects a failure trying to do a Renew (actually just a Sequence operation), the code would simply try again and again and again every 30sec. This would tie up the "nfscl" thread, which should also be doing other things like Renews on other DSs and the MDS. This patch adds code which closes down the TCP connection and marks it defunct when Renew detects a failure to communicate with the DS, so further Renews will not be attempted until a new working TCP connection to the DS is established. It also makes the call to nfscl_cancelreqs() unconditional, since nfscl_cancelreqs() checks the NFSCLDS_SAMECONN flag and does so while holding the lock. This fix only applies to the NFSv4.1 client when using pNFS and without it the only effect would have been an "nfscl" thread busy doing Renew attempts on an unresponsive DS. MFC after: 2 weeks	2018-07-15 18:54:44 +00:00
Rick Macklem	89c64a3a4f	Fix the pNFS client when mirrors aren't on the same machine. Without this patch, the client side NFSv4.1 pNFS code erroneously did writes and commits to both DS mirrors using the TCP connection of the first one. For my test setup this worked, since I have both DSs running on the same machine, but it would have failed when the DSs are on separate machines. This patch fixes the code to use the correct TCP connection for each DS. This patch should only affect the NFSv4.1 client when using "pnfs" mounts to mirrored DSs. MFC after: 2 weeks	2018-07-14 19:51:44 +00:00
Rick Macklem	0e7bd20bb2	Close down the TCP connection to a pNFS DS when it is disabled. So long as the TCP connection to a pNFS DS isn't shared with other DSs, it can be closed down when the DS is being disabled in the pNFS client. This causes any RPCs in progress to fail. This patch only affects the NFSv4.1 pNFS client when errors occur while doing I/O on a DS. MFC after: 2 weeks	2018-07-13 20:03:05 +00:00
Rick Macklem	83f526de6a	Change the pNFS client so that it does not report an NFSERR_STALE from an I/O attempt on a DS to the server via LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This patch only affects behaviour of the pNFS client and only when using Flexible File layouts. MFC after: 2 weeks	2018-07-13 12:39:27 +00:00
Rick Macklem	a6fed5f514	Modify the NFSv4.1 pNFS client to use separate TCP connections for DSs. Without this patch, the NFSv4.1 pNFS client shared a single TCP connection for all DSs that resided on the same machine. This made disabling one of the DSs impossible. Although unlikely, it is possible that the storage subsystem has failed in such a way that the storage for one DS on a machine is no longer functioning correctly, but the storage used by another DS on the same machine is still ok. For this case, it would be nice if a system can fail one of the DSs without failing them all. This patch changes the default behaviour to use separate TCP connections for each DS even if they reside on the same machine. I do not believe that this will be a problem for extant pNFS servers, but a sysctl can be set to restore the old behaviour if this change causes a problem for an extant pNFS server. This patch only affects the NFSv4.1 pNFS client. MFC after: 2 weeks	2018-07-12 20:46:22 +00:00
Rick Macklem	8361de2544	Ignore the cookie verifier for NFSv4.1 when the cookie is 0. RFC5661 states that the cookie verifier should be 0 when the cookie is 0. However, the wording is somewhat unclear and a recent discussion on the nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore the cookie verifier's value when the directory offset cookie is 0. This patch deletes the check for this that would return NFSERR_BAD_COOKIE when the verifier was not 0. This was found during testing of the ESXi client against the NFSv4.1 server. Reported by: daniel@ftml.net (via packet trace) MFC after: 2 weeks	2018-07-11 23:23:29 +00:00
Rick Macklem	de9a1a70ab	Add support for a "forced" pnfsdskill to the pNFS server kernel code. The pnfsdskill(8) command will normally fail if there is no valid mirror for the DS to be disabled. However, a system administrator may need to disable a DS which does not have a valid mirror so that the nfsd threads can be terminated. This patch adds the kernel code needed by pnfsdskill(8) to implement this "forced" case of disabling a DS. This patch only affects the pNFS server.	2018-07-09 19:58:01 +00:00
Rick Macklem	acc6e58def	Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied. If a mirrored DS is being recovered that has a lot of large sparse files, pnfsdscopymr(8) would use a lot of space on the recovered mirror since it would write the "holes" in the file being mirrored. This patch adds code to check for a "hole" and skip doing the write. The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K. I think that most file server file systems will be using a blocksize at least this large. If the file server is using a smaller blocksize and smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased. The block of 0s is malloc()d, since pnfsdscopymr(8) should be an infrequent occurrence.	2018-07-08 18:15:55 +00:00
Rick Macklem	ed66a76bca	Fix handling of the hybrid DS case for a pNFS server. After the addition of the "#mds_path" suffix for a DS specification on the "-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS file system and DSs that store files for all DSs. This is what I referred to as "hybrid" above. At first, I didn't think this hybrid case would be useful, but I now believe that some system administrators may find it useful. This patch modifies the file storage assignment algorithm so that it makes the "#mds_path" DSs take priority and the all file systems DSs are now only used for MDS file systems with no "#mds_path" DS servers. This only affects the pNFS server for this "hybrid" case.	2018-07-07 19:27:49 +00:00
Rick Macklem	5b500ea949	Change the pNFS server so that it does not disable a mirrored DS for an NFSERR_STALE error reported via a LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file being copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This change only affects the pNFS server and only when a client does a LayoutReturn with the NFSERR_STALE error report.	2018-07-06 19:18:45 +00:00
Rick Macklem	ff3b992f38	Fix the pNFS server so that it handles the "#mds_path" check for mirrors. The recently added feature of the pNFS server will set an fsid for the MDS file system to define the file system a DS should store files for. For a case where a DS handling all file systems has failed, it was possible for the code to check for a mirror with a specified fs, even though nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror. This patch adds a check for nfsdev_mdsisset != 0 to avoid this. It only affects the pNFS server for a rare case. Found via code inspection.	2018-07-04 19:46:26 +00:00
Rick Macklem	2f32675c83	Add an optional feature to the pNFS server. Without this patch, the pNFS server distributes the data storage files across all of the specified DSs. A tester noted that it would be nice if a system administrator could control which DSs are used to store the file data for a given exported MDS file system. This patch adds the kernel support to do this. It also makes a slight semantic change to nfsv4_findmirror(), since some uses of it no longer require that the DS being searched for have a current mirror. A patch that will be committed in a few minutes will modify the nfsd daemon to support this feature. The patch should only affect sites using the pNFS server (specified via the "-p" command line option for nfsd). Suggested by: james.rose@framestore.com	2018-07-02 19:21:33 +00:00
Rick Macklem	1aabf3fd5e	Fix the pNFS server for a case where mirror level equals number of DSs. If a pNFS service was set up where the number of DSs equals the mirror level and then a DS was disabled, the service would create files with duplicate entries for the same DS. This bug occurred because I didn't realize that TAILQ_FOREACH_FROM() would start at the beginning of the list when the initial value of the variable was NULL. This patch also changes the pNFS server DS file creation code so that it creates entries with 0.0.0.0 IP address when it cannot create mirror level files due to lack of DSs. The patch only affects the pNFS service and only when it was created with a number of DSs equal to the mirror level and mirroring is enabled.	2018-06-29 12:41:36 +00:00
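The TAILQ_FOREACH_FROM() pitfall can be reproduced with a plain singly linked list. This sketch does not use <sys/queue.h>; struct node, FOREACH_FROM and count_from() are hypothetical names, and the macro only mirrors the documented behaviour that a NULL starting variable means "start at the head", not "visit nothing".

```c
#include <stddef.h>
#include <assert.h>

/* Minimal stand-in for a TAILQ element; not <sys/queue.h>. */
struct node {
	int id;
	struct node *next;
};

/* Like TAILQ_FOREACH_FROM(): a NULL var restarts from the head. */
#define FOREACH_FROM(var, head)					\
	for ((var) = ((var) != NULL ? (var) : (head));		\
	    (var) != NULL; (var) = (var)->next)

/* Count the nodes visited when iteration "resumes" at start. */
static int
count_from(struct node *head, struct node *start)
{
	struct node *np = start;
	int n = 0;

	FOREACH_FROM(np, head)
		n++;
	return (n);
}
```

A caller that expects a NULL "resume point" to mean the list was exhausted will instead revisit every element from the head, which is how duplicate entries for the same DS can appear.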
Rick Macklem	9f4c522e6b	Set the slotid and ND_HASSLOTID flag for NFSv4.1 sequenced operations. Most NFSv4.1 compound RPCs start with a Sequence operation. For these cases, save the slotid and note that it is saved by setting ND_HASSLOTID. This is used by r335568 to free up the session slot and disable it. MFC after: 2 weeks	2018-06-23 00:48:45 +00:00
Rick Macklem	b18130d330	Define ND_HASSLOTID needed by r335568. r335568 uses a flag called ND_HASSLOTID to indicate that the slotid is set, so it can free and invalidate it. This flag needs to be set, which will be done in a subsequent commit. MFC after: 2 weeks	2018-06-23 00:37:15 +00:00
Rick Macklem	ba6cce3aea	Fix the handling of NFSv4.1 sessions for "soft" mounts. When a "soft" mount is used for NFSv4.1, an RPC that fails without completing will leave a slot in the NFSv4.1 session in an indeterminate state. As such, all that can be done is free up the slot while making it no longer usable. A "soft" NFSv4.1 mount is not recommended in general, since it will leave Open/Lock state in an indeterminate state. An exception is a pNFS mount of a DS, since there are no Opens/Locks done for them except file creates where loss of the Open state does not matter. The patch also makes connections to DSs soft, so that they will fail when a DS is non-functional or network partitioned, allowing the pNFS MDS to disable the DS for a mirrored configuration. This patch should not affect normal "hard" NFSv4.1 mounts. MFC after: 2 weeks	2018-06-22 21:37:20 +00:00
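"Free up the slot while making it no longer usable" can be pictured with a tiny slot table. A minimal sketch under assumed names (NSLOTS, struct slot, slot_alloc(), slot_fail() are all hypothetical); the real session code also tracks per-slot sequence numbers, which are omitted here.

```c
#include <stdbool.h>
#include <assert.h>

#define NSLOTS 4  /* hypothetical session slot table size */

struct slot {
	bool busy;      /* an RPC currently owns this slot */
	bool unusable;  /* sequence state unknown; never reuse */
};

/* Hand out the lowest free, usable slot, or -1 if none remain. */
static int
slot_alloc(struct slot *tbl)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!tbl[i].busy && !tbl[i].unusable) {
			tbl[i].busy = true;
			return (i);
		}
	}
	return (-1);
}

/*
 * A "soft" RPC that gave up leaves the server's view of this slot
 * unknown, so release it but mark it so it is never handed out again.
 */
static void
slot_fail(struct slot *tbl, int i)
{
	tbl[i].busy = false;
	tbl[i].unusable = true;
}
```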
Rick Macklem	2e35b8fe24	Change the NFSv4.1 pNFS client so that it returns the DS error in layoutreturn. When the NFSv4.1 pNFS client gets an error for a DS I/O operation using a Flexible File layout, it returns the layout with an error. This patch changes the code slightly, so that it returns the layout for all errors except EACCES and lets the MDS decide what to do based on the error. It also makes a couple of changes to nfscl_layoutrecall() to ensure that the first layoutreturn(s) will have the error in the reply. Plus, the patch adds a wakeup() so that the "nfscl" thread won't wait 1 second before doing the LayoutReturn. Tested against the pNFS service. This patch should not affect non-pNFS use of the client. The unused "dsp" argument will be used by a future patch that disables the connection to the DS when possible. MFC after: 2 weeks	2018-06-22 21:25:27 +00:00
Rick Macklem	c16f407e31	Add a counter to limit the number of disabled DSs for a mirrored pNFS MDS. This patch adds a counter that limits the number of disabled mirrored DSs to mirror level - 1. It also makes a small change so that a client's Write to a DS that fails with EACCES no longer causes the DS to be disabled. This patch only affects the pNFS server.	2018-06-22 00:55:39 +00:00
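The "mirror level - 1" cap keeps at least one copy of the data reachable. A minimal sketch; struct mirror_state and try_disable_ds() are hypothetical names, and the sketch ignores locking and re-enabling a repaired DS.

```c
#include <stdbool.h>
#include <assert.h>

struct mirror_state {
	int mirror_level;  /* number of copies kept of each data file */
	int ndisabled;     /* mirrored DSs currently disabled */
};

/* Disable one more DS only while at least one mirror would remain. */
static bool
try_disable_ds(struct mirror_state *ms)
{
	if (ms->ndisabled >= ms->mirror_level - 1)
		return (false);
	ms->ndisabled++;
	return (true);
}
```

For two-way mirroring this permits exactly one disabled DS; a second failure report is refused rather than taking the last copy offline.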
Rick Macklem	755e4b7936	Revert r335263, since it can cause crashes in unusual circumstances. This needs to be fixed in a different way.	2018-06-17 23:08:54 +00:00
Rick Macklem	2bad64241c	Make the pNFS NFSv4.1 client return a Flexible File layout upon error. The Flexible File layout LayoutReturn operation has argument fields where an I/O error encountered when attempting I/O on a DS can be reported back to the MDS. This patch adds code to the client to do this for the Flexible File layout mirrored case. This patch should only affect mounts using the "pnfs" option against servers that support the Flexible File layout. MFC after: 2 weeks	2018-06-17 16:30:06 +00:00
Rick Macklem	46d30d3d9c	Fix NFSv4.1 client side handling of "soft,retrans=2" mounts. Normally "soft,retrans=2" cannot be safely used on NFSv4 mounts, since the RPC can fail and leave the open/lock state in an undefined state. Doing I/O on a pNFS DS is an exception to this, since no open/lock state is maintained on the DS server. It is useful to do "soft,retrans=2" connections to a DS when it is mirrored, so that the client can detect failure of the DS. As such, mounts from the MDS to the DSs should use these mount options when mirroring is enabled. However, the NFSv4.1 client still leaves the session in an undefined state when this happens. This patch fixes the problem by setting the session defunct, so it will no longer be used. The patch also sets "retries=2" on the connections done by the client to a DS, which is the internal equivalent of "soft,retrans=2". The client does not know if the server implements mirroring at connection time, but always doing this should be safe, since it will fall back on doing I/O via the MDS as a proxy when there is a failure doing an I/O RPC to the DS. This patch should not affect non-pNFS client mounts. MFC after: 2 weeks	2018-06-16 19:45:06 +00:00
Rick Macklem	c338c94d20	Move four functions in nfscl.ko to nfscommon.ko. Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom() and nfsmnt_mdssession() are now called from within the nfsd. As such, they needed to be moved from nfscl.ko to nfscommon.ko so that nfsd.ko would load when nfscl.ko wasn't loaded. Reported by: herbert@gojira.at	2018-06-14 10:00:19 +00:00
Bruce Evans	ab35e1c71b	Fix the encoding of major and minor numbers in 64-bit dev_t by restoring the old encodings for the lower 16 and 32 bits and only using the higher 32 bits for unusually large major and minor numbers. This change breaks compatibility with the previous encoding (which was only used in -current). Fix truncation to (essentially) 16-bit dev_t in newnfs v3. Any encoding of device numbers gives an ABI, so it can't be changed without translations for compatibility. Extra bits give the much larger complication that the translations need to compress into fewer bits. Fortunately, more than 32 bits are rarely needed, so compression is rarely needed except for 16-bit linux dev_t where it was always needed but never done. The previous encoding moved the major number into the top 32 bits. Almost no translation code handled this, so the major number was blindly truncated away in most 32-bit encodings. E.g., for ffs, mknod(8) with major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent this and blindly truncated it to 2. But if this mknod was run on any released version of FreeBSD, it gives dev_t = 0x102. ffs can represent this, but in the previous encoding it was not decoded, giving major = 0, minor = 0x102. The presence of bugs was most obvious for exporting dev_t's from an old system to -current, since bugs in newnfs augment them. I fixed oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then further in -current. E.g., old ad0 with major = 234, minor = 0x10002 had the correct (major, minor) number on the wire, but newnfs truncated this to (234, 2) and then the previous encoding shifted the major number into oblivion as seen by ffs or old applications. I first tried to fix this by translating on every ABI/API boundary, but there are too many boundaries and too many sloppy translations by blind truncation. 
So use the old encoding for the low 32 bits so that sloppy translations work no worse than before provided the high 32 bits are not set. Add some error checking for when bits are lost. Keep not doing any error checking for translations for almost everything in compat/linux. compat/freebsd32/freebsd32_misc.c: Optionally check for losing bits after possibly-truncating assignments as before. compat/linux/linux_stats.c: Depend on the representation being compatible with Linux's (or just with itself for local use) and spell some of the translations as assignments in a macro that hides the details. fs/nfsclient/nfs_clcomsubs.c: Essentially the same fix as in 1996, except there is now no possible truncation in makedev() itself. Also fix nearby style bugs. kern/vfs_syscalls.c: As for freebsd32. Also update the sysctl description to include file numbers, and change it to describe device ids as device numbers. sys/types.h: Use inline functions (wrapped by macros) since the expressions are now a bit too complicated for plain macros. Describe the encoding and some of the reasons for it. 16-bit compatibility didn't leave many reasonable choices for the 32-bit encoding, and 32-bit compatibility doesn't leave many reasonable choices for the 64-bit encoding. My choice is to put the 8 new minor bits in the low 8 bits of the top 32 bits. This minimizes discontiguities. Reviewed by: kib (except for rewrite of the comment in linux_stats.c)	2018-06-13 12:22:00 +00:00
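The bit layout described in this commit message can be written out as plain functions. This sketch is modeled on the description above (old 16/32-bit layout kept in the low 32 bits, 8 new minor bits in the low 8 bits of the top 32 bits, remaining major bits above them); sk_makedev(), sk_major() and sk_minor() are hypothetical names, and sys/types.h remains the authoritative definition.

```c
#include <stdint.h>
#include <assert.h>

/*
 * 64-bit layout assumed from the description:
 *   bits  0-7 : minor bits 0-7
 *   bits  8-15: major bits 0-7
 *   bits 16-31: minor bits 16-31
 *   bits 32-39: minor bits 8-15  (the 8 new minor bits)
 *   bits 40-63: major bits 8-31
 */
static uint64_t
sk_makedev(uint32_t maj, uint32_t min)
{
	return (((uint64_t)(maj & 0xffffff00u) << 32) |
	    ((uint64_t)(maj & 0xffu) << 8) |
	    ((uint64_t)(min & 0xff00u) << 24) |
	    (uint64_t)(min & 0xffff00ffu));
}

static uint32_t
sk_major(uint64_t d)
{
	return ((uint32_t)(((d >> 32) & 0xffffff00u) | ((d >> 8) & 0xffu)));
}

static uint32_t
sk_minor(uint64_t d)
{
	return ((uint32_t)(((d >> 24) & 0xff00u) | (d & 0xffff00ffu)));
}
```

With this layout, the mknod example above (major 1, minor 2) encodes to 0x102, the value released FreeBSD versions produce, and the old ad0 numbers (major 234, minor 0x10002) encode to 0x1ea02 and decode back unchanged, so a blind truncation to 32 bits loses nothing as long as the high 32 bits are clear.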
Rick Macklem	90d2dfab19	Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write operations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two-way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty.
As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allow it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server are the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, called File and Flexible File layouts. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations.
The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Write and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes	2018-06-12 19:36:32 +00:00
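The naming scheme for DS data storage files (the MDS file handle in hexadecimal, placed in one of the "dsNN" subdirectories) can be sketched as a path builder. The message does not say how the subdirectory is chosen, so this sketch ASSUMES a simple byte sum of the file handle modulo vfs.nfsd.dsdirsize; ds_data_path() and that selection rule are hypothetical, not the server's actual algorithm.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/*
 * Write "ds<K>/<file handle in hex>" into buf and return its length,
 * or -1 if buf is too small. K = (byte sum of fh) % dsdirsize is an
 * assumption for illustration only; dsdirsize must be non-zero.
 */
static int
ds_data_path(char *buf, size_t buflen, const unsigned char *fh,
    size_t fhlen, unsigned dsdirsize)
{
	unsigned sum = 0;
	int n;

	for (size_t i = 0; i < fhlen; i++)
		sum += fh[i];
	n = snprintf(buf, buflen, "ds%u/", sum % dsdirsize);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	for (size_t i = 0; i < fhlen; i++) {
		/* Need room for two hex digits plus the terminating NUL. */
		if ((size_t)n + 2 >= buflen)
			return (-1);
		n += snprintf(buf + n, buflen - n, "%02x", fh[i]);
	}
	return (n);
}
```

Because the hex name is derived from the unique MDS file handle, names cannot collide; the subdirectory choice only bounds how many files land in any one directory.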
Rick Macklem	73b1879c2d	Add a couple of safety belt checks to the NFSv4.1 client related to sessions. There were a couple of cases in newnfs_request() where it assumed that it was an NFSv4.1 mount with a session. This should always be the case when a Sequence operation is in the reply or the server replies NFSERR_BADSESSION. However, if a server was broken and sent an erroneous reply, these safety belt checks should avoid trouble. The one check required a small tweak to nfsmnt_mdssession() so that it returns NULL when there is no session instead of the offset of the field in the structure (0x8 for i386). This patch should have no effect on normal operation of the client. Found by inspection during pNFS server development. MFC after: 2 weeks	2018-06-11 19:00:07 +00:00
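The "offset of the field instead of NULL" problem arises when code takes the address of a member through a pointer that may be NULL: on common compilers that expression evaluates to the member's offset (formally it is undefined behaviour). A minimal sketch of the safe pattern; struct mount_sketch and mount_session() are hypothetical, not the nfsmnt_mdssession() source.

```c
#include <stddef.h>
#include <assert.h>

struct session {
	int id;
};

/* Hypothetical container; the pad puts sess at a non-zero offset. */
struct mount_sketch {
	void *pad;
	struct session sess;
};

/*
 * Writing &nmp->sess with nmp == NULL would typically produce
 * offsetof(struct mount_sketch, sess), not NULL, so test first.
 */
static struct session *
mount_session(struct mount_sketch *nmp)
{
	if (nmp == NULL)
		return (NULL);
	return (&nmp->sess);
}
```

Callers can then rely on a plain NULL check, which is what the new safety belt checks need.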
Rick Macklem	8097753476	Add checks for the Flexible File layout to LayoutRecall callbacks. The Flexible File layout case wasn't handled by LayoutRecall callbacks because it just checked for File layout and returned NFSERR_NOMATCHLAYOUT otherwise. This patch adds the Flexible File layout handling. Found during testing of the pNFS server. MFC after: 2 weeks	2018-06-10 19:03:21 +00:00
Rick Macklem	be9d155ff4	Delete some macros that are unused. These macros were added because they were used by the pNFS server last year. However, they are no longer used by the pNFS server code and might as well be deleted. This is a partial reversion of r326735.	2018-06-09 23:38:22 +00:00
Rick Macklem	d506aa140d	Delete an unused macro and clean up a comment about it. NFSDEV_MIRRORSTR was defined for the pNFS server, but has not been used, so this patch deletes it. It also cleans up the comment and hopefully makes it more readable.	2018-06-09 23:14:59 +00:00
Rick Macklem	8472f76005	Revert r334586 since I now think __unused is the better way to handle this.	2018-06-04 11:35:04 +00:00
Rick Macklem	12c7a494ad	Fix a gcc8 warning about a write only variable. gcc8 warns that "verf" was set but not used. This was because the code that uses it is disabled via a "#if 0". This patch adds a "#if 0" to the variable's declaration and assignment to get rid of the warning. This way the code could be re-enabled without difficulty. Requested by: mmacy MFC after: 2 weeks	2018-06-03 19:46:44 +00:00
Rick Macklem	dec8894b45	Fix the default number of threads for Flex File layout pNFS client I/O. The intent was that the default would be based on number of CPUs, but the code disabled using taskqueue() by default. This code is only executed when mounting a NFSv4.1 server that supports the Flexible File layout for pNFS and, since such servers are rare, this change shouldn't result in a POLA violation. (The FreeBSD pNFS server is still a project and the only other one that uses Flexible File layout is being developed by Primary Data and I don't know if they have even shipped any to customers yet.) Found while testing the pNFS server.	2018-06-02 00:11:26 +00:00
Rick Macklem	9442a64e53	Add the BindConnectiontoSession operation to the NFSv4.1 server. Under some fairly unusual circumstances, the Linux NFSv4.1 client is doing a BindConnectiontoSession operation for TCP connections. It is also used by the ESXi6.5 NFSv4.1 client. This patch adds this operation to the NFSv4.1 server. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com MFC after: 2 weeks	2018-06-01 19:47:41 +00:00
Rick Macklem	440e2f9e91	Strengthen locking for the NFSv4.1 server DestroySession operation. If a client did a DestroySession on a session while it was still in use, the server might try to use the session structure after it is free'd. I think the client has violated RFC5661 if it does this, but this patch makes DestroySession block all other nfsd threads so no thread could be using the session when it is free'd. After the DestroySession, nfsd threads will not be able to find the session. The patch also adds a check for nd_sessionid being set, although if that was not the case it would have been all 0s and unlikely to have a false match. This might fix the crashes described in PR#228497 for the FreeNAS server. PR: 228497 MFC after: 1 week	2018-05-30 20:16:17 +00:00
Rick Macklem	260785fe60	Fix the sleep event for layout recall. The sleep for I/O completion during an NFSv4.1 pNFS layout recall used the wrong event value and could leave the "[nfscl]" thread hung for the mount. This patch fixes the event to be the correct one. This bug will only affect NFSv4.1 pnfs mounts and only when the server does a layout recall callback, so it won't affect many. Without the patch, a mount without the "pnfs" option will avoid the problem. Found during testing of the pNFS server. MFC after: 1 week	2018-05-26 23:02:15 +00:00
Matt Macy	b7faa59dee	nfsclient: warnings cleanups	2018-05-20 06:14:12 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Rick Macklem	04b1905584	Add a missing nfsrv_freesession() call for an unlikely failure case. Since NFSv4.1 clients normally create a single session which supports both fore and back channels, it is unlikely that a callback will fail due to a lack of a back channel. However, if this failure occurred, the session wasn't being dereferenced and would never be free'd. Found by inspection during pNFS server development. Tested by: andreas.nagy@frequentis.com MFC after: 2 months	2018-05-17 21:17:20 +00:00
Kirk McKusick	4111ab7088	Revert change made in base r171522 (https://svnweb.freebsd.org/base?view=revision&revision=304232) converting clrbuf() (which clears the entire buffer) to vfs_bio_clrbuf() (which clears only the new pages that have been added to the buffer). Failure to properly remove pages from the buffer cache can make pages that appear not to need clearing to actually have bad random data in them. See for example base r304232 (https://svnweb.freebsd.org/base?view=revision&revision=304232) which noted the need to set B_INVAL and B_NOCACHE as well as clear the B_CACHE flag before calling brelse() to release the buffer. Rather than trying to find all the incomplete brelse() calls, it is simpler, though slightly more expensive, to simply clear the entire buffer when it is newly allocated. PR: 213507 Submitted by: Damjan Jovanovic Reviewed by: kib	2018-05-16 23:30:03 +00:00
Rick Macklem	0ebe2634be	End grace for the NFSv4 server if all mounts do ReclaimComplete. The NFSv4 protocol requires that the server only allow reclaim of state and not issue any new open/lock state for a grace period after booting. The NFSv4.0 protocol required this grace period to be greater than the lease duration (over 2 minutes). For NFSv4.1, the client tells the server that it has done reclaiming state by doing a ReclaimComplete operation. If all NFSv4 clients are NFSv4.1, the grace period can end once all the clients have done ReclaimComplete, shortening the time period considerably. This patch does this. If there are any NFSv4.0 mounts, the grace period will still be over 2 minutes. This change is only an optimization and does not affect correct operation. Tested by: andreas.nagy@frequentis.com MFC after: 2 months	2018-05-15 20:28:50 +00:00
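The early-end condition reduces to "every client is NFSv4.1 and every one has sent ReclaimComplete". A minimal sketch with hypothetical names (struct client_sketch, grace_can_end_early()); note that with zero clients this sketch returns true, which may not match what the real server does in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

struct client_sketch {
	int minorvers;          /* 0 for NFSv4.0, 1 for NFSv4.1 */
	bool reclaim_complete;  /* ReclaimComplete seen (NFSv4.1 only) */
};

/*
 * Grace may end early only if no NFSv4.0 client exists and every
 * NFSv4.1 client has done ReclaimComplete; otherwise the full grace
 * period (longer than the lease duration) must run.
 */
static bool
grace_can_end_early(const struct client_sketch *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (c[i].minorvers == 0 || !c[i].reclaim_complete)
			return (false);
	}
	return (true);
}
```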
Rick Macklem	8932a4835f	Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID. In the reply to an ExchangeID operation, the NFSv4.1 server returns a "scope" value (eir_server_scope). If this value is the same, it indicates that two servers share state, which is never the case for FreeBSD servers. As such, the value needs to be unique and it was without this patch. However, I just found out that it is not supposed to change when the server reboots and without this patch, it did change. This patch fixes eir_server_scope so that it does not change when the server is rebooted. The only effect not having this patch has is that Linux clients don't reclaim opens and locks after a server reboot, which meant they lost any byte range locks held before the server rebooted. It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not affected by this bug. MFC after: 1 week	2018-05-13 23:38:01 +00:00
Fedor Uporov	6d4a4ed747	Fix directory blocks checksumming. Reviewed by: pfg MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D15396	2018-05-13 19:48:30 +00:00
Fedor Uporov	c4aa9a026d	Fix on-disk inode checksum calculation logic. Reviewed by: pfg MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D15395	2018-05-13 19:29:35 +00:00
Fedor Uporov	e06e5241a0	Fix EXT2FS_DEBUG definition usage. Reviewed by: pfg MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D15394	2018-05-13 19:19:10 +00:00
Rick Macklem	0f13d146a0	Fix a slow leak of session structures in the NFSv4.1 server. For a fairly rare case of a client doing an ExchangeID after a hard reboot, the old confirmed clientid still exists, but some clients use a new co_verifier. For this case, the server was not freeing up the sessions on the old confirmed clientid. This patch fixes this case. It also adds two LIST_INIT() macros, which are actually no-ops, since the structure is malloc()d with M_ZERO so the pointer is already set to NULL. It should have minimal impact, since the only way I could exercise this code path was by doing a hard power cycle (pulling the plug) on a machine running Linux with a NFSv4.1 mount on the server. Originally spotted during testing of the ESXi 6.5 client. Tested by: andreas.nagy@frequentis.com MFC after: 2 months	2018-05-13 12:42:53 +00:00
Rick Macklem	bb3436966a	The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK. When an NFSv4.1 session is busy due to a callback being in progress, nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK. The only effect this has is that the DestroySession operation will report the failure for this case and this probably has little or no effect on a client. Spotted by inspection and no failures related to this have been reported. MFC after: 2 months	2018-05-13 12:29:09 +00:00
Rick Macklem	5d4835e4b7	Add support for the TestStateID operation to the NFSv4.1 server. The Linux client now uses the TestStateID operation, so this patch adds support for it to the NFSv4.1 server. The FreeBSD client never uses this operation, so it should not be affected. MFC after: 2 months	2018-05-11 22:16:23 +00:00
Matt Macy	cbd92ce62e	Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month	2018-05-09 18:47:24 +00:00
Pedro F. Giffuni	b732ceb6ca	msdosfs: use vfs_timestamp() to generate timestamps instead of getnanotime(). Most filesystems, with the notable exceptions of msdosfs and autofs, use only vfs_timestamp() to read the current time. This has the benefit of configurable granularity (using the vfs.timestamp_precision sysctl). For convenience, use it on msdosfs too. Submitted by: Damjan Jovanovic Differential Revision: https://reviews.freebsd.org/D15297	2018-05-06 21:29:29 +00:00
Jamie Gritton	0e5c6bd436	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681	2018-05-04 20:54:27 +00:00
Pedro F. Giffuni	c85866888d	msdosfs: long names of files are created incorrectly. This fixes a regression that happened in r120492 (2003) where libkiconv was introduced and we went from checking unlen to checking for '\0'. PR: 111843 Patch by: Damjan Jovanovic MFC after: 1 week	2018-05-04 03:44:12 +00:00
Rick Macklem	7427a9f138	Revert r333183, since I am not sure that just initializing the list is the correct thing to do and that is already done without this commit.	2018-05-02 21:29:42 +00:00
Rick Macklem	858bb2fc1a	Add two missing LIST_INIT()s. This patch adds two missing LIST_INIT()s. Found by inspection. In practice, these are currently no-ops, since the structure they are in is malloc'd with M_ZERO and all LIST_INIT does is set the pointer in the list head to NULL. (In other words, the M_ZERO has already correctly initialized it.) MFC after: 2 months	2018-05-02 20:36:11 +00:00
Eitan Adler	e07db02261	[procfs] Split procfs_attr into multiple functions Reviewed by: des, kib Discussed with: mmacy Differential Revision: https://reviews.freebsd.org/D15150	2018-04-24 14:49:09 +00:00
Rick Macklem	cb0d9834d4	Fix use of pointer after being set NULL. Using a pointer after setting it NULL is probably not a good plan. Spotted by inspection during changes for Flexible File Layout Ioerr handling. This code path obviously isn't normally executed. MFC after: 1 week	2018-04-20 11:38:29 +00:00
Rick Macklem	6269d66373	Fix OpenDowngrade for NFSv4.1 if a client sets the OPEN_SHARE_ACCESS_WANT* bits. The NFSv4.1 RFC specifies that the OPEN_SHARE_ACCESS_WANT bits can be set in the OpenDowngrade share_access argument and are basically ignored. I do not know of an extant NFSv4.1 client that does this, but this little patch fixes it just in case. It also changes the error from NFSERR_BADXDR to NFSERR_INVAL since the NFSv4.1 RFC specifies this as the error to be returned if bogus bits are set. (The NFSv4.0 RFC didn't specify any error for this, so the error reply can be changed for NFSv4.0 as well.) Found by inspection while looking at a problem with OpenDowngrade reported for the ESXi 6.5 NFSv4.1 client. Reported by: andreas.nagy@frequentis.com PR: 227214 MFC after: 1 week	2018-04-19 20:30:33 +00:00
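Ignoring the WANT bits and rejecting anything else can be sketched as "mask, then validate". The constant values below are assumed from the OPEN4_SHARE_ACCESS_* definitions in RFC 5661 and NFS4ERR_INVAL = 22; verify them against the RFC before reuse. The SA_* and SK_* macro names and check_downgrade_access() are hypothetical, not the server's identifiers.

```c
#include <stdint.h>
#include <assert.h>

/* Assumed RFC 5661 OPEN4_SHARE_ACCESS_* values (names local to this sketch). */
#define SA_READ         0x00000001u
#define SA_WRITE        0x00000002u
#define SA_BOTH         0x00000003u
#define SA_WANT_MASK    0x0000ff00u  /* OPEN4_SHARE_ACCESS_WANT_DELEG_MASK */
#define SA_WANT_SIGNAL  0x00010000u
#define SA_WANT_PUSH    0x00020000u

#define SK_NFS_OK        0
#define SK_NFSERR_INVAL  22

/* Drop the ignored WANT bits; anything left must be READ, WRITE or BOTH. */
static int
check_downgrade_access(uint32_t share_access, uint32_t *accessp)
{
	uint32_t acc = share_access &
	    ~(SA_WANT_MASK | SA_WANT_SIGNAL | SA_WANT_PUSH);

	if (acc != SA_READ && acc != SA_WRITE && acc != SA_BOTH)
		return (SK_NFSERR_INVAL);
	*accessp = acc;
	return (SK_NFS_OK);
}
```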
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Benno Rice	7acb51f681	Add isoboot(8) for booting BIOS systems from HDDs containing ISO images. This is part of a project for adding the ability to create hybrid CD/USB boot images. In the BIOS case when booting from something that isn't a CD we need some extra boot code to actually find our next stage (loader) within an ISO9660 filesystem. This code will reside in a GPT partition (similar to gptboot(8) from which it is derived) and looks for /boot/loader in an ISO9660 filesystem on the image. Reviewed by: imp Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D14914	2018-04-05 19:40:46 +00:00
Ed Maste	d8ba45e213	Revert r313780 (UFS_ prefix)	2018-03-17 12:59:55 +00:00
Ed Maste	1e2b9afca9	Prefix UFS symbols with UFS_ to reduce namespace pollution Followup to r313780. Also prefix ext2's and nandfs's versions with EXT2_ and NANDFS_. Reported by: kib Reviewed by: kib, mckusick Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9623	2018-03-17 01:48:27 +00:00
Hajimu UMEMOTO	9f5fab694c	Fix Bad file descriptor error. MFC after: 1 week	2018-03-09 04:45:24 +00:00
Eitan Adler	40301da899	sys/fuse: fix off by one error Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reported by: Domagoj Stolfa <domagoj.stolfa@gmail.com>	2018-03-03 20:42:39 +00:00
Pedro F. Giffuni	7cbd6d338e	{ext2|ufs}_readdir: Avoid setting negative ncookies. ncookies cannot be negative or the allocator will fail. This should only happen if a caller is very broken but we can still try to survive the event. We should probably also verify for uio_resid > MAXPHYS but in that case it is not clear that just clipping the ncookies value is an adequate response. MFC after: 2 weeks	2018-02-06 22:38:19 +00:00
Jeff Roberson	e2068d0bcd	Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000	2018-02-06 22:10:07 +00:00
Pedro F. Giffuni	fdc154e44a	ext2fs: remove EXT4F_RO_INCOMPAT_SUPP This was a hack to be able to mount ext4 filesystems read-only while not supporting all the features. We now support all those features so it doesn't make sense to keep the undocumented hack. Discussed with: fsu	2018-02-05 15:14:01 +00:00
Pedro F. Giffuni	f86f5cd406	ext2fs: Cleanup variable assignments for extents. Delay the initialization of variables until they are needed. In the case of ext4_ext_rm_leaf(), make sure 'error' value is not undefined. Reported by: Clang's static analyzer Differential Revision: https://reviews.freebsd.org/D14193	2018-02-05 14:30:27 +00:00
Fedor Uporov	7c4fa61e6f	Fix mistake in case of zeroed inode check. Reported by: pho MFC after: 6 months	2018-01-29 22:15:46 +00:00
Fedor Uporov	c0f16c65cd	Add flex_bg/meta_bg features RW support. Reviewed by: pfg MFC after: 6 months Differential Revision: https://reviews.freebsd.org/D13964	2018-01-29 21:54:13 +00:00
Pedro F. Giffuni	040fb18b60	Revert r328479: {ext2|ufs}_readdir: Set limit on valid ncookies values. We aren't allowed to set resid like this. Pointed out by: kib, imp	2018-01-27 16:34:00 +00:00
Pedro F. Giffuni	ee233ab975	{ext2|ufs}_readdir: Set limit on valid ncookies values. Sanitize the values that will be assigned to ncookies so that we ensure they are sane and we can handle them. Leave ncookies signed as it was before r328346. The valid range is such that unsigned values are not required and we are not able to avoid at least one cast anyways. Hinted by: bde	2018-01-27 15:33:52 +00:00
Conrad Meyer	b97b91b547	nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros They were just thin wrappers over malloc(9) w/ M_ZERO and free(9). Discussed with: rmacklem, markj Sponsored by: Dell EMC Isilon	2018-01-25 22:38:39 +00:00
Conrad Meyer	222daa421f	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035	2018-01-25 22:25:13 +00:00


4077 Commits