freebsd-dev

Author	SHA1	Message	Date
John Baldwin	800c3e80de	Don't defer wakeup()s for completed journal workitems. Normally wakeup()s are performed for completed softupdates work items in workitem_free() before the underlying memory is free()'d. complete_jseg() was clearing the "wakeup needed" flag in work items to defer the wakeup until the end of each loop iteration. However, this resulted in the item being free'd before its address was used with wakeup(). As a result, another part of the kernel could allocate this memory from malloc() and use it as a wait channel for a different "event" with a different lock. This triggered an assertion failure when the lock passed to sleepq_add() did not match the existing lock associated with the sleep queue. Fix this by removing the code to defer the wakeup in complete_jseg() allowing the wakeup to occur slightly earlier in workitem_free() before free() is called. The main reason I can think of for deferring a wakeup() would be to avoid waking up a waiter while holding a lock that the waiter would need. However, no locks are dropped in between the wakeup() in workitem_free() and the end of the loop in complete_jseg() as far as I can tell. In general I think it is not safe to do a wakeup() after free() as one cannot control how other parts of the kernel that might reuse the address for a different wait channel will handle spurious wakeups. Reported by: pho Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12494	2017-09-26 23:24:15 +00:00
Konstantin Belousov	55e5a5c1f4	Fix 32bit build. Reported by: emaste Sponsored by: The FreeBSD Foundation	2017-09-22 16:42:41 +00:00
Kirk McKusick	75e3597abb	Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static function and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. 
sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho)	2017-09-22 12:45:15 +00:00
John Baldwin	4f45713ae2	Add UFS_LINK_MAX for the UFS-specific limit on link counts. ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still limited to 16 bits. This is a nop currently but will matter if LINK_MAX is increased in the future. Reviewed by: kib Sponsored by: Chelsio Communications	2017-09-18 23:30:39 +00:00
Kirk McKusick	855662c611	The new fsck recovery information to enable it to find backup superblocks created in revision 322297 only works on disks with sector sizes up to 4K. This update allows the recovery information to be created by newfs and used by fsck on disks with sector sizes up to 64K. Note that FFS currently limits filesystem to be mounted from disks with up to 8K sectors. Expanding this limitation will be the subject of another commit. Reported by: Peter Holm Reviewed with: kib	2017-09-04 20:19:36 +00:00
Konstantin Belousov	2f9d88c7ae	Protect v_rdev dereference with the vnode interlock instead of the vnode lock. Caller of softdep_count_dependencies() may own a buffer lock, which might conflict with the lock order. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 10 days	2017-08-25 09:51:22 +00:00
Konstantin Belousov	f0d5223230	Avoid dereferencing potentially freed workitem in softdep_count_dependencies(). Buffer's b_dep list is protected by the SU mount lock. Owning the buffer lock is not enough to guarantee the stability of the list. Calculation of the UFS mount owning the workitems from the buffer must be much more careful to not dereference the work item which might be freed meantime. To get to ump, use the pointers chain which does not involve workitems at all. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-08-21 16:23:44 +00:00
Konstantin Belousov	b5f2560d09	Style. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-08-21 16:16:02 +00:00
Kirk McKusick	77b63aa0fc	Since the switch to GPT disk labels, fsck for UFS/FFS has been unable to automatically find alternate superblocks. This checkin places the information needed to find alternate superblocks at the end of the area reserved for the boot block. Filesystems created with a newfs of this vintage or later will create the recovery information. If you have a filesystem created prior to this change and wish to have a recovery block created for your filesystem, you can do so by running fsck in foreground mode (i.e., do not use the -p or -y options). As it starts, fsck will ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should answer yes. Discussed with: kib, imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11589	2017-08-09 05:17:21 +00:00
Kirk McKusick	33bbdde01f	Avoid reading a snapshot block when it is already in the cache. Update the use of the B_CACHE flag (since the May 1999 commit that made it the correct test here). Reported by: Andreas Longwitz <longwitz@incore.de> Reviewed by: kib Tested by: Peter Holm MFC after: 1 week	2017-07-31 20:41:45 +00:00
Konstantin Belousov	5cf14660ae	Improve publication of the newly allocated snapdata. For freshly allocated snapdata, lock sn_lock in advance, so si_snapdata readers see the locked snapdata and do not race. For existing snapdata, if the thread was put to sleep waiting for sn_lock, re-read si_snapdata. This either closes the race or makes the reliance on LK_DRAIN less important. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:42:35 +00:00
Konstantin Belousov	c536471408	Unlock correct lock in ffs_snapblkfree(). It is possible for ffs_snapblkfree() to race and lock snaplock while the devvp snapdata is instantiated, but no snapshots exist. In this case the loop over snapshots in ffs_snapblkfree() is not executed, and the local variable vp is left initialized to NULL. Unlock using &sn->sn_lock and not vp->v_vnlock. For the inodes on the snapshot list, the locks are same. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:36:17 +00:00
Konstantin Belousov	f2e6bf5c05	Account for lock recursion when transferring snaplock to the vnode lock in ffs_snapremove(). Apparently ffs_snapremove() may be called with the snap lock recursed, at least one trace demonstrated this when snapshot vnode was unlinked while synced. It was inactivated from the syncer thread. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:28:27 +00:00
Konstantin Belousov	35bf780921	Remove write-only variable. Tested by: pho Sponsored by: The FreeBSD Foundation	2017-07-16 07:12:04 +00:00
Konstantin Belousov	51a6a15f8c	A followup to r320453, correct removal of the blocks from UFS snapshots. Tested by: pho PR: 220693 Sponsored by: The FreeBSD Foundation	2017-07-16 07:11:29 +00:00
John Baldwin	15a88f8158	Consistently use vop_stdpathconf() for default pathconf values. Update filesystems not currently using vop_stdpathconf() in pathconf VOPs to use vop_stdpathconf() for any configuration variables that do not have filesystem-specific values. vop_stdpathconf() is used for variables that have system-wide settings as well as providing default values for some values based on system limits. Filesystems can still explicitly override individual settings. PR: 219851 Reported by: cem Reviewed by: cem, kib, ngie MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D11541	2017-07-11 21:55:20 +00:00
Kirk McKusick	9c4f551e98	Create a new function ffs_getcg() to read in and verify a cylinder group. Change all code points that open-coded this functionality to use the new function. This commit is a refactoring with no change in functionality. In the future this change allows more robust checking of cylinder group reads along the lines discussed in the hardening UFS session at BSDCan (retry I/O, add checksums, etc). For more detail see the session notes at https://wiki.freebsd.org/DevSummit/201706/HardeningUFS Reviewed by: kib	2017-06-28 17:32:09 +00:00
Konstantin Belousov	698f05ab95	Mitigate several problems with the softdep_request_cleanup() on a busy host. Problems start appearing when there are several threads all doing operations on a UFS volume and the SU workqueue needs a cleanup. It is possible that each thread calling softdep_request_cleanup() owns the lock for some dirty vnode (e.g. all of them are executing mkdir(2), mknod(2), creat(2) etc) and all vnodes which must be flushed are locked by the corresponding thread. Then, we get all the threads simultaneously entering softdep_request_cleanup(). There are two problems: - Several threads execute MNT_VNODE_FOREACH_ALL() loops in parallel. Due to the locking, they quickly start executing 'in phase' with the speed of the slowest thread. - Since each thread already owns the lock for a dirty vnode, other threads' non-blocking attempts to lock the vnode owned by another thread fail, and the loops execute without making progress. Retry logic does not allow the situation to recover. The result is a livelock. Fix these problems by making the following changes: - Allow only one thread to enter MNT_VNODE_FOREACH_ALL() loop per mp. A new flag FLUSH_RC_ACTIVE guards the loop. - If there were failed locking attempts during the loop, abort retry even if there are still work items on the mp work list. An assumption is that the items will be cleaned when another thread either fsyncs its vnode, or unlocks it and allows yet another thread to make progress. It is possible now that some calls would get undeserved ENOSPC from ffs_alloc(), because the cleanup is not aggressive enough. But I do not see how we can reliably clean up workitems if calling softdep_request_cleanup() while still owning the vnode lock. I thought about a scheme where ffs_alloc() returns ERESTART and saves the retry counter somewhere in struct thread, to return to the top level, unlock the vnode and retry. But IMO the very rare (and unproven) spurious ENOSPC is not worth the complications. 
Reported and tested by: pho Style and comments by: mckusick Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-06-03 16:18:50 +00:00
Konstantin Belousov	4cbc378c61	Clean possible td_su reference on the struct mount being unmounted as the last step of ffs_unmount(). It is possible that the mount point is recorded for cleanup in AST context while softdep flush is executed during unmount. The workitems are flushed by other means for the unmount, but the stray reference to struct mount blocks destruction of mount. Check for the situation and manually call vfs_rel() before returning from ffs_unmount(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-03 14:15:14 +00:00
Konstantin Belousov	215b29f62c	Remove spl() calls from UFS code. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-05-07 14:59:45 +00:00
Ed Maste	3cf259c390	UFS fs.h: clear warning from use in makefs(1) makefs(1) has a number of signedness warnings (when built with higher WARNS), most of which can be addressed by careful application of casts in makefs itself. There is one case where a signedness warning arises from the blksize macro, so must be addressed in the macro itself. Reviewed by: kib, mckusick MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10589	2017-05-05 15:26:55 +00:00
Gleb Smirnoff	9ed01c32e0	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.	2017-04-17 17:07:00 +00:00
Conrad Meyer	a96da1c3fb	ufs: Export UFS_MAXNAMLEN to pathconf, statfs Rather than the global NAME_MAX constant. This change is required to support systems with a NAME_MAX/MAXNAMLEN that differs from UFS_MAXNAMLEN. This was missed in r313475 due to the alternative spelling ("NAME_MAX") of MAXNAMLEN. This change is also similar in spirit to r313780. Reported by: ngie@ Sponsored by: Dell EMC Isilon	2017-04-05 01:44:03 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Konstantin Belousov	aca4bb9112	Do not leak mount references for dying threads. Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-25 10:38:18 +00:00
Ed Maste	1dc349ab95	prefix UFS symbols with UFS_ to reduce namespace pollution Specifically: ROOTINO -> UFS_ROOTINO WINO -> UFS_WINO NXADDR -> UFS_NXADDR NDADDR -> UFS_NDADDR NIADDR -> UFS_NIADDR MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency) Also prefix ext2's and nandfs's NDADDR and NIADDR with EXT2_ and NANDFS_ Reviewed by: kib, mckusick Obtained from: NetBSD MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9536	2017-02-15 19:50:26 +00:00
Conrad Meyer	0ecf59f68f	ufs: Use UFS_MAXNAMLEN constant (like NFS, EXT2FS, SVR4, IBCS2) instead of redefining the MAXNAMLEN constant. No functional change. Reviewed by: kib@, markj@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9500	2017-02-09 17:47:01 +00:00
Conrad Meyer	675c187cc4	ffs_vnops: Simplify extattr access As suggested in r167010, use the structure type and macros to access and modify UFS2 extended attributes. Add assertions that pointers are aligned in places where we now access the data through a structure pointer, instead of character-by-character. PR: 216127 Reported by: dewayne at heuristicsystems.com.au Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9225	2017-01-19 16:46:05 +00:00
Conrad Meyer	c9bf814804	restore(8): Handle extended attribute names correctly UFS2 extended attribute names are not NUL-terminated. Handle appropriately. Correct the EXTATTR_BASE_LENGTH() macro, which handled ea_namelength == one (mod eight) extended attributes incorrectly. PR: 216127 Reported by: dewayne at heuristicsystems.com.au Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9208	2017-01-18 18:16:57 +00:00
Conrad Meyer	6e02fac9d9	ufs/extattr.h: Fix documentation of ea_name termination The ea_name string is not nul-terminated. Correct the documentation. Because the subsequent field is padded to 8 bytes, and the padding is zeroed, the ea_name string will appear to be nul-terminated whenever the length isn't exactly one (mod eight). This was introduced in r167010 (2007). Additionally, mark the length fields as unsigned. This particularly matters for the single byte ea_namelength field, which can represent extended attribute names up to 255 bytes long. No functional change. PR: 216127 Reported by: dewayne at heuristicsystems.com.au Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9206	2017-01-18 17:55:49 +00:00
Konstantin Belousov	1c32456953	Use type-independent formats for printing nlink_t and ino_t. Extracted from: ino64 work by gleb, mckusick Discussed with: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 16:59:33 +00:00
Mark Johnston	99e6e1930c	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
Konstantin Belousov	714b7df502	Provide simple mutual exclusion between mount point update and unmount. Currently mount update keeps vfs_busy(9) reference on the mount point during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the exclusion, but is problematic for filesystems which need to perform namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh mnt_from path, because namei(9) must not be called while the vfs_busy(9) reference is owned. Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing syscalls with EBUSY if conflict is detected. Keep vfs_busy(9) reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS KPI. In the update path in ffs_mount(), drop vfs_busy() reference around namei(), which is now safe due to unmount never executing in parallel with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-11-13 21:49:51 +00:00
Ed Maste	15c377c3cc	ANSIfy ffs_subr.c Also renumber license clause to avoid skipping #3	2016-10-31 20:43:43 +00:00
Kirk McKusick	ad544726aa	Avoid possible overflow when calculating malloc size for auxiliary data structure sizes when mounting and reloading UFS/FFS filesystems by using a u_long rather than an int for the size. Reported by: Mariusz Zaborski <oshogbo@> MFC after: 1 week	2016-10-28 20:15:19 +00:00
Konstantin Belousov	c39baa7480	Generalize UFS buffer pager to allow it serving other filesystems which also use buffer cache. Most important addition to the code is the handling of filesystems where the block size is less than the machine page size, which might require reading several buffers to validate single page. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:43:59 +00:00
Kirk McKusick	9e7e59125c	The UFS/FFS filesystem checks directory link counts when doing directory create and delete operations. If it ever finds a directory with a link count less than 2, it panics. Thus, an rm -rf that encounters a directory with a link count below 2 causes a kernel panic. The proposed fix is to return the error EINVAL rather than panicking. The effect is that the requested operation is not done, but the system continues to run. At a more convenient later time, the filesystem can be unmounted and cleaned (with fsck or journal run). Once cleaned, the operation can be rerun to successful completion. This fix takes that approach. The panic message has been converted into a uprintf(9) to provide the user with the inode number and filesystem mount point of the offending directory and EINVAL is returned for the operation. The long (three year) delay in fixing this problem occurred because the bug was misclassified when originally assigned and only this week was found during a sweep of old unresolved bug reports. PR: 180894 Reviewed by: kib MFC after: 2 weeks	2016-10-26 20:28:23 +00:00
Marcel Moolenaar	d67ec19c3e	Include <sys/types.h> explicitly instead of depending on that header being included by <sys/param.h>. When compiled as part of makefs(8) and on macOS or Linux, <sys/param.h> is not our own.	2016-10-24 18:12:57 +00:00
Konstantin Belousov	895219834e	Add FFS pager, which uses buffer cache read operation to validate pages. See the comments for more detailed description of the algorithm. The pager is used unconditionally when the block size of the underlying device is larger than the machine page size, since local vnode pager cannot handle the configuration [1]. Otherwise, the vfs.ffs.use_buf_pager sysctl allows to switch to the local pager. Measurements demonstrated no regression in the ever-important buildworld benchmark, and small (~5%) throughput improvements in the special microbenchmark configuration for dbench over swap-backed md(4). Code can be generalized and reused for other filesystems which use buffer cache. Reported by: Anton Yuzhaninov <citrin@citrin.ru> [1] Tested by: pho Benchmarked by: mjg, pho Reviewed by: alc, markj, mckusick (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8198	2016-10-19 11:09:29 +00:00
Mateusz Guzik	8660b707ff	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib	2016-09-30 17:11:03 +00:00
Konstantin Belousov	bf9c87c813	Be more strict when selecting between snapshot/regular mount. Reclaimed vnode type is VBAD, so successful comparison like devvp->v_type != VREG does not imply that the devvp references snapshot, it might be due to a reclaimed vnode. Explicitly check the vnode type. In the most important case of ffs_blkfree(), the devfs vnode is locked and its type is stable. In other cases, if the vnode is reclaimed right after the check, hopefully the buffer methods return right error codes. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-19 15:58:33 +00:00
Konstantin Belousov	6bd8ddcf0c	Fix libprocstat build after r305902. - Use _Bool to not require userspace to include stdbool.h. - Make extattr.h usable without vnode_if.h. - Follow i_ump to get cdev pointer. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-17 18:14:31 +00:00
Konstantin Belousov	e1db68971e	Reduce size of ufs inode. Remove redundant i_dev and i_fs pointers, which are available as ip->i_ump->um_dev and ip->i_ump->um_fs, and reorder members by size to reduce padding. To compensate for the added dereferences, the most frequent i_ump access, to differentiate between UFS1 and UFS2 dinode layout, is removed by addition of the new i_flag IN_UFS2. Overall, this actually reduces the amount of memory dereferences. On 64bit machine, original struct inode size is 176, reduced to 152 bytes with the change. Tested by: pho (previous version) Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-17 16:47:34 +00:00
Bruce Evans	0c01bcb9ff	Sprinkle DOINGASYNC() checks so as to do delayed writes for async mounts in almost all cases instead of in most cases. Don't override DOINGASYNC() by any condition except IO_SYNC. Fix previous sprinkling of DOINGASYNC() checks. Don't override IO_SYNC by DOINGASYNC(). In ffs_write() and ffs_extwrite(), there were intentional overrides that just broke O_SYNC of data. In ffs_truncate(), there are 5 calls to ffs_update(), 4 with apparently-unintentional overrides and 1 without; this had no effect due to the main async mount hack described below. Fix 1 place in ffs_truncate() where the caller's IO_ASYNC was overridden for the soft updates case too (to do a delayed write instead of a sync write). This is supposed to be the only change that affects anything except async mounts. In ffs_update(), remove the 19 year old efficiency hack of ignoring the waitfor flag for async mounts, so that fsync() almost works for async mounts. All callers are supposed to be fixed to not ask for a sync update unless they are for fsync() or [I]O_SYNC operations. fsync() now almost works for async mounts. It used to sync the data but not the most important metadata (the inode). It still doesn't sync associated directories. This gave 10-20% fewer writes for my makeworld benchmark with async mounted tmp and obj directories from an already small number. Style fixes: - in ffs_balloc.c, remove rotted quadruplicated comments about the simplest part of the DOING() decisions and rearrange the nearly- quadruplicated code to be more nearly so. - in ufs_vnops.c, use a consistent style with less negative logic and no manual "optimization" of || to | in DOING() expressions. Reviewed by: kib (previous version)	2016-09-08 17:40:40 +00:00
Konstantin Belousov	63876b3ba2	On rename, do not perform truncation of dirhash if the vnode truncation failed. Doing so resulted in inconsistent state of the ufs dirhash with regard to the actual directory inode state, and could lead to spurious ENOENT errors for lookups of existing files in production kernels, or assertion failures in the debugging kernels. Change the logic of calling ufsdirhash_dirtrunc() to be same as in ufs_direnter(). Execute UFS_TRUNCATE() first, log error, and only do dirtrunc() if UFS_TRUNCATE() succeeded. Note that the problem was exacerbated by the bug in the flush_newblk_dep() function (see r305599), which resulted in spurious errors from ffs_sync() and then ffs_truncate(). In collaboration with: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:09:34 +00:00
Konstantin Belousov	7b05b8a29c	Do not leak transient ENOLCK error from flush_newblk_dep() loop. The buffer lock is retried on failed LK_SLEEPFAIL attempt, and error from the failed attempt is irrelevant. But since there is path after retry which does not clear error, it is possible to return spurious error from the function. The issue resulted in a spurious failure of softdep_sync_buf(), causing further spurious failure of ffs_sync(). In collaboration with: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:08:54 +00:00
Konstantin Belousov	76db05eb14	When logging unlikely UFS_TRUNCATE() failure in ufs_direnter(), include error code. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:08:08 +00:00
Konstantin Belousov	ea16af59a1	When extending directory inode in ufs_direnter(), adjust i_endoff. This change is formally not needed, since i_endoff is not used in all code paths after the call to ufs_direnter(), and i_endoff is recalculated by the next lookup. But having the value correct makes the reasoning about code simpler. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:07:25 +00:00
Konstantin Belousov	e599d951e3	In dqsync(), when called from quotactl(), um_quotas entry might appear cleared since nothing prevents completion of the parallel quotaoff. There is nothing to sync in this case, and no reason to panic. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:06:43 +00:00
Konstantin Belousov	60f1c000f3	In softdep_prealloc(), return early not only for snapshots, but for the quota files as well. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:05:13 +00:00

2053 Commits