freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	06f2918ab8	ufs_direnter: directory truncation does not need special case for rename In ufs_rename case, tdvp is locked from the place where ufs_direnter() is done till VOP_VPUT_PAIR(), which means that we no longer need to specially handle rename in ufs_direnter(). Truncation, if possible, is done in the same way in ffs_vput_pair() both for rename and other VOPs calling ufs_direnter(). Remove isrename argument and set IN_ENDOFF if ufs_direnter() succeeded and directory needs truncation. In ffs_vput_pair(), stop verifying the condition that directory needs truncation when IN_ENDOFF is set, instead assert that the condition is true. Suggested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	038fe6e089	ufs_rename: use VOP_VPUT_PAIR and rely on directory sync/truncation there Suggested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	74a3652f83	ufs_direnter: move directory truncation to ffs_vput_pair(). VOP_VPUT_PAIR() provides the hook to do the truncation right before unlock, which is required since truncation might need to fsync(), which itself might unlock the directory vnode. Set new flag IN_ENDOFF which indicates that i_endoff is valid and should be checked against inode size. Excessive size is chomped, but this operation is advisory and failure to truncate should not result in the failure of the main VOP. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	30bfb2fa0f	ffs_vput_pair(): try harder to recover from the vnode reclaim In particular, if unlock_vp is false, save vp's inode number and generation. If ffs_inotovp() can re-create the vnode with the same number and generation after we finished with handling dvp, then we most likely raced with unmount, and were able to restore atomicity of open. We use FFSV_REPLACE_DOOMED there, to drop the old vnode. This additional recovery is not strictly required, but it improves the quality of the implementation. Suggested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	f2c9d038bd	FFS: implement special VOP_VPUT_PAIR(). It cleans IN_NEEDSYNC flag on dvp before returning, by applying ffs_syncvnode() until success or an error different from ERELOOKUP. IN_NEEDSYNC cleanup is required to avoid creating holes in the directories when extended into indirect block. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	be44e98637	ffs_snapshot: use VOP_VPUT_PAIR after VOP_CREATE. If the snapshot embrio was reclaimed under us, return error outright. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	08c2dc2841	ufs_direnter/SU: unconditionally UFS_UPDATE inode when extending directory for all kinds of async/SU mount variants. Submitted by: mckusick Reviewed by: chs Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	1de1e2bfbf	ffs_syncvnode: only clear IN_NEEDSYNC after successfull sync If it is cleaned before the sync, other threads might see the inode without the flag set, because syncing could unlock it. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	89fd61d955	Merge ufs_fhtovp() into ffs_inotovp(). The function alone was not used for anything but ffs_fstovp() for long time. Suggested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	5952c86c78	ffs_inotovp(): interface to convert (ino, gen) into alive vnode It generalizes the VFS_FHTOVP() interface, making it possible to fetch the inode without faking filehandle. Also it adds the ffs flags argument which allows to control ffs_vgetf() call. Requested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	f16c26b1c0	ffs: Add FFSV_REPLACE_DOOMED flag to ffs_vgetf() It specifies that caller requests a fresh non-doomed vnode. If doomed vnode is found in the hash, it should behave similarly to FFSV_REPLACE. Or, to put it differently, the flag is same as FFSV_REPLACE, but only when the found hashed vnode is doomed. Reviewed by: chs, mkcusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	e94f2f1be3	ffs: call ufsdirhash_dirtrunc() right after setting directory size Later processing of ffs_truncate() might temporary unlock the directory vnode, causing unsychronized dirhash and inode sizes if update is postponed to UFS_TRUNCATE() callers. Reviewed by: chs, mkcusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:19 +02:00
Konstantin Belousov	bf0db19339	buf SU hooks: track buf_start() calls with B_IOSTARTED flag and only call buf_complete() if previously started. Some error paths, like CoW failire, might skip buf_start() and do bufdone(), which itself call buf_complete(). Various SU handle_written_XXX() functions check that io was started and incomplete parts of the buffer data reverted before restoring them. This is a useful invariant that B_IO_STARTED on buffer layer allows to keep instead of changing check and panic into check and return. Reported by: pho Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundations	2021-02-12 03:02:19 +02:00
Konstantin Belousov	0281f88e5d	ffs_vnops.c: Move opt_*.h includes to the top. as it is done in other places. Header files might need options defined for correct operation. Reviewed by: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-02-12 03:02:19 +02:00
Kirk McKusick	a63eae65ff	Revert `2d4422e799`, Eliminate lock order reversal in UFS ffs_unmount(). After discussion with Chuck Silvers (chs@) we have decided that there is a better way to resolve this lock order reversal which will be committed separately. Sponsored by: Netflix	2021-01-30 00:03:37 -08:00
Mateusz Guzik	c892d60a1d	ufs: denote lack of support for lockless symlink lookup It is unclear without investigating if it can be provided without using extra memory, so for the time being just don't.	2021-01-23 15:04:43 +00:00
Kirk McKusick	79a5c790bd	Eliminate a locking panic when cleaning up UFS snapshots after a disk failure. Each vnode has an embedded lock that controls access to its contents. However vnodes describing a UFS snapshot all share a single snapshot lock to coordinate their access and update. As part of mounting a UFS filesystem with snapshots, each of the vnodes describing a snapshot has its individual lock replaced with the snapshot lock. When the filesystem is unmounted the vnode's original lock is returned replacing the snapshot lock. When a disk fails while the UFS filesystem it contains is still mounted (for example when a thumb drive is removed) UFS forcibly unmounts the filesystem. The loss of the drive causes the GEOM subsystem to orphan the provider, but the consumer remains until the filesystem has finished with the unmount. Information describing the snapshot locks was being prematurely cleared during the orphaning causing the return of the snapshot vnode's original locks to fail. The fix is to not clear the needed information prematurely. Sponsored by: Netflix	2021-01-15 16:36:42 -08:00
Kirk McKusick	173779b98f	Eliminate lock order reversal in UFS when unmounting filesystems with snapshots. Each vnode has an embedded lock that controls access to its contents. However vnodes describing a UFS snapshot all share a single snapshot lock to coordinate their access and update. As part of mounting a UFS filesystem with snapshots, each of the vnodes describing a snapshot has its individual lock replaced with the snapshot lock. When the filesystem is unmounted the vnode's original lock is returned replacing the snapshot lock. The lock order reversal happens because vnode locks must be acquired before snapshot locks. When unmounting we must lock both the snapshot lock and the vnode lock before swapping them so that the vnode will be continuously locked during the swap. For each vnode representing a snapshot, we must first acquire the snapshot lock to ensure exclusive access to it and its original lock. We then face a lock order reversal when we try to acquire the original vnode lock. The problem is eliminated by doing a non-blocking exclusive lock on the original lock which will always succeed since there are no users of that lock. Sponsored by: Netflix	2021-01-15 16:03:01 -08:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Kirk McKusick	2d4422e799	Eliminate lock order reversal in UFS ffs_unmount(). UFS uses a new "mntfs" pseudo file system which provides private device vnodes for a file system to safely access its disk device. The original device vnode is saved in um_odevvp to hold the exclusive lock on the device so that any attempts to open it for writing will fail. But it is otherwise unused and has its BO_NOBUFS flag set to enforce that file systems using mntfs vnodes do not accidentally use the original devfs vnode. When the file system is unmounted, um_odevvp is no longer needed and is released. The lock order reversal happens because device vnodes must be locked before UFS vnodes. During unmount, the root directory vnode lock is held. When when calling vrele() on um_odevvp, vrele() attempts to exclusive lock um_odevvp causing the lock order reversal. The problem is eliminated by doing a non-blocking exclusive lock on um_odevvp which will always succeed since there are no users of um_odevvp. With um_odevvp locked, it can be released using vput which does not attempt to do a blocking exclusive lock request and thus avoids the lock order reversal. Sponsored by: Netflix	2021-01-11 16:49:07 -08:00
Thomas Munro	e7347be9e3	ffs: Support O_DSYNC. Respect the new IO_DATASYNC flag when performing synchronous writes. Compared to O_SYNC, O_DSYNC lets us skip updating the inode in some cases, matching the behaviour of fdatasync(2). Reviewed by: kib Differential Review: https://reviews.freebsd.org/D25160	2021-01-08 13:15:56 +13:00
Mateusz Guzik	3e506a67bb	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793	2021-01-03 06:50:06 +00:00
Mateusz Guzik	9997aedb8f	ufs: use VNPASS when asserting on a vnode in ufs_read_pgcache	2021-01-01 03:14:11 +00:00
Mark Johnston	ace3d9475c	ffs: Avoid out-of-bounds accesses in the fs_active bitmap We use a bitmap to track which cylinder groups have changed between snapshot creation and filesystem suspension. The "legs" of the bitmap are four bytes wide (see ACTIVESET()) so we must round up the allocation size to a multiple of four bytes. I believe this bug is harmless since UMA/kmem_* will both pad the allocation and zero the full allocation. Note that malloc() does inline zeroing when the allocation size is known at compile-time. Reported by: pho (using KASAN) Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27731	2020-12-23 11:16:40 -05:00
Ryan Libby	93dba42c0e	ffs: quiet -Wstrict-prototypes Reviewed by: kib, markj, mckusick Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27558	2020-12-11 22:51:57 +00:00
Kirk McKusick	bb3c01ec79	Document the BA_CLRBUF flag used in ufs and ext2fs filesystems. Suggested by: kib MFC after: 3 days Sponsored by: Netflix	2020-12-06 20:50:21 +00:00
Konstantin Belousov	2c7ada9917	ufs: handle two more cases of possible VNON vnode returned from VFS_VGET(). Reported by: kevans Reviewed by: mckusick, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27457	2020-12-06 18:09:14 +00:00
Konstantin Belousov	21a45add50	ffs: do not read full direct blocks if they are going to be overwritten. BA_CLRBUF specifies that existing context of the block will be completely overwritten by caller, so there is no reason to spend io fetching existing data. We do the same for indirect blocks. Reported by: tmunro Reviewed by: mckusick, tmunro Tested by: pho, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27353	2020-11-30 17:03:26 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Konstantin Belousov	92bcefd1d2	clear_inodedeps: handle ERELOOKUP from ffs_syncvnode(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-11-26 18:03:24 +00:00
Konstantin Belousov	07ef907f6e	ffs_softdep.c: get_parent_vp(): Fix bp lock leak when inum inode was already freed. Reported by: markj, pho Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-11-25 17:12:21 +00:00
Konstantin Belousov	8a1509e442	Handle LoR in flush_pagedep_deps(). When operating in SU or SU+J mode, ffs_syncvnode() might need to instantiate other vnode by inode number while owning syncing vnode lock. Typically this other vnode is the parent of our vnode, but due to renames occuring right before fsync (or during fsync when we drop the syncing vnode lock, see below) it might be no longer parent. More, the called function flush_pagedep_deps() needs to lock other vnode while owning the lock for vnode which owns the buffer, for which the dependencies are flushed. This creates another instance of the same LoR as was fixed in softdep_sync(). Put the generic code for safe relocking into new SU helper get_parent_vp() and use it in flush_pagedep_deps(). The case for safe relocking of two vnodes with undefined lock order was extracted into vn helper vn_lock_pair(). Due to call sequence ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(), ffs_syncvnode() indicates with ERELOOKUP that passed vnode was unlocked in process, and can return ENOENT if the passed vnode reclaimed. All callers of the function were inspected. Because UFS namei lookups store auxiliary information about directory entry in in-memory directory inode, and this information is then used by UFS code that creates/removed directory entry in the actual mutating VOPs, it is critical that directory vnode lock is not dropped between lookup and VOP. For softdep_prelink(), which ensures that later link/unlink operation can proceed without overflowing the journal, calls were moved to the place where it is safe to drop processing VOP because mutations are not yet applied. Then, ERELOOKUP causes restart of the whole VFS operation (typically VFS syscall) at top level, including the re-lookup of the involved pathes. [Note that we already do the same restart for failing calls to vn_start_write(), so formally this patch does not introduce new behavior.] Similarly, unsafe calls to fsync in snapshot creation code were plugged. A possible view on these failures is that it does not make sense to continue creating snapshot if the snapshot vnode was reclaimed due to forced unmount. It is possible that relock/ERELOOKUP situation occurs in ffs_truncate() called from ufs_inactive(). In this case, dropping the vnode lock is not safe. Detect the situation with VI_DOINGINACT and reschedule inactivation by setting VI_OWEINACT. ufs_inactive() rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed this way. In ffs_truncate(), allocation of the EOF block for partial truncation is re-done after vnode is synced, since we cannot leave the buffer locked through ffs_syncvnode(). In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:30:10 +00:00
Konstantin Belousov	738ea0010b	Add ffs_inode_bwrite() helper. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:19:59 +00:00
Konstantin Belousov	7b795aa3c0	Revert r367669 to re-commit with proper message	2020-11-14 05:19:44 +00:00
Konstantin Belousov	c0d2077f41	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:17:04 +00:00
Konstantin Belousov	61846fc4dc	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:10:39 +00:00
Mark Johnston	f44994874b	ffs: Clamp BIO_SPEEDUP length On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by softdep_request_cleanup() may be negative when assigned to bp->b_bcount, which has type "long". Clamp the size to LONG_MAX. Also convert the unused g_io_speedup() to use an off_t for the magnitude of the shortage for consistency with softdep_send_speedup(). Reviewed by: chs, kib Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27081	2020-11-11 13:48:07 +00:00
Conrad Meyer	e6790841f7	UFS2: Fix DoS due to corrupted extattrfile Prior versions of FreeBSD (11.x) may have produced a corrupt extattr file. (Specifically, r312416 accidentally fixed this defect by removing a strcpy.) CURRENT FreeBSD supports disk images from those prior versions of FreeBSD. Validate the internal structure as soon as we read it in from disk, to prevent these extattr files from causing invariants violations and DoS. Attempting to access the extattr portion of these files results in EINTEGRITY. At this time, the only way to repair files damaged in this way is to copy the contents to another file and move it over the original. PR: 244089 Reported by: Andrea Venturoli <ml AT netfence.it> Reviewed by: kib Discussed with: mckusick (earlier draft) Security: no Differential Revision: https://reviews.freebsd.org/D27010	2020-10-30 19:00:42 +00:00
Mateusz Guzik	4bfebc8d2c	cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename	2020-10-30 10:46:35 +00:00
Edward Tomasz Napierala	bce7ee9d41	Drop "All rights reserved" from all my stuff. This includes Foundation copyrights, approved by emaste@. It does not include files which carry other people's copyrights; if you're one of those people, feel free to make similar change. Reviewed by: emaste, imp, gbe (manpages) Differential Revision: https://reviews.freebsd.org/D26980	2020-10-28 13:46:11 +00:00
Kirk McKusick	996d40f91d	Various new check-hash checks have been added to the UFS filesystem over various major releases. Superblock check hashes were added for the 12 release and cylinder-group and inode check hashes will appear in the 13 release. When a disk with a UFS filesystem is writably mounted, the kernel clears the feature flags for anything that it does not support. For example, if a UFS disk from a 12-stable kernel is mounted on an 11-stable system, the 11-stable kernel will clear the flag in the filesystem superblock that indicates that superblock check-hashs are being maintained. Thus if the disk is later moved back to a 12-stable system, the 12-stable system will know to ignore its incorrect check-hash. If the only filesystem modification done on the earlier kernel is to run a utility such as growfs(8) that modifies the superblock but neither updates the check-hash nor clears the feature flag indicating that it does not support the check-hash, the disk will fail to mount if it is moved back to its original newer kernel. This patch moves the code that clears the filesystem feature flags from the mount code (ffs_mountfs()) to the code that reads the superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount code and is imported into libufs(3), all the filesystem utilities will now also clear these flags when they make modifications to the filesystem. As suggested by John Baldwin, fsck_ffs(8) has been changed to accept and repair bad superblock check-hashes rather than refusing to run. This change allows fsck to recover filesystems that have been impacted by utilities older than those created after this change and is a sensible thing to do in any event. Reported by: John Baldwin (jhb@) MFC after: 2 weeks Sponsored by: Netflix	2020-10-25 00:43:48 +00:00
Mateusz Guzik	25fb30bd9a	vfs: drop spurious cache_purge on rmdir The removed directory gets cache_purged which is sufficient to remove any entries related to the parent. Note only tmpfs, ufs and zfs are patched.	2020-10-23 15:50:49 +00:00
Brooks Davis	44ca4575ea	vmapbuf: don't smuggle address or length in buf Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situtation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784	2020-10-21 16:00:15 +00:00
Mateusz Guzik	e9fb2bd9b8	ufs: catch up with removal of thread argument from VOP_INACTIVE	2020-10-20 09:46:20 +00:00
Konstantin Belousov	e1ef4c29a3	Do not leak B_BARRIER. Normally when a buffer with B_BARRIER is written, the flag is cleared by g_vfs_strategy() when creating bio. But in some cases FFS buffer might not reach g_vfs_strategy(), for instance when copy-on-write reports an error like ENOSPC. In this case buffer is returned to dirty queue and might be written later by other means. Among then bdwrite() reasonably asserts that B_BARRIER is not set. In fact, the only current use of B_BARRIER is for lazy inode block initialization, where write of the new inode block is fenced against cylinder group write to mark inode as used. The situation could be seen that we break dependency by updating cg without written out inode. Practically since CoW was not able to find space for a copy of inode block, for the same reason cg group block write should fail. Reported by: pho Discussed with: chs, imp, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26511	2020-10-08 22:41:02 +00:00
Chuck Silvers	8b88330ed6	ufs: restore uniqueness of st_dev as returned by ufs_stat() switch ufs_stat() to use the same value for st_dev as was used by the previous ufs_getattr() stat path. Submitted by: gallatin Reviewed by: mjg, imp, kib, mckusick Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26596	2020-10-05 18:17:50 +00:00
Konstantin Belousov	3c484f325e	Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:06:36 +00:00
Konstantin Belousov	96474d2a3f	Do not copy vp into f_data for DTYPE_VNODE files. The pointer to vnode is already stored into f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide finit_vnode() helper to initialize file of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 21:55:21 +00:00
Mateusz Guzik	d90f2c3617	ufs: clean up empty lines in .c and .h files	2020-09-01 21:23:00 +00:00
Mateusz Guzik	39f8815070	cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No fuctional changes.	2020-08-20 10:05:46 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Konstantin Belousov	779ad2acf1	VMIO reads: enable for UFS Move v_object creation earlier, so that VIRF_PGREAD is never set if v_object is NULL. There is no much harm from instantiating v_object when later check for append-only flags disallows open. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 21:07:19 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mateusz Guzik	03337743db	vfs: clean MNTK_FPLOOKUP if MNT_UNION is set Elides checking it during lookup.	2020-08-10 11:51:21 +00:00
Mateusz Guzik	76dc5d3224	ufs: add VOP_STAT handler	2020-08-07 23:08:17 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Mateusz Guzik	e5e10c82ec	ufs: only pass LK_ADAPTIVE if LK_NODDLKTREAT is set This restores the pre-adaptive spinning state for SU which livelocks otherwise. Note this is a bug in SU. Reported by: pho	2020-08-04 23:09:15 +00:00
Mateusz Guzik	9d5a594f0b	ufs: add support for lockless lookup ACLs are not supported, meaning their presence will force the use of the old lookup. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25579	2020-07-25 10:38:05 +00:00
Mateusz Guzik	31ad4050fe	lockmgr: add adaptive spinning It is very conservative. Only spinning when LK_ADAPTIVE is passed, only on exclusive lock and never when any waiters are present. buffer cache is remains not spinning. This reduces total sleep times during buildworld etc., but it does not shorten total real time (culprits are contention in the vm subsystem along with slock + upgrade which is not covered). For microbenchmarks: open3_processes -t 52 (open/close of the same file for writing) ops/s: before: 258845 after: 801638 Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D25753	2020-07-22 12:30:31 +00:00
Kirk McKusick	93440bbefd	The binary representation of the superblock (the fs structure) is written out verbatim to the disk: see ffs_sbput() in sys/ufs/ffs/ffs_subr.c. It contains a pointer to the fs_summary_info structure. This pointer value inadvertently causes garbage to be stored. It is garbage because the pointer to the fs_summary_info structure is the address the then current stack or heap. Although a mere pointer does not reveal anything useful (like a part of a private key) to an attacker, garbage output deteriorates reproducibility. This commit zeros out the pointer to the fs_summary_info structure before writing the out the superblock. Reviewed by: kib Tested by: Peter Holm PR: 246983 Sponsored by: Netflix	2020-06-19 01:04:25 +00:00
Kirk McKusick	34816cb9ae	Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix	2020-06-19 01:02:53 +00:00
Chuck Silvers	d9a8abf6c2	Move all of the functions in ffs_subr.c that are only used by the ufs kernel module from that file into ffs_vfsops.c. This fixes the build for kernel configs that don't include FFS. PR: 247256 Submitted by: glebius Reviewed by: mckusick (earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25285	2020-06-17 23:39:52 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it hold a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size. This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Kirk McKusick	513274c79c	Clear the IN_SIZEMOD and IN_IBLKDATA flags only when doing a synchronous inode update. The IN_SIZEMOD and IN_IBLKDATA flags indicate changes to the file size and block pointer fields in the inode. When these fields have been changed, the fsync() and fsyncdata() system calls must write the inode to ensure their semantics that the file is on stable store. The IN_SIZEMOD and IN_IBLKDATA flags cannot be cleared until a synchronous write of the inode is done. If they are cleared on an asynchronous write, then the inode may not yet have been written to the disk when an fsync() or fsyncdata() call is done. Absent these flags, these calls would not know that they needed to write the inode. Thus, these flags only can be cleared on synchronous writes of the inode. Since the inode will be locked for the duration of the I/O that writes it to disk, no fsync() or fsyncdata() will be able to run before the on-disk inode is complete. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-06 20:17:56 +00:00
Kirk McKusick	52488b5148	Further evaluation of the POSIX spec for fdatasync() shows that it requires that new data on growing files be accessible. Thus, the the fsyncdata() system call must update the on-disk inode when the size of the file has changed. This commit adds another inode update flag, IN_SIZEMOD, that gets set any time that the file size changes. If either the IN_IBLKDATA or the IN_SIZEMOD flag is set when fdatasync() is called, the associated inode is synchronously written to disk. We could have overloaded the IN_IBLKDATA flag to also track size changes since the only (current) use case for these flags are for fsyncdata(), but it does seem useful for possible future uses to separately track the file size changes and the inode block pointer changes. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-05 01:00:55 +00:00
Stefan Eßer	23e84cf153	Fix obvious typo: IN_BLKDATA should be IN_IBLKDATA	2020-06-04 19:54:25 +00:00
Kirk McKusick	30296c428a	Two additional places that need to identify IN_IBLKDATA. Reviewed by: kib MFC with: -r361785 Differential Revision: https://reviews.freebsd.org/D25072	2020-06-04 18:35:21 +00:00
Konstantin Belousov	7428630b75	UFS: write inode block for fdatasync(2) if pointers in inode where allocated The fdatasync() description in POSIX specifies that all I/O operations shall be completed as defined for synchronized I/O data integrity completion. and then the explanation of Synchronized I/O Data Integrity Completion says The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred. For UFS this means that all pointers must be on disk. Indirect pointers already contribute to the list of dirty data blocks, so only direct blocks and root pointers to indirect blocks, both of which reside in the inode block, should be taken care of. In ffs_balloc(), mark the inode with the new flag IN_IBLKDATA that specifies that ffs_syncvnode(DATA_ONLY) needs a call to ffs_update() to flush the inode block. Reviewed by: mckusick Discussed with: tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D25072	2020-06-04 12:23:15 +00:00
Chuck Silvers	d79ff54b5c	This commit enables a UFS filesystem to do a forcible unmount when the underlying media fails or becomes inaccessible. For example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just like is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing of some dependencies require a read. For example if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Coauthered by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088	2020-05-25 23:47:31 +00:00
John Baldwin	71d11ee322	Update name of description of vfs.ffs.setsize in comment. Previously it used the name 'adjsize' instead of 'setsize'.	2020-05-22 17:23:43 +00:00
John Baldwin	f2620e9ceb	Retire two unused background fsck sysctls. These two sysctls were added to support UFS softupdates journalling with snapshots. However, the changes to fsck to use them were never committed and there have never been any in-tree uses of these sysctls. More details from Kirk: When journalling got added to soft updates, its journal rollback freed blocks that it thought were no longer in use. But it does not take snapshots into account (i.e., if a snapshot is still using it, then it cannot be freed). So I added the needed logic to fsck by having the free go through the kernel's blkfree code so it could grab blocks that were still needed by snapshots. That is done using the setbufoutput hack. I never got that code working reliably, so it is still sitting in my work directory. Which also explains why you still cannot take snapshots on filesystems running with journalling... In looking over my use of this feature, and in particular the troubles I was having with it, I conclude that it may be better to extract the code from the kernel that handles freeing blocks claimed by snapshots and putting it into fsck directly. My original intent was that it is complex and at the time changing, so only having to maintain it in one place was appealing. But at this point it has not changed in years and the hacks like setinode and setbufoutput to be able to use the kernel code is sufficiently ugly, that I am leaning towards just extracting it. Reviewed by: mckusick MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24484	2020-04-21 17:42:32 +00:00
Konstantin Belousov	71f2642988	ufs: apply suspension for non-forced rw unmounts. Forced rw unmounts and remounts from rw to ro already suspend filesystem, which closes races with writers instantiating new vnodes while unmount flushes the queue. Original intent of not including non-forced unmounts into this regime was to allow such unmounts to fail if writer was active, but this did not worked well. Similar change, but causing all unmount, even involving only ro filesystem, were proposed in D24088, but I believe that suspending ro is undesirable, and definitely spends CPU time. Reported by: markj Discussed with: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-04-10 01:24:16 +00:00
Kirk McKusick	621a274820	Fixing the soft update macros in -r359612 triggered a previously hidden bug in the file truncation code. Until that bug is tracked down and fixed, revert to the old behavior. Reported by: Peter Holm Reviewed by: kib, Chuck Silvers	2020-04-09 23:51:18 +00:00
Kirk McKusick	c79f5a4328	Revert -r359612 as it can cause other panics. An updated version will be made when the issue has been resolved. Reported by: Peter Holm	2020-04-06 20:23:47 +00:00
Kirk McKusick	2baca88584	When shrinking the size of a directory it is sometimes necessary to sync it to disk before shrinking it. Complete the sync before getting the buffer for the block to be updated to do the shrink to avoid panicing with a recursive lock on one of the directory's buffers. Reviewed by: Chuck Silvers (chs) MFC after: 3 days Sponsored by: Netflix	2020-04-03 20:43:25 +00:00
Kirk McKusick	aedb9cc662	Convert DOINGSOFTDEP, MOUNTEDSOFTDEP, DOINGSUJ, and MOUNTEDSUJ to being boolean expressions so that their values are not lost when assigned to `bool' or `int' variables. Reviewed by: Chuck Silvers (chs) MFC after: 3 days Sponsored by: Netflix	2020-04-03 20:30:45 +00:00
Konstantin Belousov	abfdf76791	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
Kirk McKusick	95ca762da8	When mounting a UFS filesystem, return EINTEGRITY rather than EIO when a superblock check-hash error is detected. This change clarifies a mount that failed due to media hardware failures (EIO) from a mount that failed due to media errors (EINTEGRITY) that can be corrected by running fsck(8). Sponsored by: Netflix	2020-03-11 21:00:40 +00:00
Chuck Silvers	69b3fdfa0b	Use the devfs vnode rather than the mntfs vnode for permissions checks. I missed this one in r358714. Reported by: pho Reviewed by: mckusick Approved by: imp (mentor) Sponsored by: Netflix	2020-03-09 15:55:13 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Mateusz Guzik	8d03b99b9d	fd: move vnodes out of filedesc into a dedicated structure The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884	2020-03-01 21:53:46 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Kirk McKusick	98b6844690	Additional KASSERTs to ensure the consistency of the soft updates indirdep structure. No functional change. Tested by: Peter Holm (as part of a larger patch) Sponsored by: Netflix	2020-02-18 23:56:23 +00:00
Scott Long	1353215314	Add rudamentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complimentary support to the CAM periphs that support it. This is a redo of r357710.	2020-02-16 23:10:59 +00:00
Mateusz Guzik	4d51e175f9	ufs: use faster lockgmr entry points in ffs_lock	2020-02-15 21:48:48 +00:00
Scott Long	85eb41f751	Revert r357710 and 357711 until they can be debugged	2020-02-10 14:27:28 +00:00
Scott Long	9ce150463c	Missed a file in r357710, add it here.	2020-02-10 00:26:41 +00:00
Scott Long	7d99bda79e	Add rudamentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complimentary support to the CAM periphs that support it.	2020-02-10 00:23:20 +00:00
Chuck Silvers	62612737d6	With INVARIANTS, track all softdep dependency structures centrally so that we can find them in dumps. Approved by: mckusick (mentor) Sponsored by: Netflix	2020-02-03 17:47:14 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Mateusz Guzik	643656cfaf	vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422	2020-02-01 06:46:55 +00:00
Mateusz Guzik	0a09292188	ufs: drop ufs_markatime from ufs_fifoops The routine is only called on mmap and exec, both of which are invalid for this type. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23421	2020-02-01 06:41:44 +00:00
Mateusz Guzik	901b05fbd2	ufs: add the missing vn_need_pageq_flush call to ufs_need_inactive	2020-01-30 05:37:35 +00:00
Mateusz Guzik	6c44a3e019	ufs: add vgone calls for unconstructed vnodes in the error path This mostly eliminates the requirement that vput never unlocks the vnode before calling VOP_INACTIVE. Note it may still be present for other filesystems. See r356126 for an example bug. Note vput stopped doing early unlock in r357070 thus this change does not affect correctness as it is. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23215	2020-01-26 00:38:06 +00:00
Mateusz Guzik	d93762b94d	vfs: stop handling VI_OWEINACT in vget vget is almost always called with LK_SHARED, meaning the flag (if present) is almost guaranteed to get cleared. Stop handling it in the first place and instead let the thread which wanted to do inactive handle the bumepd usecount. Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D23184	2020-01-24 07:45:59 +00:00
Warner Losh	38b37b93d4	We only want to send the speedup to the lower layers when there's a shortage. Only send a speedup when there's a shortage. While this is a little racy, lost races aren't a big deal for this function. If there's a shorage just popping up after we check these values, then we'll catch it next time. If there's a shortage that's just clearing up, we may do some work at the lower layers a little sooner than we otherwise would have. Sicne shortages are relatively rare events, both races are acceptable. Reviewed by: chs Differential Revision: https://reviews.freebsd.org/D23182	2020-01-17 01:16:23 +00:00
Warner Losh	3cf5dd8401	Use buf to send speedup It turns out there's a problem with using g_io to send the speedup. It leads to a race when there's a resource shortage when a disk fails. Instead, send BIO_SPEEDUP via struct buf. This is pretty straight forward, except we need to transfer the bio_flags from b_ioflags for BIO_SPEEDUP commands in g_vfs_strategy. Reviewed by: kirk, chs Differential Revision: https://reviews.freebsd.org/D23117	2020-01-17 01:16:19 +00:00
Kirk McKusick	0297c1384a	When sync'ing a mount point, the mount point's vnodes were scanned twice. Once to update the changed inodes, and a second time to update changed quota information. This change merges these two scans into a single scan which does both inode and quota updates. MFC after: 7 days	2020-01-14 22:27:46 +00:00
Jeff Roberson	815b74866a	Fix a long standing bug in journaled soft-updates. The dirrem structure needs to handle file removal, directory removal, file move, directory move, etc. The code in handle_workitem_remove() needs to propagate any completed journal entries to the write that will render the change stable. In the case of a moved directory this means the new parent. However, for an overwrite that frees a directory (DIRCHG) we must move the jsegdep to the removed inode to be released when it is stable in the cg bitmap or the unlinked inode list. This case was previously unhandled and caused a panic. Reported by: mckusick, pho Reviewed by: mckusick Tested by: pho	2020-01-14 02:00:24 +00:00
Mateusz Guzik	f1fcaffd8e	ufs: relax an overzealous assert added in r356671 Part of i_flag can persist across a drop to hold count of 0, at which point the vnode is taken off the lazy list. Then whoever locks and unlocks the vnode can trip on the assert. This trips over kyua running a test untarring character devices to ufs. Reported by: lwhsu	2020-01-13 14:33:51 +00:00
Mateusz Guzik	cc3593fbd9	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997	2020-01-13 02:37:25 +00:00
Mateusz Guzik	80663cadb8	ufs: use lazy list instead of active list for syncer Quota code is temporarily regressed to do a full vnode scan. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22996	2020-01-13 02:35:15 +00:00
Mateusz Guzik	ac4ec14188	ufs: add a setter for inode i_flag field This will be used later to add vnodes to the lazy list. Reviewed by: kib (previous version), jeff Tested by: pho (in a larger patch) Differential Revision: https://reviews.freebsd.org/D22994	2020-01-13 02:31:51 +00:00
Kirk McKusick	27a6257130	When a read error occurs while fetching a directory block to delete or rename an entry in it, properly reset the link count of the inode associated with the entry that was to have been changed. Tested by: Peter Holm MFC after: 7 days	2020-01-11 03:18:47 +00:00
Mateusz Guzik	8dbc63520c	vfs: drop thread argument from vinactive	2020-01-05 00:59:47 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Konstantin Belousov	4085590d39	ufs: do not leave non-reclaimed vnodes with zero i_mode around. After a recent change, vput() relocks even the exclusively locked vnode before inactivating it. Before that, UFS could safely instantiate a vnode for cleared inode, then the last vput() after ffs_vgetf() noted that ip->i_mode == 0 and recycled. Now, it is possible for other threads to note the half-constructed vnode, e.g. to insert it into hash, which makes other threads to use it despite mode is zero, before inactivation and reclaim. Handle the found cases in SU code, by explicitly doing reclaim. Assert that other places get fully constructed inode from ffs_vgetf(), which cannot be cleared before dependencies are resolved. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-12-27 16:43:34 +00:00
Warner Losh	56e4d45895	Drop a sleepable lock when we plan on sleeping g_io_speedup waits for the completion of the speedup request before proceeding using biowait(), but check_clear_deps is called with the softdeps lock held (which is non-sleepable). It's safe to drop this lock around the call to speedup, so do that. Submitted by: Peter Holm Reviewed by: kib@	2019-12-18 16:01:15 +00:00
Warner Losh	22dd705f1f	Add BIO_SPEEDUP signalling to UFS When we have a resource shortage in UFS, send down a BIO_SPEEDUP to give the CAM I/O scheduler a heads up that we have a resource shortage and that it should bias its decisions knowing that. Reviewed by: kirk, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D18351	2019-12-17 00:13:40 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tesetd by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Konstantin Belousov	332f313c45	UFS: implement VOP_INACTIVE() The checks literally repeat conditions that make ufs_inactive() to take some actions. Reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D22616	2019-12-10 14:07:05 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Kirk McKusick	d00066a5f9	Currently the breadn_flags() and getblkx() interfaces are passed the vnode, logical block number, and size of data block that is being requested. They then use the VOP_BMAP function to calculate the mapping from logical block number to physical block number from which to access the data. This change expands the interface to also pass the physical block number in cases where the VOP_MAP function may no longer work, for example when a file is being truncated. No functional change. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2019-12-03 23:07:09 +00:00
Chuck Silvers	2ac044e6bc	As part of creating a snapshot, set fs->fs_fmod to 0 in the snapshot image because nothing ever changes this field for read-only mounts and we want to verify that it is still 0 when we unmount. Reviewed by: mckusick Approved by: mckusick (mentor) Sponsored by: Netflix	2019-11-28 00:37:43 +00:00
Chuck Silvers	2bcfb938f4	In ffs_freefile(), use a separate variable to hold the inode number within the cg rather than reusuing "ino" for this purpose. This reduces the diff for an upcoming change that improves handling of I/O errors. No functional change. Reviewed by: mckusick Approved by: mckusick (mentor) Sponsored by: Netflix	2019-11-25 19:31:38 +00:00
Kirk McKusick	486b9a61f7	Add some KASSERTs. Reacquire a mutex after a kernel printf rather than holding it during the printf. White space cleanup. Sponsored by: Netflix	2019-11-20 01:10:01 +00:00
Chuck Silvers	b0cf923749	In ufs_dir_dd_ino(), always initialize *dd_vp since the caller expects it. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix	2019-11-12 00:32:33 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Kirk McKusick	1a75045196	After the unlink() of one name of a file with multiple links, a stat() of one of the remaining names of the file does not show an updated ctime (inode modification time) until several seconds after the unlink() completes. The problem only occurs when the filesystem is running with soft updates enabled. When running with soft updates, the ctime is not updated until the soft updates background process has settled all the needed I/O operations. This commit causes the ctime to be updated immediately during the unlink(). A side effect of this change is that the ctime is updated again when soft updates has finished its processing because that is the time that is correct from the perspective of programs that look at the disk (like dump). This change does not cause any extra I/O to be done, it just ensures that stat() updates the ctime before handing it back. PR: 241373 Reported by: Alan Somers Tested by: Alan Somers MFC after: 3 days Sponsored by: Netflix	2019-10-24 21:28:37 +00:00
Kirk McKusick	7792f70137	Soft updates needs to keep an on-disk linked list of inodes that have been unlinked, but are still referenced by open file descriptors. These inodes cannot be freed until the final file descriptor reference has been closed. If the system crashes while they are still being referenced, these inodes and their referenced blocks need to be freed by fsck. By having them on a linked list with the head pointer in the superblock, fsck can quickly find and process them rather than having to check every inode in the filesystem to see if it is unreferenced. When updating the head pointer of this list of unlinked inodes in the superblock, the superblock check-hash was not getting updated. If the system crashed with the incorrect superblock check-hash, the superblock would appear to be corrupted. This patch ensures that the superblock check-hash is updated when updating the head pointer of the unlinked inodes list. There is no need to MFC as superblock check hashes first appeared in 13.0. Tested by: Peter Holm Sponsored by: Netflix	2019-10-24 19:47:18 +00:00
Mark Johnston	c456a0a1a6	Abbreviate softdep lock names. The softdep lock names were unusually long and tended to stick out in lock profiling reports. Abbreviate them and make them consistent with our conventional style for lock names. Reviewed by: mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22042	2019-10-18 17:01:27 +00:00
Mateusz Guzik	e35cd9e38f	ufs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:18:03 +00:00
Eric van Gyzen	fdd888dee3	Add CTLFLAG_STATS to several debug.softdep sysctl OIDs Refer to r353111. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2019-10-04 21:44:52 +00:00
Kirk McKusick	44d37182ce	Update ffs_getcg() function to accept a flags parameter to be passed to breadn_flags() in preparation for later need when doing forcible unmount when disk dies or is removed. No functional change. Sponsored by: Netflix	2019-10-04 05:28:36 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Mateusz Guzik	e87f3f72f1	vfs: manage mnt_writeopcount with atomics See r352424. Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21575	2019-09-16 21:33:16 +00:00
Konstantin Belousov	d89ac450a7	Remove some unneeded vfs_busy() calls in SU code. When softdep_fsync() is running, a caller must already started write for the mount point. Since unmount or remount to ro suspends mount point, it cannot run in parallel with softdep_fsync(), which makes vfs_busy() call there not needed. Doing blocking vfs_busy() there effectively causes lock order reversal between vn_start_write() and setting MNTK_UNMOUNT, because vfs_busy(mp, 0) sleeps waiting for MNTK_UNMOUNT becoming clear, while unmount sets the flag and starts the suspension. Note that all other uses of vfs_busy() in SU code are non-blocking. Reported by: chs by mckusick Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-09 11:22:38 +00:00
Konstantin Belousov	f923be6b9a	Properly check for writers when fetching quotas for writeable vnodes in UFS quotaon(). Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560	2019-09-07 15:57:23 +00:00
Conrad Meyer	f3cf622523	ufs: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:08:33 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Konstantin Belousov	1604022248	UFS: stop reusing the vnode for reallocated inode. In ffs_valloc(), force reclaim existing vnode on inode reuse, instead of trying to re-initialize the same vnode for new purposes. This is done in preparation of changes to the vp->v_object lifecycle handling. A new FFSV_REPLACE flag to ffs_vgetf() directs the function to vgone(9) the vnode if found in vfs hash, instead of returning it. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:45:23 +00:00
Konstantin Belousov	e671edac06	De-commision the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Kirk McKusick	5a0d467f5f	Clarify comment that describes how the FS_METACKHASH is managed. MFC after: 3 days	2019-08-13 20:56:44 +00:00
Kirk McKusick	9454b4fd78	A race condition existed between the time a UFS/FFS superblock check hash was computed and the time that the superblock was copied to a buffer to be written to disk. The result was a failed superblock check hash the next time that the superblock was read. The fix is to compute the check hash after the superblock has been copied to a buffer to be written. PR: 236504 Reported by: Peter Holm Tested by: Peter Holm Sponsored by: Netflix	2019-08-06 18:10:34 +00:00
Kirk McKusick	90381b1ca9	When updating the user or group disk quotas for the return of inodes or disk blocks, set the FORCE flag in the call to chkiq() or chkdq() since the user is always allowed to return resources and hence there is no need to check the user's credential . Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-1-UFS-1: Denial Of Service in mount (prison_priv_check) Discussed with: kib MFC: 1 week Sponsored by: Netflix	2019-07-31 22:44:58 +00:00
Rick Macklem	b4c9955e41	Lock the vnode before calling ufs_bmap_seekdata(). r346932 replaced a call to vn_bmap_seekhole() with a call to ufs_bmap_seekdata(). Although vn_bmap_seekhole() locks the vnode, ufs_bmap_seekdata() assumes it is already locked. This patch adds locking of the vnode before the ufs_bmap_seekdata() call. If the vn_lock() call fails, it returns EBADF since that is the normal error returned when a file system is forced dismounted and is already listed as an error return in the lseek(2) man page. Discussed with: markj Reviewed by: kib	2019-07-27 01:52:34 +00:00
Kirk McKusick	fdf34aa3a5	The error reported in FS-14-UFS-3 can only happen on UFS/FFS filesystems that have block pointers that are out-of-range for their filesystem. These out-of-range block pointers are corrected by fsck(8) so are only encountered when an unchecked filesystem is mounted. A new "untrusted" flag has been added to the generic mount interface that can be set when mounting media of unknown provenance or integrity. For example, a daemon that automounts a filesystem on a flash drive when it is plugged into a system. This commit adds a test to UFS/FFS that validates all block numbers before using them. Because checking for out-of-range blocks adds unnecessary overhead to normal operation, the tests are only done when the filesystem is mounted as an "untrusted" filesystem. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg) Reviewed by: kib Sponsored by: Netflix	2019-07-17 22:07:43 +00:00
Kirk McKusick	ba554157a3	Style. No change intended.	2019-07-16 23:39:39 +00:00
Kirk McKusick	1fd136ec5e	When a process attempts to allocate space on a full filesystem, a filesystem full message is sent to the offending process or the kernel log if the offending process cannot be identified. To prevent an explotion of messages, the kernel ppsratecheck() function is used to limit the messages to one per second. This revision changes the variable that tracks the rate of these messages from a systemwide limit to a per-filesystem limit by moving it from a global variable to a variable in the ufsmount structure. Suggested by: kib Reviewed by: kib Sponsored by: Netflix	2019-07-16 23:12:27 +00:00
Kirk McKusick	daba4da81d	Add a new "untrusted" option to the mount command. Its purpose is to notify the kernel that the file system is untrusted and it should use more extensive checks on the file-system's metadata before using it. This option is intended to be used when mounting file systems from untrusted media such as USB memory sticks or other externally-provided media. It will initially be used by the UFS/FFS file system, but should likely be expanded to be used by other file systems that may appear on external media like msdosfs, exfat, and ext2fs. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20786	2019-07-01 23:22:26 +00:00
Mark Johnston	6137883ff3	Remove references to splbio in ffs_softdep.c. Assert that the per-mountpoint softdep mutex is held in modified functions that do not already have this assertion. No functional change intended. Reviewed by: kib, mckusick (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20741	2019-06-26 16:28:42 +00:00
Alan Somers	d49b446bfb	Add FIOBMAP2 ioctl This ioctl exposes VOP_BMAP information to userland. It can be used by programs like fragmentation analyzers and optimized cp implementations. But I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD and Linux. FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block number instead of 32-bit, and it also returns runp and runb. Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20705	2019-06-20 14:13:10 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Kirk McKusick	e94828443c	Add a missing bresle() in seldom-used error return.	2019-05-28 17:31:35 +00:00
Kirk McKusick	af6aeacb3e	Convert use of UFS-specific #ifdef DEBUG to DIAGNOSTIC or INVARIANTS as appropriate. No functional change intended. Suggested-by: markj	2019-05-28 16:32:04 +00:00
Kirk McKusick	298184acb8	Add function name and line number debugging information to softupdates worklist structures to help track their movement between work lists. No functional change to the operation of soft updates intended.	2019-05-27 06:22:43 +00:00
Alan Somers	65417f5e27	Remove "struct ucred" argument from vtruncbuf vtruncbuf takes a "struct ucred" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377	2019-05-24 20:27:50 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Mark Johnston	9e56947ffc	Ensure that error is initialized in ufs_bmap_seekdata(). Reported and tested by: jhibbits MFC with: r346932 Sponsored by: The FreeBSD Foundation	2019-05-05 16:57:03 +00:00
Konstantin Belousov	78022527bb	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
Kirk McKusick	44b193b09e	Zero out the file directory entry metadata to reduce disk scavenging disclosure. Submitted by: David G. Lawrence <dg@dglawrence.com> MFC after: 1 week	2019-05-04 18:00:57 +00:00
Kirk McKusick	0061238fb0	This update eliminates a kernel stack disclosure bug in UFS/FFS directory entries that is caused by uninitialized directory entry padding written to the disk. It can be viewed by any user with read access to that directory. Up to 3 bytes of kernel stack are disclosed per file entry, depending on the the amount of padding the kernel needs to pad out the entry to a 32 bit boundry. The offset in the kernel stack that is disclosed is a function of the filename size. Furthermore, if the user can create files in a directory, this 3 byte window can be expanded 3 bytes at a time to a 254 byte window with 75% of the data in that window exposed. The additional exposure is done by removing the entry, creating a new entry with a 4-byte longer name, extracting 3 more bytes by reading the directory, and repeating until a 252 byte name is created. This exploit works in part because the area of the kernel stack that is being disclosed is in an area that typically doesn't change that often (perhaps a few times a second on a lightly loaded system), and these file creates and unlinks themselves don't overwrite the area of kernel stack being disclosed. It appears that this bug originated with the creation of the Fast File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and is likely present in every Unix or Unix-like system that uses UFS/FFS. Amazingly, nobody noticed until now. This update also adds the -z flag to fsck_ffs to have it scrub the leaked information in the name padding of existing directories. It only needs to be run once on each UFS/FFS filesystem after a patched kernel is installed and running. Submitted by: David G. Lawrence <dg@dglawrence.com> Reviewed by: kib MFC after: 1 week	2019-05-03 21:54:14 +00:00
Kirk McKusick	ab2214d400	Simplify calculation of DIRECTSIZ. No functional change intended. Suggested by: kib MFC after: 1 week	2019-05-03 21:46:25 +00:00
Mark Johnston	cc2c33dfb1	Optimize lseek(SEEK_DATA) on UFS. This version fixes the problems identified in r345244. Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19598	2019-04-29 22:05:26 +00:00
Konstantin Belousov	5ffc99e2e4	Handle races when remounting UFS volume from ro to rw. In particular, ensure that writers are not unleashed before SU structures are initialized. Also, correctly handle MNT_ASYNC before this. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-04-08 15:20:05 +00:00
Mariusz Zaborski	a1304030b8	Introduce funlinkat syscall that always us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567	2019-04-06 09:34:26 +00:00
Kirk McKusick	69166928c7	This is an additional and hopefully final fix for bug report 230962. This bug was introduced with the change to use softdep_bp_to_mp() in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp() function failed to include VSOCK as one of the valid cases. Although local-domain sockets do not allocate blocks in the filesystem, they will allocate blocks if they use extended attributes (such as ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount pointer when presented with a socket vnode so that the soft updates write complete will properly process the soft updates structures associated with the extended attribute blocks. It was the failure to process these soft updates structures, thus leaving them hanging off the buffer, which lead to the "panic: softdep_deallocate_dependencies: dangling deps" when trying to clean up the buffer after it was written. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-03-20 23:11:05 +00:00
Mark Johnston	783efeb544	Revert r345244 for now. The code which advances the block number is simplistic and is not correct when the starting offset is non-zero. Revert the change until this is fixed.	2019-03-18 05:03:55 +00:00
Mark Johnston	1a7f456a4b	Fix the gcc build (-Wstrict-prototypes) after r345244. Reported by: jenkins MFC with: r345244	2019-03-17 18:06:13 +00:00
Mark Johnston	c676692c61	Optimize lseek(SEEK_DATA) on UFS. The old implementation, at the VFS layer, would map the entire range of logical blocks between the starting offset and the first data block following that offset. With large sparse files this is very inefficient. The VFS currently doesn't provide an interface to improve upon the current implementation in a generic way. Add ufs_bmap_seekdata(), which uses the obvious algorithm of scanning indirect blocks to look for data blocks. Use it instead of vn_bmap_seekhole() to implement SEEK_DATA. Reviewed by: kib, mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19598	2019-03-17 17:34:06 +00:00
Kirk McKusick	42a5a356a8	Add KASSERT to the softdep_disk_write_complete() function in the soft dependency code to ensure that it will be able to avoid a dangling dependency. Sponsored by: Netflix	2019-03-12 00:10:31 +00:00
Kirk McKusick	3532718257	Give more complete information in INVARIANTS panic messages at end of the ffs_truncate() function. Sponsored by: Netflix	2019-03-11 23:53:56 +00:00
Kirk McKusick	a9f59cc029	Augment the UFS filesystem specific print function (called by the kernel vn_printf() routine when printing out vnodes associated with a UFS filesystem) to also include the inode's link count, effective link count, generation number, owner, group, flags, size, and for UFS2 filesystems, the extent size. Sponsored by: Netflix	2019-03-11 22:05:34 +00:00
Simon J. Gerraty	f5fdf82d82	Add _PC_ACL_* to vop_stdpathconf This avoid EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512	2019-03-11 20:40:56 +00:00
Jason A. Harmening	4775b07ebd	FFS: allow sendfile(2) to work with block sizes greater than the page size Implement ffs_getpages_async(), which when possible calls the asynchronous flavor of the generic pager's getpages function. When the underlying block size is larger than the system page size, however, it will invoke the (synchronous) buffer cache pager, followed by a call to the client completion routine. This retains true asynchronous completion in the most common (block size <= page size) case, which is important for the performance of the new sendfile(2). The behavior in the larger block size case mirrors the default implementation of VOP_GETPAGES_ASYNC, which most other filesystems use anyway as they do not override the getpages_async method. PR: 235708 Reported by: pho Reviewed by: kib, glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19340	2019-02-26 04:56:10 +00:00
Kirk McKusick	ac4b20a0a7	After a crash, a file that extends into indirect blocks may end up shorter than its size resulting in a hole as its final block (which is a violation of the invarients of the UFS filesystem). Soft updates will always ensure that the file size is correct when writing inodes to disk for files that contain only direct block pointers. However soft updates does not roll back sizes for files with indirect blocks that it has set to unallocated because their contents have not yet been written to disk. Hence, the file can appear to have a hole at its end because the block pointer has been rolled back to zero when its inode was written to disk. Thus, fsck_ffs calculates the last allocated block in the file. For files that extend into indirect blocks, fsck_ffs checks for a size past the last allocated block of the file and if that is found, shortens the file to reference the last allocated block thus avoiding having it reference a hole at its end. Submitted by: Chuck Silvers <chs@netflix.com> Tested by: Chuck Silvers <chs@netflix.com> MFC after: 1 week Sponsored by: Netflix	2019-02-25 21:58:19 +00:00
Kirk McKusick	baba6af702	This bug was introduced with the change to use softdep_bp_to_mp() in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp() function failed to include VFIFO as one of the valid cases. Although fifo's do not allocate blocks in the filesystem, they will allocate blocks if they use extended attributes (such as ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount pointer when presented with a fifo vnode so that the soft updates write complete will properly process the soft updates structures associated with the extended attribute blocks. It was the failure to process these soft updates structures, thus leaving them hanging off the buffer, which lead to the "panic: softdep_deallocate_dependencies: dangling deps" when trying to clean up the buffer after it was written. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-01-28 21:36:45 +00:00
Kirk McKusick	6967c09c69	Expand DDB's set of printable soft dependency data structures. The set of known soft dependency data structures now includes: sd_worklist, sd_inodedep, sd_allocdirect, sd_allocindir, and sd_mkdir. DDB can also print lists of sd_allinodedeps, sd_mkdir_list, and sd_workhead. The sd_workhead script is useful for listing all the dependencies associated with a buffer, e.g. bp->b_dep. Prefix the soft dependency show names with sd_ so that they sort together when listed by DDB's "show help" and to distinguish them from other data structures printable by DDB. Sponsored by: Netflix	2019-01-26 05:35:24 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Kirk McKusick	751ae98144	Move ASSERT_VOP_LOCKED to top of ufs_vinit() as it should be true when the function is entered. Suggested by: kib	2018-12-30 06:03:20 +00:00
Kirk McKusick	1c521f70d4	For consistency with FFS2's fifoops2 and both versions of FFS's vnodeops make FFS1's fifoops1 use ffs_lock. Also delete ffs_reallocblks from fifoops1 which is needed only for fifoops2 because of its support for extended attributes that need to allocate blocks. Suggested by: kib	2018-12-30 05:03:41 +00:00
Kirk McKusick	c0029546f8	When loading an inode from disk, verify that its mode is valid. If invalid, return EINVAL. Note that inode check-hashes greatly reduce the chance that these errors will go undetected. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read) Reviewed by: kib MFC after: 1 week Sponsored by: Netflix M sys/fs/ext2fs/ext2_vnops.c M sys/kern/vfs_subr.c M sys/ufs/ffs/ffs_snapshot.c M sys/ufs/ufs/ufs_vnops.c	2018-12-27 07:18:53 +00:00
Konstantin Belousov	8690d4dea3	Allocate v_object for the new snapshot vnode. The vnode is not opened, so it ends up with the malloced buffers otherwise. Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-12-23 18:54:09 +00:00
Kirk McKusick	c8f55fc4b4	Ensure that the inode check-hash is not left zeroed out in the case where the check-hash fails. Prior to the fix in -r342133 the inode with the zeroed out check-hash was written back to disk causing further confusion. Reported by: Gary Jennejohn (gj) Sponsored by: Netflix	2018-12-15 18:49:30 +00:00
Kirk McKusick	72d28f97be	Reorder ffs_verify_dinode_ckhash() so that it checks the inode check-hash before copying in the inode so that the mode and link-count are not set if the check-hash fails. This change ensures that the vnode will be properly unwound and recycled rather than being held in the cache. Initialize the file mode is zero so that if the loading of the inode fails (for example because of a check-hash failure), the vnode will be properly unwound and recycled. Reported by: Gary Jennejohn (gj) Sponsored by: Netflix	2018-12-15 18:35:46 +00:00
Kirk McKusick	6fa9bc995a	Must set ip->i_effnlink = ip->i_nlink to avoid a soft updates "panic: softdep_update_inodeblock: bad link count" when releasing a partially initialized vnode after an inode check-hash failure. Reported by: Gary Jennejohn <gljennjohn@gmail.com> Reported by: Peter Holm (pho) Sponsored by: Netflix	2018-12-15 17:58:42 +00:00
Kirk McKusick	8f829a5cf0	Continuing efforts to provide hardening of FFS. This change adds a check hash to the filesystem inodes. Access attempts to files associated with an inode with an invalid check hash will fail with EINVAL (Invalid argument). Access is reestablished after an fsck is run to find and validate the inodes with invalid check-hashes. This check avoids a class of filesystem panics related to corrupted inodes. The hash is done using crc32c. Note this check-hash is for the inode itself and not any of its indirect blocks. Check-hash validation may be extended to also cover indirect block pointers, but that will be a separate (and more costly) feature. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2018-12-11 22:14:37 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Kirk McKusick	bdd6b77e1f	If the vfs.ffs.dotrimcons sysctl option is enabled while a file deletion is active, specifically after a call to ffs_blkrelease_start() but before the call to ffs_blkrelease_finish(), ffs_blkrelease_start() will have handed out SINGLETON_KEY rather than starting a collection sequence. Thus if we get a SINGLETON_KEY passed to ffs_blkrelease_finish(), we just return rather than trying to finish the nonexistent sequence. Reported by: Warner Losh (imp@) Sponsored by: Netflix	2018-12-06 01:04:56 +00:00
Kirk McKusick	fb14e73cb4	Normally when an attempt is made to mount a UFS/FFS filesystem whose superblock has a check-hash error, an error message noting the superblock check-hash failure is printed and the mount fails. The administrator then runs fsck to repair the filesystem and when successful, the filesystem can once again be mounted. This approach fails if the filesystem in question is a root filesystem from which you are trying to boot. Here, the loader fails when trying to access the filesystem to get the kernel to boot. So it is necessary to allow the loader to ignore the superblock check-hash error and make a best effort to read the kernel. The filesystem may be suffiently corrupted that the read attempt fails, but there is no harm in trying since the loader makes no attempt to write to the filesystem. Once the kernel is loaded and starts to run, it attempts to mount its root filesystem. Once again, failure means that it breaks to its prompt to ask where to get its root filesystem. Unless you have an alternate root filesystem, you are stuck. Since the root filesystem is initially mounted read-only, it is safe to make an attempt to mount the root filesystem with the failed superblock check-hash. Thus, when asked to mount a root filesystem with a failed superblock check-hash, the kernel prints a warning message that the root filesystem superblock check-hash needs repair, but notes that it is ignoring the error and proceeding. It does mark the filesystem as needing an fsck which prevents it from being enabled for writing until fsck has been run on it. The net effect is that the reboot fails to single user, but at least at that point the administrator has the tools at hand to fix the problem. Reported by: Rick Macklem (rmacklem@) Discussed with: Warner Losh (imp@) Sponsored by: Netflix	2018-12-06 00:09:39 +00:00
Kirk McKusick	a02bd3e38c	Move the check for the filesystem having been run on a kernel that predates metadata check hashes so that it is done before deciding whether to compute a check-hash of the superblock. Reported by: Rick Macklem <rmacklem@uoguelph.ca> Sponsored by: Netflix	2018-11-26 00:58:07 +00:00
Kirk McKusick	ade67b509c	Calculate updated superblock check-hash before writing it into the snapshot. This corrects a bug that prevented snapshots from being mounted due to a superblock check-hash failure. Reported by: Brennan Vincent <brennan@umanwizard.com> Tested by: Peter Holm (pho@) Sponsored by: Netflix	2018-11-25 18:01:15 +00:00
Mark Johnston	6d2e2df764	Ensure that directory entry padding bytes are zeroed. Directory entries must be padded to maintain alignment; in many filesystems the padding was not initialized, resulting in stack memory being copied out to userspace. With the ino64 work there are also some explicit pad fields in struct dirent. Add a subroutine to clear these bytes and use it in the in-tree filesystems. The NFS client is omitted for now as it was fixed separately in r340787. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-23 22:24:59 +00:00
Konstantin Belousov	1c4ca77890	Add d_off support for multiple filesystems. The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917	2018-11-14 14:18:35 +00:00
Kirk McKusick	9fc5d538fc	In preparation for adding inode check-hashes, clean up and document the libufs interface for fetching and storing inodes. The undocumented getino / putino interface has been replaced with a new getinode / putinode interface. Convert the utilities that had been using the undocumented interface to use the new documented interface. No functional change (as for now the libufs library does not do inode check-hashes). Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2018-11-13 21:40:56 +00:00
Brooks Davis	1493c2ee62	Make vop_symlink take a const target path. This will enable callers to take const paths as part of syscall decleration improvements. Where doing so is easy and non-distruptive carry the const through implementations. In UFS the value is passed to an interface that must take non-const values. In ZFS, const poisoning would touch code shared with upstream and it's not worth adding diffs. Bump __FreeBSD_version for external API consumers. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17805	2018-11-02 14:42:36 +00:00
Konstantin Belousov	4f77f48884	Implement O_BENEATH and AT_BENEATH. Flags prevent open(2) and *at(2) vfs syscalls name lookup from escaping the starting directory. Supposedly the interface is similar to the same proposed Linux flags. Reviewed by: jilles (code, previous version of manpages), 0mp (manpages) Discussed with: allanjude, emaste, jonathan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17547	2018-10-25 22:16:34 +00:00
Kirk McKusick	ec888383cf	Continuing efforts to provide hardening of FFS, this change adds a check hash to the superblock. If a check hash fails when an attempt is made to mount a filesystem, the mount fails with EINVAL (Invalid argument). This avoids a class of filesystem panics related to corrupted superblocks. The hash is done using crc32c. Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2018-10-23 21:10:06 +00:00
Konstantin Belousov	b7befdf509	Correct panic messages. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation Approved by: re (rgrimes) MFC after: 1 week	2018-09-22 17:05:49 +00:00
Konstantin Belousov	bf94d6c78b	Fix state of dquot-less vnodes after failed quotaoff. UFS quotaoff iterates over all mp vnodes, and derefences and clears the pointers to corresponding dquots. If SU work items transiently reference some of dquots,quotaoff() would eventually fail, but all processed vnodes are already stripped from dquots. The state is problematic, since quotas are left enabled, but there is no dquots where blocks and inodes can be accounted. The result is assertion failures and NULL pointer dereferences. Fix it by suspending writes around quotaoff() call. Since the filesystem is synced, no dandling references to dquots from SU workitems can left behind, which means that quotaoff succeeds. The complication there is that quotaoff VFS op is performed with the mount point busied, while to suspend, we need to start write on the mp. If vn_start_write() is called on busied mp, system might deadlock against parallel unmount request. Handle this by unbusy-ing mp before starting write, which in turn requires changing the quotaoff() interface to return with the mount point not busied, same as was done for quotaon(). Reviewed by: mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17208	2018-09-19 14:36:57 +00:00
Konstantin Belousov	a408841593	Do not upgrade the vnode lock to call getinoquota(). Doing so can deadlock when the thread already owns another vnode lock, e.g. during a rename, as was demonstrated by the reporter. In fact, there seems to be no need to force the call to getinoquota() always, because vn_open() locks vnode exclusively, and this is the most important case. To add to the point, directories where the dirent is added or removed, are locked exclusively as well. Reported by: bwidawsk Tested by: bwidawsk, pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week	2018-09-17 19:38:43 +00:00
Kirk McKusick	4b6a2c497e	The Call For Testing had no reports of operational problems and found that performance was no worse and usually better when running with TRIM consolidation. Performance improvement was most noticable when multiple large files are released in a short period of time. Thus, TRIM consolidation is being enabled by default. Should operational problems be found, it can be disabled using the command `sysctl vfs.ffs.dotrimcons=0'. This variable can also be set as a tunable if early disabling is necessary. Approved by: re (gjb) Sponsored by: Netflix	2018-09-06 23:28:35 +00:00
Mark Murray	19fa89e938	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
Kirk McKusick	a9c2220f5c	Proper spelling of consolidation. Submitted by: Dimitry Andric	2018-08-23 22:35:14 +00:00
Kirk McKusick	d530fd484b	TRIM consolodation is supposed to be off by default	2018-08-20 21:19:21 +00:00
Kirk McKusick	4de0d16b8c	For traditional disks, the filesystem attempts to allocate the blocks of a file as contiguously as possible. Since the filesystem does not know how large a file will grow when it is first being written, it initially places the file in a set of blocks in which it currently fits. As it grows, it is relocated to areas with larger contiguous blocks. In this way it saves its large contiguous sets of blocks for the files that need them and thus avoids unnecessaily fragmenting its disk space. We used to skip reallocating the blocks of a file into a contiguous sequence if the underlying flash device requested BIO_DELETE notifications, because devices that benefit from BIO_DELETE also benefit from not moving the data. However, in the algorithm described above that reallocates the blocks, the destination for the data is usually moved before the data is written to the initially allocated location. So we rarely suffer the penalty of extra writes. With the addition of the consolodation of contiguous blocks into single BIO_DELETE operations, having fewer but larger contiguous blocks reduces the number of (slow and expensive) BIO_DELETE operations. So when doing BIO_DELETE consolodation, we do block reallocation. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2018-08-19 17:19:20 +00:00
Kirk McKusick	fc6e171535	Add consolodation of TRIM / BIO_DELETE commands to the UFS/FFS filesystem. When deleting files on filesystems that are stored on flash-memory (solid-state) disk drives, the filesystem notifies the underlying disk of the blocks that it is no longer using. The notification allows the drive to avoid saving these blocks when it needs to flash (zero out) one of its flash pages. These notifications of no-longer-being-used blocks are referred to as TRIM notifications. In FreeBSD these TRIM notifications are sent from the filesystem to the drive using the BIO_DELETE command. Until now, the filesystem would send a separate message to the drive for each block of the file that was deleted. Each Gigabyte of file size resulted in over 3000 TRIM messages being sent to the drive. This burst of messages can overwhelm the drive's task queue causing multiple second delays for read and write requests. This implementation collects runs of contiguous blocks in the file and then consolodates them into a single BIO_DELETE command to the drive. The BIO_DELETE command describes the run of blocks as a single large block being deleted. Each Gigabyte of file size can result in as few as two BIO_DELETE commands and is typically less than ten. Though these larger BIO_DELETE commands take longer to run, they do not clog the drive task queue, so read and write commands can intersperse effectively with them. Though this new feature has been throughly reviewed and tested, it is being added disabled by default so as to minimize the possibility of disrupting the upcoming 12.0 release. It can be enabled by running ``sysctl vfs.ffs.dotrimcons=1''. Users are encouraged to test it. If no problems arise, we will consider requesting that it be enabled by default for 12.0. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2018-08-19 16:56:42 +00:00
Kirk McKusick	7e038bc257	Replace the TRIM consolodation framework originally added in -r337396 driven by problems found with the algorithms being tested for TRIM consolodation. Reported by: Peter Holm Suggested by: kib Reviewed by: kib Sponsored by: Netflix	2018-08-18 22:21:59 +00:00

... 2 3 4 5 6 ...

2456 Commits