freebsd-dev

Author	SHA1	Message	Date
Thomas Munro	e7347be9e3	ffs: Support O_DSYNC. Respect the new IO_DATASYNC flag when performing synchronous writes. Compared to O_SYNC, O_DSYNC lets us skip updating the inode in some cases, matching the behaviour of fdatasync(2). Reviewed by: kib Differential Review: https://reviews.freebsd.org/D25160	2021-01-08 13:15:56 +13:00
Mateusz Guzik	3e506a67bb	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793	2021-01-03 06:50:06 +00:00
Mateusz Guzik	9997aedb8f	ufs: use VNPASS when asserting on a vnode in ufs_read_pgcache	2021-01-01 03:14:11 +00:00
Mark Johnston	ace3d9475c	ffs: Avoid out-of-bounds accesses in the fs_active bitmap We use a bitmap to track which cylinder groups have changed between snapshot creation and filesystem suspension. The "legs" of the bitmap are four bytes wide (see ACTIVESET()) so we must round up the allocation size to a multiple of four bytes. I believe this bug is harmless since UMA/kmem_* will both pad the allocation and zero the full allocation. Note that malloc() does inline zeroing when the allocation size is known at compile-time. Reported by: pho (using KASAN) Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27731	2020-12-23 11:16:40 -05:00
Ryan Libby	93dba42c0e	ffs: quiet -Wstrict-prototypes Reviewed by: kib, markj, mckusick Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27558	2020-12-11 22:51:57 +00:00
Kirk McKusick	bb3c01ec79	Document the BA_CLRBUF flag used in ufs and ext2fs filesystems. Suggested by: kib MFC after: 3 days Sponsored by: Netflix	2020-12-06 20:50:21 +00:00
Konstantin Belousov	2c7ada9917	ufs: handle two more cases of possible VNON vnode returned from VFS_VGET(). Reported by: kevans Reviewed by: mckusick, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27457	2020-12-06 18:09:14 +00:00
Konstantin Belousov	21a45add50	ffs: do not read full direct blocks if they are going to be overwritten. BA_CLRBUF specifies that existing context of the block will be completely overwritten by caller, so there is no reason to spend io fetching existing data. We do the same for indirect blocks. Reported by: tmunro Reviewed by: mckusick, tmunro Tested by: pho, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27353	2020-11-30 17:03:26 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Konstantin Belousov	92bcefd1d2	clear_inodedeps: handle ERELOOKUP from ffs_syncvnode(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-11-26 18:03:24 +00:00
Konstantin Belousov	07ef907f6e	ffs_softdep.c: get_parent_vp(): Fix bp lock leak when inum inode was already freed. Reported by: markj, pho Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-11-25 17:12:21 +00:00
Konstantin Belousov	8a1509e442	Handle LoR in flush_pagedep_deps(). When operating in SU or SU+J mode, ffs_syncvnode() might need to instantiate other vnode by inode number while owning syncing vnode lock. Typically this other vnode is the parent of our vnode, but due to renames occuring right before fsync (or during fsync when we drop the syncing vnode lock, see below) it might be no longer parent. More, the called function flush_pagedep_deps() needs to lock other vnode while owning the lock for vnode which owns the buffer, for which the dependencies are flushed. This creates another instance of the same LoR as was fixed in softdep_sync(). Put the generic code for safe relocking into new SU helper get_parent_vp() and use it in flush_pagedep_deps(). The case for safe relocking of two vnodes with undefined lock order was extracted into vn helper vn_lock_pair(). Due to call sequence ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(), ffs_syncvnode() indicates with ERELOOKUP that passed vnode was unlocked in process, and can return ENOENT if the passed vnode reclaimed. All callers of the function were inspected. Because UFS namei lookups store auxiliary information about directory entry in in-memory directory inode, and this information is then used by UFS code that creates/removed directory entry in the actual mutating VOPs, it is critical that directory vnode lock is not dropped between lookup and VOP. For softdep_prelink(), which ensures that later link/unlink operation can proceed without overflowing the journal, calls were moved to the place where it is safe to drop processing VOP because mutations are not yet applied. Then, ERELOOKUP causes restart of the whole VFS operation (typically VFS syscall) at top level, including the re-lookup of the involved pathes. [Note that we already do the same restart for failing calls to vn_start_write(), so formally this patch does not introduce new behavior.] Similarly, unsafe calls to fsync in snapshot creation code were plugged. A possible view on these failures is that it does not make sense to continue creating snapshot if the snapshot vnode was reclaimed due to forced unmount. It is possible that relock/ERELOOKUP situation occurs in ffs_truncate() called from ufs_inactive(). In this case, dropping the vnode lock is not safe. Detect the situation with VI_DOINGINACT and reschedule inactivation by setting VI_OWEINACT. ufs_inactive() rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed this way. In ffs_truncate(), allocation of the EOF block for partial truncation is re-done after vnode is synced, since we cannot leave the buffer locked through ffs_syncvnode(). In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:30:10 +00:00
Konstantin Belousov	738ea0010b	Add ffs_inode_bwrite() helper. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:19:59 +00:00
Konstantin Belousov	7b795aa3c0	Revert r367669 to re-commit with proper message	2020-11-14 05:19:44 +00:00
Konstantin Belousov	c0d2077f41	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:17:04 +00:00
Konstantin Belousov	61846fc4dc	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:10:39 +00:00
Mark Johnston	f44994874b	ffs: Clamp BIO_SPEEDUP length On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by softdep_request_cleanup() may be negative when assigned to bp->b_bcount, which has type "long". Clamp the size to LONG_MAX. Also convert the unused g_io_speedup() to use an off_t for the magnitude of the shortage for consistency with softdep_send_speedup(). Reviewed by: chs, kib Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27081	2020-11-11 13:48:07 +00:00
Conrad Meyer	e6790841f7	UFS2: Fix DoS due to corrupted extattrfile Prior versions of FreeBSD (11.x) may have produced a corrupt extattr file. (Specifically, r312416 accidentally fixed this defect by removing a strcpy.) CURRENT FreeBSD supports disk images from those prior versions of FreeBSD. Validate the internal structure as soon as we read it in from disk, to prevent these extattr files from causing invariants violations and DoS. Attempting to access the extattr portion of these files results in EINTEGRITY. At this time, the only way to repair files damaged in this way is to copy the contents to another file and move it over the original. PR: 244089 Reported by: Andrea Venturoli <ml AT netfence.it> Reviewed by: kib Discussed with: mckusick (earlier draft) Security: no Differential Revision: https://reviews.freebsd.org/D27010	2020-10-30 19:00:42 +00:00
Mateusz Guzik	4bfebc8d2c	cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename	2020-10-30 10:46:35 +00:00
Edward Tomasz Napierala	bce7ee9d41	Drop "All rights reserved" from all my stuff. This includes Foundation copyrights, approved by emaste@. It does not include files which carry other people's copyrights; if you're one of those people, feel free to make similar change. Reviewed by: emaste, imp, gbe (manpages) Differential Revision: https://reviews.freebsd.org/D26980	2020-10-28 13:46:11 +00:00
Kirk McKusick	996d40f91d	Various new check-hash checks have been added to the UFS filesystem over various major releases. Superblock check hashes were added for the 12 release and cylinder-group and inode check hashes will appear in the 13 release. When a disk with a UFS filesystem is writably mounted, the kernel clears the feature flags for anything that it does not support. For example, if a UFS disk from a 12-stable kernel is mounted on an 11-stable system, the 11-stable kernel will clear the flag in the filesystem superblock that indicates that superblock check-hashs are being maintained. Thus if the disk is later moved back to a 12-stable system, the 12-stable system will know to ignore its incorrect check-hash. If the only filesystem modification done on the earlier kernel is to run a utility such as growfs(8) that modifies the superblock but neither updates the check-hash nor clears the feature flag indicating that it does not support the check-hash, the disk will fail to mount if it is moved back to its original newer kernel. This patch moves the code that clears the filesystem feature flags from the mount code (ffs_mountfs()) to the code that reads the superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount code and is imported into libufs(3), all the filesystem utilities will now also clear these flags when they make modifications to the filesystem. As suggested by John Baldwin, fsck_ffs(8) has been changed to accept and repair bad superblock check-hashes rather than refusing to run. This change allows fsck to recover filesystems that have been impacted by utilities older than those created after this change and is a sensible thing to do in any event. Reported by: John Baldwin (jhb@) MFC after: 2 weeks Sponsored by: Netflix	2020-10-25 00:43:48 +00:00
Mateusz Guzik	25fb30bd9a	vfs: drop spurious cache_purge on rmdir The removed directory gets cache_purged which is sufficient to remove any entries related to the parent. Note only tmpfs, ufs and zfs are patched.	2020-10-23 15:50:49 +00:00
Brooks Davis	44ca4575ea	vmapbuf: don't smuggle address or length in buf Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situtation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784	2020-10-21 16:00:15 +00:00
Mateusz Guzik	e9fb2bd9b8	ufs: catch up with removal of thread argument from VOP_INACTIVE	2020-10-20 09:46:20 +00:00
Konstantin Belousov	e1ef4c29a3	Do not leak B_BARRIER. Normally when a buffer with B_BARRIER is written, the flag is cleared by g_vfs_strategy() when creating bio. But in some cases FFS buffer might not reach g_vfs_strategy(), for instance when copy-on-write reports an error like ENOSPC. In this case buffer is returned to dirty queue and might be written later by other means. Among then bdwrite() reasonably asserts that B_BARRIER is not set. In fact, the only current use of B_BARRIER is for lazy inode block initialization, where write of the new inode block is fenced against cylinder group write to mark inode as used. The situation could be seen that we break dependency by updating cg without written out inode. Practically since CoW was not able to find space for a copy of inode block, for the same reason cg group block write should fail. Reported by: pho Discussed with: chs, imp, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26511	2020-10-08 22:41:02 +00:00
Chuck Silvers	8b88330ed6	ufs: restore uniqueness of st_dev as returned by ufs_stat() switch ufs_stat() to use the same value for st_dev as was used by the previous ufs_getattr() stat path. Submitted by: gallatin Reviewed by: mjg, imp, kib, mckusick Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26596	2020-10-05 18:17:50 +00:00
Konstantin Belousov	3c484f325e	Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:06:36 +00:00
Konstantin Belousov	96474d2a3f	Do not copy vp into f_data for DTYPE_VNODE files. The pointer to vnode is already stored into f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide finit_vnode() helper to initialize file of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 21:55:21 +00:00
Mateusz Guzik	d90f2c3617	ufs: clean up empty lines in .c and .h files	2020-09-01 21:23:00 +00:00
Mateusz Guzik	39f8815070	cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No fuctional changes.	2020-08-20 10:05:46 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Konstantin Belousov	779ad2acf1	VMIO reads: enable for UFS Move v_object creation earlier, so that VIRF_PGREAD is never set if v_object is NULL. There is no much harm from instantiating v_object when later check for append-only flags disallows open. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 21:07:19 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mateusz Guzik	03337743db	vfs: clean MNTK_FPLOOKUP if MNT_UNION is set Elides checking it during lookup.	2020-08-10 11:51:21 +00:00
Mateusz Guzik	76dc5d3224	ufs: add VOP_STAT handler	2020-08-07 23:08:17 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Mateusz Guzik	e5e10c82ec	ufs: only pass LK_ADAPTIVE if LK_NODDLKTREAT is set This restores the pre-adaptive spinning state for SU which livelocks otherwise. Note this is a bug in SU. Reported by: pho	2020-08-04 23:09:15 +00:00
Mateusz Guzik	9d5a594f0b	ufs: add support for lockless lookup ACLs are not supported, meaning their presence will force the use of the old lookup. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25579	2020-07-25 10:38:05 +00:00
Mateusz Guzik	31ad4050fe	lockmgr: add adaptive spinning It is very conservative. Only spinning when LK_ADAPTIVE is passed, only on exclusive lock and never when any waiters are present. buffer cache is remains not spinning. This reduces total sleep times during buildworld etc., but it does not shorten total real time (culprits are contention in the vm subsystem along with slock + upgrade which is not covered). For microbenchmarks: open3_processes -t 52 (open/close of the same file for writing) ops/s: before: 258845 after: 801638 Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D25753	2020-07-22 12:30:31 +00:00
Kirk McKusick	93440bbefd	The binary representation of the superblock (the fs structure) is written out verbatim to the disk: see ffs_sbput() in sys/ufs/ffs/ffs_subr.c. It contains a pointer to the fs_summary_info structure. This pointer value inadvertently causes garbage to be stored. It is garbage because the pointer to the fs_summary_info structure is the address the then current stack or heap. Although a mere pointer does not reveal anything useful (like a part of a private key) to an attacker, garbage output deteriorates reproducibility. This commit zeros out the pointer to the fs_summary_info structure before writing the out the superblock. Reviewed by: kib Tested by: Peter Holm PR: 246983 Sponsored by: Netflix	2020-06-19 01:04:25 +00:00
Kirk McKusick	34816cb9ae	Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix	2020-06-19 01:02:53 +00:00
Chuck Silvers	d9a8abf6c2	Move all of the functions in ffs_subr.c that are only used by the ufs kernel module from that file into ffs_vfsops.c. This fixes the build for kernel configs that don't include FFS. PR: 247256 Submitted by: glebius Reviewed by: mckusick (earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25285	2020-06-17 23:39:52 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it hold a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size. This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Kirk McKusick	513274c79c	Clear the IN_SIZEMOD and IN_IBLKDATA flags only when doing a synchronous inode update. The IN_SIZEMOD and IN_IBLKDATA flags indicate changes to the file size and block pointer fields in the inode. When these fields have been changed, the fsync() and fsyncdata() system calls must write the inode to ensure their semantics that the file is on stable store. The IN_SIZEMOD and IN_IBLKDATA flags cannot be cleared until a synchronous write of the inode is done. If they are cleared on an asynchronous write, then the inode may not yet have been written to the disk when an fsync() or fsyncdata() call is done. Absent these flags, these calls would not know that they needed to write the inode. Thus, these flags only can be cleared on synchronous writes of the inode. Since the inode will be locked for the duration of the I/O that writes it to disk, no fsync() or fsyncdata() will be able to run before the on-disk inode is complete. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-06 20:17:56 +00:00
Kirk McKusick	52488b5148	Further evaluation of the POSIX spec for fdatasync() shows that it requires that new data on growing files be accessible. Thus, the the fsyncdata() system call must update the on-disk inode when the size of the file has changed. This commit adds another inode update flag, IN_SIZEMOD, that gets set any time that the file size changes. If either the IN_IBLKDATA or the IN_SIZEMOD flag is set when fdatasync() is called, the associated inode is synchronously written to disk. We could have overloaded the IN_IBLKDATA flag to also track size changes since the only (current) use case for these flags are for fsyncdata(), but it does seem useful for possible future uses to separately track the file size changes and the inode block pointer changes. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-05 01:00:55 +00:00
Stefan Eßer	23e84cf153	Fix obvious typo: IN_BLKDATA should be IN_IBLKDATA	2020-06-04 19:54:25 +00:00
Kirk McKusick	30296c428a	Two additional places that need to identify IN_IBLKDATA. Reviewed by: kib MFC with: -r361785 Differential Revision: https://reviews.freebsd.org/D25072	2020-06-04 18:35:21 +00:00
Konstantin Belousov	7428630b75	UFS: write inode block for fdatasync(2) if pointers in inode where allocated The fdatasync() description in POSIX specifies that all I/O operations shall be completed as defined for synchronized I/O data integrity completion. and then the explanation of Synchronized I/O Data Integrity Completion says The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred. For UFS this means that all pointers must be on disk. Indirect pointers already contribute to the list of dirty data blocks, so only direct blocks and root pointers to indirect blocks, both of which reside in the inode block, should be taken care of. In ffs_balloc(), mark the inode with the new flag IN_IBLKDATA that specifies that ffs_syncvnode(DATA_ONLY) needs a call to ffs_update() to flush the inode block. Reviewed by: mckusick Discussed with: tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D25072	2020-06-04 12:23:15 +00:00
Chuck Silvers	d79ff54b5c	This commit enables a UFS filesystem to do a forcible unmount when the underlying media fails or becomes inaccessible. For example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just like is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing of some dependencies require a read. For example if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Coauthered by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088	2020-05-25 23:47:31 +00:00

1 2 3 4 5 ...

2286 Commits