freebsd-dev

Author	SHA1	Message	Date
Kirk McKusick	b21582ee03	Add a flags parameter to the ffs_sbget() function that reads UFS superblocks. Rather than trying to shoehorn flags into the requested superblock address, create a separate flags parameter to the ffs_sbget() function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is used both in the kernel and in user-level utilities through export to the sbget() function in the libufs(3) library (see sbget(3) for details). The kernel uses ffs_sbget() when mounting UFS filesystems, in the glabel(8) and gjournal(8) GEOM utilities, and in the standalone library used when booting the system from a UFS root filesystem. The ffs_sbget() function reads the superblock located at the byte offset specified by its sblockloc parameter. The value UFS_STDSB may be specified for sblockloc to request that the standard location for the superblock be read. The two existing options are now flags: UFS_NOHASHFAIL will note if the check hash is wrong but will still return the superblock. This is used by the bootstrap code to give the system a chance to come up so that fsck can be run to correct the problem. UFS_NOMSG indicates that superblock inconsistency error messages should not be printed. It is used by programs like fsck that want to print their own error message and programs like glabel(8) that just want to know if a UFS filesystem exists on a partition. One additional flag is added: UFS_NOCSUM causes only the superblock itself to be returned, but does not read in any auxiliary data structures like the cylinder group summary information. It is used by clients like glabel(8) that just want to check for possible filesystem types. Using UFS_NOCSUM skips the superblock checks for csum data which allows superblocks that have corrupted csum data to be read and used. The validate_sblock() function checks that the superblock has not been corrupted in a way that can crash or hang the system. Unless the UFS_NOMSG flag is specified, it will print out any errors that it finds. 
Prior to this commit, validate_sblock() returned as soon as it found an inconsistency so would print at most one message. It now does all its checks so when UFS_NOMSG has not been specified will print out everything that it finds inconsistent. Sponsored by: The FreeBSD Foundation	2022-07-30 22:51:38 -07:00
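Note on `b21582ee03`: the change in checking behavior (run every check and report each failure, unless messages are suppressed) can be sketched as follows. This is only an illustration under stated assumptions, not the code in sys/ufs/ffs/ffs_subr.c. The names `demo_sb`, `demo_report()`, `demo_validate()`, the `DEMO_*` flag values, and the particular checks are all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bits; the real UFS_* flags are defined in the FreeBSD tree. */
#define DEMO_NOHASHFAIL	0x1	/* tolerate a bad check hash */
#define DEMO_NOMSG	0x2	/* do not print inconsistency messages */
#define DEMO_NOCSUM	0x4	/* skip auxiliary summary data */

/* Hypothetical, heavily reduced superblock. */
struct demo_sb {
	int32_t bsize;	/* block size */
	int32_t fsize;	/* fragment size */
	int32_t ncg;	/* number of cylinder groups */
};

/* Count a failure and print it unless DEMO_NOMSG is set. */
static void
demo_report(int flags, const char *what, int *failures)
{
	(*failures)++;
	if ((flags & DEMO_NOMSG) == 0)
		fprintf(stderr, "superblock check failed: %s\n", what);
}

/*
 * Run every check instead of returning at the first failure.
 * Returns the number of failed checks (0 means the superblock passed).
 */
int
demo_validate(const struct demo_sb *sb, int flags)
{
	int failures = 0;

	if (!(sb->bsize > 0))
		demo_report(flags, "bsize > 0", &failures);
	if (!(sb->fsize > 0))
		demo_report(flags, "fsize > 0", &failures);
	if (!(sb->fsize <= sb->bsize))
		demo_report(flags, "fsize <= bsize", &failures);
	if (!(sb->ncg > 0))
		demo_report(flags, "ncg > 0", &failures);
	return (failures);
}
```

A caller that wants to print its own diagnostics, as the message describes for fsck, would pass the no-message flag and inspect only the return value.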
Kirk McKusick	548045bf57	Updates to UFS/FFS superblock integrity checks when reading a superblock. Reorder a few checks to ensure fields have been checked before using them to check other fields. Add eight new checks mostly checking for non-negative values. No legitimate superblocks should fail as a result of these changes.	2022-07-30 22:35:11 -07:00
Dimitry Andric	ed1d5f95a5	Adjust function definitions in ufs_dirhash.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/ufs/ufs/ufs_dirhash.c:1303:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] ufsdirhash_init() ^ void sys/ufs/ufs/ufs_dirhash.c:1319:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] ufsdirhash_uninit() ^ void This is because ufsdirhash_init() and ufsdirhash_uninit() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 21:32:55 +02:00
Dimitry Andric	c9dde6f0c7	Fix unused variable warning in ffs_snapshot.c With clang 15, the following -Werror warning is produced: sys/ufs/ffs/ffs_snapshot.c:204:7: error: variable 'redo' set but not used [-Werror,-Wunused-but-set-variable] long redo = 0, snaplistsize = 0; ^ The 'redo' variable is only used when DIAGNOSTIC is defined. Ensure it is only declared and set in that case. MFC after: 3 days	2022-07-26 21:32:51 +02:00
Dimitry Andric	08c16dd4bf	Adjust function definition in ufs_dirhash.c to avoid clang 15 warnings With clang 15, the following -Werror warning is produced: sys/ufs/ufs/ufs_dirhash.c:1252:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] ufsdirhash_lowmem() ^ void This is because ufsdirhash_lowmem() is declared with a (void) argument list, but defined with an empty argument list. Make the definition match the declaration. MFC after: 3 days	2022-07-26 21:25:09 +02:00
Kirk McKusick	36e08b0127	Bug fix to UFS/FFS superblock integrity checks when reading a superblock. A better fix to commit `9e1f44d044`. Rather than coping with the case where a backup superblock is used, catch the case when the superblock is being read in and ensure that the standard one is used rather than the backup one.	2022-07-20 22:52:11 -07:00
Kirk McKusick	904347a00c	Additional check for UFS/FFS superblock integrity checks. Tested by: Peter Holm PR: 265162	2022-07-16 10:31:52 -07:00
Kirk McKusick	2e66649e4f	Another fix to build from `064e6b4`. Spotted by: Cy Schubert	2022-07-13 21:05:05 -07:00
Kirk McKusick	c792466f87	Fix build from `064e6b4`.	2022-07-13 16:53:04 -07:00
Kirk McKusick	064e6b4303	Rewrite function definitions in the UFS/FFS code base with identifier lists. The days of K&R-style definitions in UFS and other places in the tree are numbered, as this syntax is removed in C2x proposal N2432: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf Though running to nearly 6000 lines of diffs, this update should cause no functional change to the code. Requested by: Warner Losh MFC after: 2 weeks	2022-07-13 14:08:05 -07:00
Kirk McKusick	5bc926af9f	Bug fix to UFS/FFS superblock integrity checks when reading a superblock. Older versions of growfs(8) failed to correctly update fs_dsize. Filesystems that have been grown fail the test for fs_dsize's correct value. For now we exclude the fs_dsize test from the requirements. Reported by: Edward Tomasz Napierała Tested by: Edward Tomasz Napierała Tested by: Peter Holm MFC after: 1 month (with `076002f24d`) Differential Revision: https://reviews.freebsd.org/D35219	2022-07-06 14:45:30 -07:00
Kirk McKusick	9e1f44d044	Bug fix to UFS/FFS superblock integrity checks when reading a superblock. The original check verified that if an alternate superblock has not been selected that the superblock is located in its standard location. For UFS1 with a 65536 block size, the first backup superblock is at the same location as the UFS2 superblock. Since SBLOCK_UFS2 is the first location checked, the first backup is the superblock that will be used for a UFS1 filesystem with a 65536 block size. This patch allows the use of the first backup superblock in that situation. Reported by: Peter Holm Tested by: Peter Holm MFC after: 1 month (with `076002f24d`) Differential Revision: https://reviews.freebsd.org/D35219	2022-07-06 14:45:30 -07:00
Kirk McKusick	f3f5368dfb	Bug fix to UFS/FFS superblock integrity checks when reading a superblock. The tests for number of cylinder groups (fs_ncg), inodes per cylinder group (fs_ipg), and the size and layout of the cylinder group summary information (fs_csaddr and fs_cssize) were overly restrictive and would exclude some valid filesystems. These updates avoid precluding valid filesystems while still detecting rogue values that can crash or hang the kernel. Reported by: Chuck Silvers Tested by: Peter Holm MFC after: 1 month (with `076002f24d`) Differential Revision: https://reviews.freebsd.org/D35219	2022-07-06 14:45:29 -07:00
Konstantin Belousov	513e1bbc73	ufs_rename(): revert the bump of fvp nlink count in case of EMLINK for tdvp Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-07-06 15:34:36 +03:00
Konstantin Belousov	ab5ef5fb63	ufs_rename(): do not treat ERELOOKUP specially Delegate handling of it to the top VFS layer, as it is done everywhere. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-07-06 15:34:28 +03:00
Konstantin Belousov	026502d9ed	UFS quotaoff: start write before unbusying Otherwise the mount point could be unmounted meantime. Reported and tested by: pho Reviewed by: jah Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35638	2022-06-29 12:36:59 +03:00
Konstantin Belousov	bc6d0d72f4	UFS rename: make it reliable when using SU and reaching nlink limit PR: 165392 Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35577	2022-06-24 17:46:26 +03:00
Kirk McKusick	ce6296caa3	Fix build break in `50dc4c7`. No functional change intended. MFC after: 1 month (with `076002f24d`)	2022-06-23 19:54:18 -07:00
Kirk McKusick	50dc4c7df4	When a superblock integrity check fails, report the cause of the failure. No functional change intended. MFC after: 1 month (with `076002f24d`) Differential Revision: https://reviews.freebsd.org/D35219	2022-06-23 17:39:53 -07:00
Chuck Silvers	f1b4324b81	ffs: fix vn_read_from_obj() usage for PAGE_SIZE > block size vn_read_from_obj() requires that all pages of a vnode (except the last partial page) be either completely valid or completely invalid, but for file systems with block size smaller than PAGE_SIZE, partially valid pages may exist anywhere in the file. Do not enable the vn_read_from_obj() path in this case. Reviewed by: mckusick, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34836	2022-06-22 14:57:29 -07:00
Konstantin Belousov	8db679af66	UFS: make mkdir() and link() reliable when using SU and reaching nlink limit i_nlink overflow might be transient, i_effnlink indicates the final value of the link count after all dependencies would be resolved. So if i_nlink reached the maximum but i_effnlink did not, we should be able to make the link by syncing. We must sync the whole filesystem to resolve dependencies, which requires unlocking vnodes locked for VOPs. Use existing ERELOOKUP/VOP_UNLOCK_PAIR() mechanism to restart the VOP if sync with unlock was done. PR: 165392 Reported by: Vsevolod Volkov <vvv@colocall.net> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35514	2022-06-22 15:35:47 +03:00
Chuck Silvers	82817f26f8	ffs: fix vn_io_fault_pgmove() offset for PAGE_SIZE > block size The "offset" argument to vn_io_fault_pgmove() is supposed to be the offset within the page, but for ffs we currently use the offset within the block. When the block size is at least as large as the page size then these values are the same, but when the page size is larger than the block size then we need to add the offset of the block within the page as well. Sponsored by: Netflix Reviewed by: mckusick, kib, markj Differential Revision: https://reviews.freebsd.org/D34835	2022-06-21 17:54:18 -07:00
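Note on `82817f26f8`: the offset arithmetic described in the message can be illustrated with a small sketch. It is not the ffs code. The function names are hypothetical, and it assumes that block sizes and page sizes are powers of two, so that one is always a multiple of the other.

```c
#include <stdint.h>

/*
 * Offset within the block only, reduced to a page.
 * This is correct only when the block size is at least the page size.
 */
uint64_t
demo_block_only_offset(uint64_t file_off, uint64_t bsize, uint64_t page_size)
{
	return ((file_off % bsize) % page_size);
}

/*
 * Offset within the page: the offset inside the block plus the offset
 * of the block's start within its page. The second term is zero when
 * the block size is a multiple of the page size.
 */
uint64_t
demo_page_offset(uint64_t file_off, uint64_t bsize, uint64_t page_size)
{
	uint64_t blkoff = file_off % bsize;	/* offset inside the block */
	uint64_t blkstart = file_off - blkoff;	/* file offset of the block */

	return ((blkoff % page_size) + (blkstart % page_size));
}
```

With 4096-byte blocks and 16384-byte pages, file offset 8292 lies 100 bytes into its block but 8292 bytes into its page, which is the case the commit addresses. With 32768-byte blocks and 4096-byte pages the two functions agree.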
Kirk McKusick	800a53b445	Bug fix to UFS/FFS superblock integrity checks when reading a superblock. One of the checks was that the cylinder group size (fs_cgsize) matched that calculated by CGSIZE(). The value calculated by CGSIZE() has changed over time as the filesystem has evolved. Thus comparing the value of CGSIZE() of the current generation filesystem may not match the size as computed by CGSIZE() that was in effect at the time an older filesystem was created. Therefore the check for fs_cgsize is changed to simply ensure that it is not larger than the filesystem blocksize (fs_bsize). Reported by: Martin Birgmeier Tested by: Martin Birgmeier MFC after: 1 month (with `076002f24d`) PR: 264450 Differential Revision: https://reviews.freebsd.org/D35219	2022-06-11 11:05:14 -07:00
Gordon Bergling	a429d3050e	ufs: Fix a typo in a source code comment - s/droped/dropped/ MFC after: 3 days	2022-06-04 15:23:53 +02:00
Kirk McKusick	bc218d8920	Two bug fixes to UFS/FFS superblock integrity checks when reading a superblock. Two bugs have been reported with the UFS/FFS superblock integrity checks that were added in commit `076002f24d`. The code checked that fs_sblockactualloc was properly set to the location of the superblock. The fs_sblockactualloc field was an addition to the superblock in commit `dffce2150e` on Jan 26 2018 and used a field that was zero in filesystems created before it was added. The integrity check had to be expanded to accept the fs_sblockactualloc field being zero so as not to reject filesystems created before Jan 26 2018. The integrity check set an upper bound on the value of fs_maxcontig based on the maximum transfer size supported by the kernel. It required that fs->fs_maxcontig <= maxphys / fs->fs_bsize. The kernel variable maxphys defines the maximum transfer size permitted by the controllers and/or buffering. The fs_maxcontig parameter controls the maximum number of blocks that the filesystem will read or write in a single transfer. It is calculated when the filesystem is created as maxphys / fs_bsize. The bug appeared in the loader because it uses a maxphys of 128K even when running on a system that supports larger values. If the filesystem was built on a system that supports a larger maxphys (1M is typical) it will have configured fs_maxcontig for that larger system so would fail the test when run with the smaller maxphys used by the loader. So we bound the upper allowable limit for fs_maxcontig to be able to at least work with a 1M maxphys on the smallest block size filesystem: 1M / 4096 == 256. We then use the limit for fs_maxcontig as fs_maxcontig <= MAX(256, maxphys / fs_bsize). There is no harm in allowing the mounting of filesystems that make larger than maxphys I/O requests because those (mostly 32-bit machines) can (very slowly) handle I/O requests that exceed maxphys. Thanks to everyone who helped sort out the problems and the fixes. 
Reported by: Cy Schubert, David Wolfskill Diagnosis by: Mark Johnston, John Baldwin Reviewed by: Warner Losh Tested by: Cy Schubert, David Wolfskill MFC after: 1 month (with `076002f24d`) Differential Revision: https://reviews.freebsd.org/D35219	2022-05-31 19:58:37 -07:00
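Note on `bc218d8920`: the arithmetic behind the relaxed fs_maxcontig bound can be worked through with a short sketch. The two formulas come from the commit message. The function names, the `DEMO_MAX` macro, and the extra guards against non-positive inputs are assumptions added for illustration.

```c
#include <stdint.h>

#define DEMO_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Original bound: fs_maxcontig <= maxphys / fs_bsize. */
int
demo_maxcontig_old_ok(int32_t maxcontig, int64_t maxphys, int32_t bsize)
{
	if (bsize <= 0 || maxcontig < 0)
		return (0);
	return (maxcontig <= maxphys / bsize);
}

/* Relaxed bound: fs_maxcontig <= MAX(256, maxphys / fs_bsize). */
int
demo_maxcontig_ok(int32_t maxcontig, int64_t maxphys, int32_t bsize)
{
	if (bsize <= 0 || maxcontig < 0)
		return (0);
	return (maxcontig <= DEMO_MAX(256, maxphys / bsize));
}
```

For a filesystem with 32K blocks created on a system with a 1M maxphys, fs_maxcontig is 1M / 32K = 32. Under the loader's 128K maxphys the original bound is 128K / 32K = 4, so the filesystem is rejected, while the relaxed bound of MAX(256, 4) = 256 accepts it.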
Kirk McKusick	076002f24d	Do comprehensive UFS/FFS superblock integrity checks when reading a superblock. Historically only minimal checks were made of a superblock when it was read in as it was assumed that fsck would have been run to correct any errors before attempting to use the filesystem. Recently several bug reports have been submitted reporting kernel panics that can be triggered by deliberately corrupting filesystem superblocks, see Bug 263979 - [meta] UFS / FFS / GEOM crash (panic) tracking which is tracking the reported corruption bugs. This change upgrades the checks that are performed. These additional checks should prevent panics from a corrupted superblock. Although it appears in only one place, the new code will apply to the kernel modules and (through libufs) user applications that read in superblocks. Reported by: Robert Morris and Neeraj Reviewed by: kib Tested by: Peter Holm PR: 263979 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D35219	2022-05-27 12:22:07 -07:00
Kirk McKusick	187d7e9821	Reduce code nesting in readsuper(). No functional change.	2022-05-15 15:02:24 -07:00
Konstantin Belousov	ca7c2d2eed	UFS: clear fs_fmod once more, in the buffer data copy. This is needed for in-kernel copy of the code, where allocation might happen after fs_fmod is cleared in ffs_sbput() but before the write. Reported by: markj Reviewed by: chs, markj PR: 263765 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35149	2022-05-09 23:46:05 +03:00
Konstantin Belousov	4ac2df8f4c	ffs_use_bwrite: make the superblock snapshot more consistent Copy in-memory struct fs to the superblock buffer under the UFS mutex. Reviewed by: chs, markj PR: 263765 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35149	2022-05-09 23:45:27 +03:00
Stefan Eßer	ecbbb0c85e	ffs: plug a set-but-not-used var	2022-04-19 16:51:12 +02:00
Konstantin Belousov	5c075d6404	ufs/acl.h: forward-declare struct inode Right now it is incidentally declared in sys/lockf.h, which will be corrected shortly. Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756	2022-04-10 00:43:53 +03:00
Konstantin Belousov	8cc19b1e47	Style. Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756	2022-04-10 00:43:53 +03:00
Gordon Bergling	d4b3b0c2ef	ufs: Fix a typo in a source code comment - s/explicitely/explicitly/ MFC after: 3 days	2022-04-09 09:13:31 +02:00
Chuck Silvers	3dc5f8e19d	ffs: wait for trims earlier during unmount to avoid panic All softdep processing is supposed to be completed by softdep_flushfiles() and no more deps are supposed to be created after that, but if a pending trim completes after softdep_flushfiles() and before softdep_unmount() then the blkfree that is performed by ffs_blkfree_trim_task() will create a dep when none should exist, and if softdep_unmount() is called before that dep is freed then the kernel will panic. Prevent this by waiting for trims to complete earlier in the unmount process, in ffs_flushfiles(), so that any deps will be freed and any modified CG buffers will be flushed by the final fsync of the devvp in ffs_flushfiles() as intended. Reviewed by: mckusick, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34806	2022-04-08 10:19:40 -07:00
Gordon Bergling	2733b242e4	ffs(3): Fix a common typo in source code comments - s/quadradically/quadratically/ Obtained from: NetBSD MFC after: 3 days	2022-03-28 19:37:03 +02:00
Mateusz Guzik	bb92cd7bcd	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)	2022-03-24 10:20:51 +00:00
Robert Wing	ab2dbd9b87	ffs_mount(): fix snapshotting Commit `0455cc7104` broke snapshotting for ffs. In that commit, ffs_mount() was changed so the namei() lookup for a disk device happens before ffs_snapshot(). This caused the issue where namei() would lookup the snapshot file and fail because the file doesn't exist. Even if it did exist, taking a snapshot would still fail since it's not a disk device. Fix this by taking a snapshot of the filesystem as-is and return without altering ro/rw or any other attributes that are passed in. Reported by: pho Reviewed by: mckusick Fixes: `0455cc7104` ("ffs_mount(): return early if namei() fails to lookup disk device") Differential Revision: https://reviews.freebsd.org/D34562	2022-03-16 17:32:37 -08:00
Robert Wing	0455cc7104	ffs_mount(): return early if namei() fails to lookup disk device With soft updates enabled, an INVARIANTS panic is hit in ffs_unmount(). The problem occurs in ffs_mount() when upgrading a mount from ro->rw. During a mount update, the soft update code gets set up but doesn't get cleaned up if namei() fails when looking up the disk device. Avoid this scenario by looking up the disk device first and bail early if the namei() lookup fails. PR: 256511 MFC After: 2 weeks Reviewed by: mckusick, kib Differential Revision: https://reviews.freebsd.org/D30870	2022-03-07 10:48:44 -09:00
Konstantin Belousov	0af463e661	ffs_read(): lock buffers after snaplk with LK_NOWITNESS Reviewed and tested by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34179	2022-02-06 03:26:22 +02:00
Konstantin Belousov	303d3ae7e8	ufs, msdosfs: do not record witness order when creating vnode When allocating new vnode, we need to lock it exclusively before making it externally visible. Since other threads cannot observe the vnode yet, current lock order cannot create LoR conditions. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34126	2022-02-01 10:51:55 +02:00
Konstantin Belousov	99aa3b731c	ffs: lock buffers after snaplk with LK_NOWITNESS Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34073	2022-02-01 06:54:50 +02:00
Konstantin Belousov	e11b2b69c5	ffs_alloc.c: order includes alphabetically Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34073	2022-02-01 06:54:50 +02:00
Konstantin Belousov	8d8589b385	ufs: be more persistent with finishing some operations when the vnode is doomed after relock. The mere fact that the vnode is doomed does not prevent us from doing UFS operations on it while it still belongs to UFS, which is determined by non-NULL v_data. Not finishing some operations, e.g. not syncing the inode block only because the vnode started reclamation, is not correct. Add macro IS_UFS() which encapsulates the v_data != NULL check, and use it instead of VN_IS_DOOMED() for places where the operation completion is important. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	4559700a0a	ffs_snapblkfree(): add a comment explaining lockmgr invocation Reviewed by: markj, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	0cdc603308	ufs: Use IS_SNAPSHOT() Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Kirk McKusick	ddf162d1d1	ufs: handle LoR between snap lock and vnode lock When a filesystem is mounted all of its associated snapshots must be activated. It first allocates a snapshot lock (snaplk) that will be shared by all the snapshot vnodes associated with the filesystem. As part of each snapshot file activation, it must replace its own ufs vnode lock with the snaplk. In this way acquiring the snaplk gives exclusive access to all the snapshots for the filesystem. A write to a ufs vnode first acquires the ufs vnode lock for the file to be written then acquires the snaplk. Once it has the snaplk, it can check all the snapshots to see if any of them needs to make a copy of the block that is about to be written. This ffs_copyonwrite() code path establishes the ufs vnode followed by snaplk locking order. When a filesystem is unmounted it has to release all of its snapshot vnodes. Part of doing the release is to revert the snapshot vnode from using the snaplk to using its original vnode lock. While holding the snaplk, the vnode lock has to be acquired, the vnode updated to reference it, then the snaplk released. Acquiring the vnode lock while holding the snaplk violates the ufs vnode then snaplk order. Because the vnode lock is unused, using LK_EXCLUSIVE | LK_NOWAIT to acquire it will always succeed and the LK_NOWAIT prevents the reverse lock order from being recorded. This change was made in January 2021 (`173779b98f`) to avoid an LOR violation in ffs_snapshot_unmount(). The same LOR issue was recently found again when removing a snapshot in ffs_snapremove() which must also revert the snaplk to the original vnode lock as part of freeing it. The unwind in ffs_snapremove() deals with the case in which the snaplk is held as a recursive lock holding multiple references. Specifically an equal number of references are made on the vnode lock. 
This change factors out the lock reversion operations into a new function revert_snaplock() which handles both the recursive locks and avoids the LOR. The new revert_snaplock() function is then used in both ffs_snapshot_unmount() and in ffs_snapremove(). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33946	2022-01-27 23:03:35 -08:00
Kirk McKusick	7ef56fb049	Avoid unnecessary setting of UFS flag requesting fsck(8) be run. When the kernel is requested to mount a filesystem with a bad superblock check hash, it would set the flag in the superblock requesting that the fsck(8) program be run. The flag is only written to disk as part of a superblock update. Since the superblock always has its check hash updated when it is written to disk, the problem for which the flag has been set will no longer exist. Hence, it is counter-productive to set the flag as it will just cause an unnecessary run of fsck if it ever gets written. Sponsored by: Netflix	2022-01-09 16:18:28 -08:00
Kirk McKusick	1fbcaa13b0	When doing a read-only mount of a UFS filesystem using gjournal(8), suppress error message about a missing gjournal provider. Submitted by: Andreas Longwitz MFC after: 2 weeks Sponsored by: Netflix	2022-01-02 14:04:39 -08:00
Jessica Clarke	324150d6da	ufs: Avoid subobject overflow in snapshot expunge code The code here tries to be smart and zeroes out both di_db and di_ib with a single bzero call, thereby overrunning the di_db subobject. This is fine on most architectures, if a little dodgy. However, on CHERI, the compiler can optionally restrict the bounds on pointers to subobjects to just that subobject, in order to mitigate intra-object buffer overflows, and this is enabled in CheriBSD's pure-capability kernels. Instead, use separate bzero calls for each array, and let the compiler optimise it as it sees fit; even if it's not generating inline zeroing code, Clang will happily optimise two consecutive bzero's to a single larger call. Reviewed by: mckusick Differential Revision: https://reviews.freebsd.org/D33651	2022-01-02 20:55:49 +00:00
Jessica Clarke	5b13fa7987	ufs: Rework shortlink handling to avoid subobject overflows Shortlinks occupy the space of both di_db and di_ib when used. However, everywhere that wants to read or write a shortlink takes a pointer to di_db and promptly runs off the end of it into di_ib. This is fine on most architectures, if a little dodgy. However, on CHERI, the compiler can optionally restrict the bounds on pointers to subobjects to just that subobject, in order to mitigate intra-object buffer overflows, and this is enabled in CheriBSD's pure-capability kernels. Instead, clean this up by inserting a union such that a new di_shortlink can be added with the right size and element type, avoiding the need to cast and allowing the use of the DIP macro to access the field. This also mirrors how the ext2fs code implements extents support, with the exact same structure other than having a uint32_t i_data[] instead of a char di_shortlink[]. Reviewed by: mckusick, jhb Differential Revision: https://reviews.freebsd.org/D33650	2022-01-02 20:55:36 +00:00
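Note on `324150d6da` and `5b13fa7987`: the two subobject fixes can be sketched together. The union lets shortlink code address the whole area through a member of the right size, and clearing the block pointers uses one call per array so no pointer runs past its own subobject. This is a minimal illustration, not the actual dinode definition. The `demo_*` names, the array sizes, and the use of memset() in place of bzero() are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NDADDR	12	/* hypothetical count of direct block pointers */
#define DEMO_NIADDR	3	/* hypothetical count of indirect block pointers */

struct demo_dinode {
	union {
		struct {
			int64_t db[DEMO_NDADDR];	/* direct blocks */
			int64_t ib[DEMO_NIADDR];	/* indirect blocks */
		} blocks;
		/* Shortlink text overlays both arrays, with the full size. */
		char shortlink[(DEMO_NDADDR + DEMO_NIADDR) * sizeof(int64_t)];
	} u;
};

/* Store a short symlink target; bounds come from the shortlink member. */
void
demo_set_shortlink(struct demo_dinode *dp, const char *target)
{
	size_t len = strlen(target);

	if (len >= sizeof(dp->u.shortlink))
		len = sizeof(dp->u.shortlink) - 1;	/* truncate, keep NUL */
	memset(dp->u.shortlink, 0, sizeof(dp->u.shortlink));
	memcpy(dp->u.shortlink, target, len);
}

/* Zero each array separately instead of overrunning db into ib. */
void
demo_clear_blocks(struct demo_dinode *dp)
{
	memset(dp->u.blocks.db, 0, sizeof(dp->u.blocks.db));
	memset(dp->u.blocks.ib, 0, sizeof(dp->u.blocks.ib));
}
```

The named `blocks` and `u` members keep the sketch valid C99; the real structures may well be laid out differently.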
Alan Somers	b214fcceac	Change VOP_READDIR's cookies argument to a **uint64_t The cookies argument is only used by the NFS server. NFSv2 defines the cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our VOP_READDIR, however, has always defined it as u_long, which is 32 bits on some architectures. Change it to 64 bits on all architectures. This doesn't matter for any in-tree file systems, but it matters for some FUSE file systems that use 64-bit directory cookies. PR: 260375 Reviewed by: rmacklem Differential Revision: https://reviews.freebsd.org/D33404	2021-12-15 20:54:57 -07:00
Gordon Bergling	f9af3151fa	Revert "ffs(3): Fix a typo in a sysctl description" It should be - s/contigous/contiguous/ not continuous Reported by: tuexen@ This reverts commit `42efe994ec`.	2021-12-05 13:45:47 +01:00
Gordon Bergling	42efe994ec	ffs(3): Fix a typo in a sysctl description - s/contigous/continuous/ MFC after: 3 days	2021-12-04 12:15:34 +01:00
Mateusz Guzik	7e1d3eefd4	vfs: remove the unused thread argument from NDINIT* See `b4a58fbf64` ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.	2021-11-25 22:50:42 +00:00
Gordon Bergling	bebff61587	ffs_softdep: Fix a typo in a source code comment - s/conditonally/conditionally/ MFC after: 3 days	2021-11-19 19:17:41 +01:00
Konstantin Belousov	c34a5148e8	ffs: fix newly introduced LOR between mntfs vnode lock and topology lock The mntfs vnode lock should be before topology, as established in ffs_mountfs(). Extend the locked region in ffs_unmount(). Reported and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33013	2021-11-16 20:01:31 +02:00
Kirk McKusick	9b8eb1c5b6	Followup to `f2b391528a` to improve printed message. Sponsored by: Netflix	2021-11-15 16:10:02 -08:00
Kirk McKusick	9e9dcac95a	Allow forced r/w mount of UFS/FFS filesystem with a bad check hash. Normally a UFS/FFS filesystem with a bad check hash can only be mounted read only. With this commit the mount(8) -f (force) option can be used to force a read-write mount of a UFS/FFS filesystem with a bad check hash. Conveniently the filesystem will proceed to update its on-disk superblock with a corrected check hash. Sponsored by: Netflix	2021-11-15 16:03:47 -08:00
Kirk McKusick	f2b391528a	Add ability to suppress UFS/FFS superblock check-hash failure messages. When reading UFS/FFS superblocks that have check hashes, both the kernel and libufs print an error message if the check hash is incorrect. This commit adds the ability to request that the error message not be printed. It is intended for use by programs like fsck that want to print their own error message and by kernel subsystems like glabel that just want to check for possible filesystem types. This capability will be used in followup commits. Sponsored by: Netflix	2021-11-15 09:11:54 -08:00
Kirk McKusick	b366ee4868	Consolidate four copies of the STDSB define into a single place. The STDSB macro is passed to the ffs_sbget() routine to fetch a UFS/FFS superblock "from the standard place". It was identically defined in lib/libufs/libufs.h, stand/libsa/ufs.c, sys/ufs/ffs/ffs_extern.h, and sys/ufs/ffs/ffs_subr.c. Delete it from these four files and define it instead in sys/ufs/ffs/fs.h. All existing uses of this macro already include sys/ufs/ffs/fs.h so no include changes need to be made. No functional change intended. Sponsored by: Netflix	2021-11-14 22:10:16 -08:00
Konstantin Belousov	eede22d66d	ffs_snapshot: do not assert that um_devvp is locked It is not, and the lock is not needed there Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:00:54 +02:00
Konstantin Belousov	25809a018d	mntfs: lock mntfs pseudo devfs vnode properly Require devvp locked for mntfs_freevp(), to have it locked around vgone(). Make that true for ffs, which is the only consumer of the interface. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:00:41 +02:00
Konstantin Belousov	76b05e3e39	ffs: Remove assertions about locked um_devvp in several places Namely, ffs_blkfree_cg(), and ffs_flushfiles(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:00:33 +02:00
Konstantin Belousov	2030ee0e1b	ufs: remove write-only variables Mark variables as __diagused for invariant-only vars Reviewed by: imp, mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32577	2021-10-21 21:40:46 +03:00
Mateusz Guzik	b4a58fbf64	vfs: remove cn_thread It is always curthread. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32453	2021-10-11 13:21:47 +00:00
Kyle Evans	6b88668f0b	vfs: remove dead fifoop VOP_KQFILTER implementations These began to become obsolete in `d6d64f0f2c` (r137739) and the deal was later sealed in `003e18aef4` (r137801) when vfs.fifofs.fops was dropped and vop-bypass for pipes became mandatory. PR: 225934 Suggested by: markj Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D32270	2021-10-03 01:02:51 -05:00
Robert Wing	9acea16404	ffs: retire unused fsckpid mount option The fsckpid mount option was introduced in `927a12ae16` along with a couple sysctl's to support SU+J with snapshots. However, those sysctl's were never used and eventually removed in `f2620e9ceb`. There are no in-tree consumers of this mount option. Reviewed by: mckusick, kib Differential Revision: https://reviews.freebsd.org/D32015	2021-10-02 15:11:40 -08:00
Kirk McKusick	4a365e863f	Avoid "consumer not attached in g_io_request" panic when disk lost while using a UFS snapshot. The UFS filesystem supports snapshots. Each snapshot is a file whose contents are a frozen image of the disk partition on which the filesystem resides. Each time an existing block in the filesystem is modified, the filesystem checks whether that block was in use at the time that the snapshot was taken. If so, and if it has not already been copied, a new block is allocated from among the blocks that were not in use at the time that the snapshot was taken and placed in the snapshot file to replace the entry that has not yet been copied. The previous contents of the block are copied to the newly allocated snapshot file block, and the write to the original is then allowed to proceed. The block allocation is done using the usual UFS_BALLOC() routine which allocates the needed block in the snapshot and returns a buffer that is set up to write data into the newly allocated block. In usual filesystem operation, the contents for the new block are copied from user space into the buffer and the buffer is then written to the file using bwrite(), bawrite(), or bdwrite(). In the case of a snapshot the new block must be filled from the disk block that is about to be rewritten. The snapshot routine has a function readblock() that it uses to read the `about to be rewritten' disk block. /* Read the specified block into the given buffer. */
static int readblock(snapvp, bp, lbn) struct vnode *snapvp; struct buf *bp; ufs2_daddr_t lbn; { struct inode *ip; struct bio *bip; struct fs *fs; ip = VTOI(snapvp); fs = ITOFS(ip); bip = g_alloc_bio(); bip->bio_cmd = BIO_READ; bip->bio_offset = dbtob(fsbtodb(fs, blkstofrags(fs, lbn))); bip->bio_data = bp->b_data; bip->bio_length = bp->b_bcount; bip->bio_done = NULL; g_io_request(bip, ITODEVVP(ip)->v_bufobj.bo_private); bp->b_error = biowait(bip, "snaprdb"); g_destroy_bio(bip); return (bp->b_error); } When the underlying disk fails, its GEOM module is removed. Subsequent attempts to access it should return the ENXIO error. The functionality of checking for the lost disk and returning ENXIO is handled by the g_vfs_strategy() routine: void g_vfs_strategy(struct bufobj *bo, struct buf *bp) { struct g_vfs_softc *sc; struct g_consumer *cp; struct bio *bip; cp = bo->bo_private; sc = cp->geom->softc; /* If the provider has orphaned us, just return ENXIO. */ mtx_lock(&sc->sc_mtx); if (sc->sc_orphaned || sc->sc_enxio_active) { mtx_unlock(&sc->sc_mtx); bp->b_error = ENXIO; bp->b_ioflags |= BIO_ERROR; bufdone(bp); return; } sc->sc_active++; mtx_unlock(&sc->sc_mtx); bip = g_alloc_bio(); bip->bio_cmd = bp->b_iocmd; bip->bio_offset = bp->b_iooffset; bip->bio_length = bp->b_bcount; bdata2bio(bp, bip); if ((bp->b_flags & B_BARRIER) != 0) { bip->bio_flags |= BIO_ORDERED; bp->b_flags &= ~B_BARRIER; } if (bp->b_iocmd == BIO_SPEEDUP) bip->bio_flags |= bp->b_ioflags; bip->bio_done = g_vfs_done; bip->bio_caller2 = bp; g_io_request(bip, cp); } Only after checking that the device is present does it construct the "bio" request and call g_io_request(). When readblock() constructs its own "bio" request and calls g_io_request() directly it panics with "consumer not attached in g_io_request" when the underlying device no longer exists. The fix is to have readblock() call g_vfs_strategy() rather than constructing its own "bio" request: /* Read the specified block into the given buffer. */
static int readblock(snapvp, bp, lbn) struct vnode *snapvp; struct buf *bp; ufs2_daddr_t lbn; { struct inode *ip; struct fs *fs; ip = VTOI(snapvp); fs = ITOFS(ip); bp->b_iocmd = BIO_READ; bp->b_iooffset = dbtob(fsbtodb(fs, blkstofrags(fs, lbn))); bp->b_iodone = bdone; g_vfs_strategy(&ITODEVVP(ip)->v_bufobj, bp); bufwait(bp); return (bp->b_error); } Here it uses the buffer that will eventually be written to the disk. The g_vfs_strategy() routine uses four parts of the buffer: b_bcount, b_iocmd, b_iooffset, and b_data. The b_bcount field is already correctly set for the buffer. It is safe to set the b_iocmd and b_iooffset fields as they are set correctly when the later write is done. The write path will also clear the B_DONE flag that our use of the buffer will set. The b_iodone callback has to be set to bdone() which will do just the notification that the I/O is done in bufdone(). The rest of bufdone(), which includes things like processing the softdeps associated with the buffer, should not be done until the buffer has been written. Bufdone() will set b_iodone back to NULL after using it, so the full bufdone() processing will be done when the buffer is written. The final change from the previous version of readblock() is that it used the b_data for the destination of the read while g_vfs_strategy() uses the bdata2bio() function to take advantage of VMIO when it is available. Differential revision: https://reviews.freebsd.org/D32150 Reviewed by: kib, chs MFC after: 1 week Sponsored by: Netflix	2021-09-27 20:04:51 -07:00
Kirk McKusick	d7770a5495	Eliminate snaplk / bufwait LOR when creating UFS snapshots Each vnode has an embedded lock that controls access to its contents. However vnodes describing a UFS snapshot all share a single snapshot lock to coordinate their access and update. As part of creating a new UFS snapshot, it has to have its individual vnode lock replaced with the filesystem's snapshot lock. The lock order for regular vnodes with respect to buffer locks is that they must first acquire the vnode lock, then a buffer lock. The order for the snapshot lock is reversed: a buffer lock must be acquired before the snapshot lock. When creating a new snapshot, the snapshot file must retain its vnode lock until it has allocated all the blocks that it needs before switching to the snapshot lock. This update moves one final piece of the initial snapshot block allocation so that it is done before the newly created snapshot is switched to use the snapshot lock. Reported by: Witness code MFC after: 1 week Sponsored by: Netflix	2021-09-18 17:02:30 -07:00
Konstantin Belousov	197a4f29f3	buffer pager: allow get_blksize method to return error Reported and reviewed by: asomers Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31998	2021-09-17 20:29:55 +03:00
Robert Wing	440320b620	ffs: remove unused thread argument from ffs_reload() MFC After: 1 week Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D31127	2021-09-04 12:25:10 -08:00
Konstantin Belousov	bb536de6c0	ffs_update(): Do not assume that EBUSY can only come from LK_NOWAIT trylock Instead do protective check for the local flags and do not interpret EBUSY specially if we did not request trylock mode for bread(). Reviewed by: mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-08-31 07:38:35 +03:00
Konstantin Belousov	f822d4feb8	ffs_update(): recalculate flags after relocking the vnode Inode type could migrate between snapshot and regular types while the vnode is unlocked. Recalculate flags specific for snapshot after relock. Reviewed by: mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-08-31 07:38:35 +03:00
Keith Owens	3b29c8b4bd	ddb: do not assume that ffs is mounted with softdep Avoid a panic when debugging with "show ffs" in ddb. Reviewed By: kib, markj, mckusick MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D31622	2021-08-24 21:00:19 -05:00
Gordon Bergling	464a166c27	ufs_dirhash: Correct a typo in a comment - s/memry/memory/ MFC after: 3 days	2021-08-20 09:59:18 +02:00
Konstantin Belousov	8df4bc48c8	ufs rename: ensure that the result of ufs_checkpath() is stable ufs_rename() calls ufs_checkpath() to ensure that the target directory is not a child of the source. If not, rename would create a loop. For instance: source->X1->X2->target and if source moved under target, we get corrupted filesystem. Suppose that we initially have source->X1 .... and X2->target where X1 is not on path from root to X2. Then ufs_checkpath() accepts the inodes, but there is nothing preventing parallel rename of X2 to become under X1, after checkpath finished. Ensure stability of ufs_checkpath() result by taking a per-mount sx in ufs_rename right before ufs_checkpath() and till the end. Reviewed by: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2021-08-13 17:52:26 +03:00
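The loop check described in 8df4bc48c8 can be pictured with a small userland sketch. It is an illustration only, not the kernel's ufs_checkpath(): struct dirnode and would_create_loop() are hypothetical names, and the tree is a plain in-memory parent chain. The walk goes from the target up to the root and reports a loop if it meets the source on the way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory directory node; not the kernel's struct inode. */
struct dirnode {
	struct dirnode *parent;	/* NULL for the root directory */
};

/*
 * Return 1 if 'source' lies on the path from 'target' up to the root,
 * meaning that moving 'source' under 'target' would create a loop.
 * Return 0 otherwise.
 */
static int
would_create_loop(const struct dirnode *source, const struct dirnode *target)
{
	const struct dirnode *dp;

	assert(source != NULL && target != NULL);
	for (dp = target; dp != NULL; dp = dp->parent) {
		if (dp == source)
			return (1);
	}
	return (0);
}
```

The answer is only valid while no parent pointer changes. If another thread reparents a directory on the target's path after the walk finishes, the earlier "no loop" result is stale, which is the window the commit closes by holding a per-mount sx lock from just before the check until the end of the rename.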
Konstantin Belousov	2e2212b4f5	Style: wrap the long line, definition of ufs_checkpath() Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-08-13 17:52:20 +03:00
Kirk McKusick	a91716efeb	Clean up orphaned indirdep dependency structures after disk failure. During forcible unmount after a disk failure there is a bug that causes one or more indirdep dependency structures to fail to be deallocated. Until we manage to track down why they fail to get cleaned up, this code tracks them down and eliminates them so that the unmount can succeed. Reported by: Peter Holm Help from: kib Reviewed by: Chuck Silvers Tested by: Peter Holm MFC after: 7 days Sponsored by: Netflix	2021-07-29 16:31:16 -07:00
Kirk McKusick	412b5e40a7	Diagnostic improvement to soft dependency structure management. The soft updates diagnostic code keeps a list for each type of soft update dependency. When a new block is allocated for a file it is initially tracked by a "newblk" dependency. The "newblk" dependency eventually becomes either an "allocdirect" dependency or an "indiralloc" dependency. The diagnostic code failed to move the "newblk" from the list of "newblk"s to its new type list. No functional change intended. Reviewed by: Chuck Silvers (as part of a larger change) Tested by: Peter Holm (as part of a larger change) Sponsored by: Netflix	2021-07-29 16:13:54 -07:00
Jason A. Harmening	211ec9b7d6	FFS: remove ffs_fsfail_task Now that dounmount() supports a dedicated taskqueue, we can simply call it with MNT_DEFERRED directly from the failing context. This also avoids blocking taskqueue_thread with a potentially-expensive unmount operation. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016	2021-07-24 12:52:41 -07:00
Jason A. Harmening	c746ed724d	Allow stacked filesystems to be recursively unmounted In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding. This change addresses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed unless the caller specifies the MNT_DEFERRED flag to indicate the base filesystem should also be unmounted from the taskqueue. To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016	2021-07-24 12:52:00 -07:00
John Baldwin	58109a87d4	Use an ANSI C function declaration for journal_check_space. GCC6 fails to compile this due to a -Wstrict-prototypes error. Sponsored by: Chelsio Communications	2021-07-23 15:59:11 -07:00
Konstantin Belousov	50acaaef54	ffs_softdep: force sync if journal is low in journal_check_space This effectively causes syncing of the mount point from softdep_prealloc(), softdep_prerename(), and softdep_prelink(). Typically it avoids the need for journal suspension at this point, at all. Suggested and reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:47:05 +03:00
Konstantin Belousov	2126f103e0	ffs_softdep.c: add journal_check_space() helper Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:47:05 +03:00
Konstantin Belousov	64b494a105	softdep_prelink(): only do sync if other thread changed the vnode metadata since previous prelink We call into softdep_prerename() and softdep_prelink() when there is low free space in the journal. These functions sync all vnodes participating in the VOP, in the hope that this would reduce journal utilization. But if the vnodes are already synced, doing a sync would only waste writes, because the journal is not being filled by records from modifications of our vnodes. Remember original seqc numbers for vnodes, and only initiate syncs when seqc changed. Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:54 +03:00
Konstantin Belousov	f756546662	ufs_rename(): only do softdep_prerename() when other thread changed a vnode Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:15 +03:00
Konstantin Belousov	d4d289cd51	ffs: mark block (re-)allocations as seqc writes Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:15 +03:00
Konstantin Belousov	5eacde3eb8	ufs_rename(): softdep_prerename() does something only for SU+J so call it only in SU+J case Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:15 +03:00
Konstantin Belousov	d0929a990c	ffs: reduce number of dvp relocks in softdep_prelink() If vp == NULL, we unlocked and then immediately relocked dvp there. Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:15 +03:00
Konstantin Belousov	b2b40b28b1	ufs_vnops.c: style Wrap too long functions declarations. Reviewed by: mckusick Discussed with: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30041	2021-06-23 23:46:15 +03:00
Mark Johnston	b2f9575646	ffs: Correct the input size check in sysctl_ffs_fsck() Make sure we return an error if no input was specified, since SYSCTL_IN() will report success in that case. Reported by: KMSAN Reviewed by: mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30586	2021-05-31 18:59:18 -04:00
Jason A. Harmening	a4b07a2701	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Also, add stdbool.h to libprocstat modules which #define _KERNEL before including sys/mount.h. Otherwise they'll pull in sys/types.h before defining _KERNEL and therefore won't have the bool definition they need for mp_busy. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30556	2021-05-30 14:53:47 -07:00
Jason A. Harmening	271fcf1c28	Revert commits `6d3e78ad6c` and `54256e7954` Parts of libprocstat like to pretend they're kernel components for the sake of including mount.h, and including sys/types.h in the _KERNEL case doesn't fix the build for some reason. Revert both the VFS_QUOTACTL() change and the follow-up "fix" for now.	2021-05-29 17:48:02 -07:00
Jason A. Harmening	6d3e78ad6c	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30218	2021-05-29 14:05:39 -07:00
Konstantin Belousov	f784da883f	Move mnt_maxsymlinklen into appropriate fs mount data structures Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-Note: struct mount layout Differential revision: https://reviews.freebsd.org/D30325	2021-05-22 15:16:09 +03:00
Don Morris	f17a590085	ufs: Avoid M_WAITOK allocations when building a dirhash At this point the directory's vnode lock is held, so blocking while waiting for free pages makes the system more susceptible to deadlock in low memory conditions. This is particularly problematic on NUMA systems as UMA currently implements a strict first-touch policy. ufsdirhash_build() already uses M_NOWAIT for other allocations and already handled failures for the block array allocation, so just convert to M_NOWAIT. PR: 253992 Reviewed by: markj, mckusick, vangyzen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29045	2021-05-20 11:25:45 -04:00
Kirk McKusick	9a2fac6ba6	Fix handling of embedded symbolic links (and history lesson). The original filesystem release (4.2BSD) had no embedded symlinks. Historically symbolic links were just a different type of file, so the content of the symbolic link was contained in a single disk block fragment. We observed that most symbolic links were short enough that they could fit in the area of the inode that normally holds the block pointers. So we created embedded symlinks where the content of the link was held in the inode's pointer area thus avoiding the need to seek and read a data fragment and reducing the pressure on the block cache. At the time we had only UFS1 with 32-bit block pointers, so the test for a fastlink was: di_size < (NDADDR + NIADDR) * sizeof(daddr_t) (where daddr_t would be ufs1_daddr_t today). When embedded symlinks were added, a spare field in the superblock with a known zero value became fs_maxsymlinklen. New filesystems set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus filesystems that preceded this change always read from blocks (since fs->fs_maxsymlinklen == 0) and newer ones used embedded symlinks if they fit. Similarly symlinks created on pre-embedded symlink filesystems always spill into blocks while newer ones will embed if they fit. At the same time that the embedded symbolic links were added, the on-disk directory structure was changed splitting the former u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen. Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can be used to distinguish old directory formats. In retrospect that should have just been an added flag, but we did not realize we needed to know about that change until it was already in production. Code was split into ufs/ffs so that the log structured filesystem could use ufs functionality while doing its own disk layout. 
This meant that no ffs superblock fields could be used in the ufs code. Thus ffs superblock fields that were needed in ufs code had to be copied to fields in the mount structure. Since ufs_readlink needed to know if a link was embedded, fs_maxsymlinklen gets copied to mnt_maxsymlinklen. The kernel panic that led to this fix was triggered when a disk error created an inode of type symlink with no allocated data blocks but a large size. When readlink was called the uiomove was attempted which segment faulted. static int ufs_readlink(ap) struct vop_readlink_args /* { struct vnode *a_vp; struct uio *a_uio; struct ucred *a_cred; } */ *ap; { struct vnode *vp = ap->a_vp; struct inode *ip = VTOI(vp); doff_t isize; isize = ip->i_size; if ((isize < vp->v_mount->mnt_maxsymlinklen) || DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */ return (uiomove(SHORTLINK(ip), isize, ap->a_uio)); } return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred)); } The second part of the "if" statement that adds DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */ is problematic. It never appeared in BSD released by Berkeley because as noted above mnt_maxsymlinklen is 0 for old format filesystems, so will always fall through to the VOP_READ as it should. I had to dig back through `git blame' to find that Rodney Grimes added it as part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.'' He must have brought it across from an earlier FreeBSD. Unfortunately the source-control logs for FreeBSD up to the merger with the AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the agreement to let FreeBSD remain unencumbered, so I cannot pin-point where that line got added on the FreeBSD side. The one change needed here is that mnt_maxsymlinklen is declared as an `int' and should be changed to be `u_int64_t'. This discovery led us to check out the code that deletes symbolic links. 
Specifically if (vp->v_type == VLNK && (ip->i_size < vp->v_mount->mnt_maxsymlinklen || datablocks == 0)) { if (length != 0) panic("ffs_truncate: partial truncate of symlink"); bzero(SHORTLINK(ip), (u_int)ip->i_size); ip->i_size = 0; DIP_SET(ip, i_size, 0); UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE); if (needextclean) goto extclean; return (ffs_update(vp, waitforupdate)); } Here too our broken symlink inode with no data blocks allocated and a large size will segment fault as we are incorrectly using the test that we have no data blocks to decide that it is an embedded symbolic link and attempting to bzero past the end of the inode. The test for datablocks == 0 is unnecessary as the test for ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right thing in all cases. The test for datablocks == 0 was added by David Greenman in this commit: Author: David Greenman <dg@FreeBSD.org> Date: Tue Aug 2 13:51:05 1994 +0000 Completed (hopefully) the kernel support for old style "fastlinks". Notes: svn path=/head/; revision=1821 I am guessing that he likely earlier added the incorrect test in the ufs_readlink code. I asked David if he had any recollection of why he made this change. Amazingly, he still had a recollection of why he had made a one-line change more than twenty years ago. And unsurprisingly it was because he had been stuck between a rock and a hard place. FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code base. Prior to that, there were three years of development in all areas of the kernel, including the filesystem code, from the combined set of people including Bill Jolitz, Patchkit contributors, and FreeBSD Project members. The compatibility issue at hand was caused by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite changes David had to find a way to provide compatibility with both the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite. 
He felt that these changes would provide compatibility with both systems. In his words: ``My recollection is that the 'FASTLINKS' symlinks support in FreeBSD-1.x, as implemented by Curt Mayer, worked differently than 4.4BSD. He used a spare field in the inode to duplicately store the length. When the 4.4BSD-Lite merge was done, the optimized symlinks support for existing filesystems (those that were initialized in FreeBSD-1.x) were broken due to the FFS on-disk structure of 4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to restore the backward compatibility with FreeBSD-1.x filesystems. I think it was the best that could be done in the somewhat urgent circumstances of the post Berkeley-USL settlement. Also, regarding Rod's massive commit with little explanation, some context: John Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to the 386 platform in just 10 days. It was by far the most intense hacking effort of my life. In addition to the porting of tons of FreeBSD-1 code, I think we wrote more than 30,000 lines of new code in that time to deal with the missing pieces and architectural changes of 4.4BSD-Lite. We didn't make many notes along the way. There was a lot of pressure to get something out to the rest of the developer community as fast as possible, so detailed discrete commits didn't happen - it all came as a giant wad, which is why Rod's commit message was worded the way it was.'' Reported by: Chuck Silvers Tested by: Chuck Silvers History by: David Greenman Lawrence MFC after: 1 week Sponsored by: Netflix	2021-05-16 17:04:11 -07:00
Konstantin Belousov	e3d6759585	b_vflags update requires bufobj lock The trunc_dependencies() issue was reported by Alexander Lochmann <alexander.lochmann@tu-dortmund.de>, who found the problem by performing lock analysis using LockDoc, see https://doi.org/10.1145/3302424.3303948. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-04-15 15:47:42 +03:00
Kirk McKusick	14d0cd7225	Ensure that the mount command shows "with quotas" when quotas are enabled. When quotas are enabled with the quotaon(8) command, it sets the MNT_QUOTA flag in the mount structure mnt_flag field. The mount structure holds a cached copy of the filesystem statfs structure in mnt_stat that includes a copy of the mnt_flag field in mnt_stat.f_flags. The mnt_stat structure may not be updated for hours. Since the mount command requests mount details using the MNT_NOWAIT option, it gets the mount's mnt_stat statfs structure whose f_flags field does not yet show the MNT_QUOTA flag being set in mnt_flag. The fix is to have quotaon(8) set the MNT_QUOTA flag in both mnt_flag and in mnt_stat.f_flags so that it will be immediately visible to callers of statfs(2). Reported by: Christos Chatzaras Tested by: Christos Chatzaras PR: 254682 MFC after: 3 days Sponsored by: Netflix	2021-04-14 15:25:08 -07:00
Konstantin Belousov	0b3948e73b	softdep_unmount: assert that no dangling dependencies are left Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00

2485 Commits