freebsd-nq

Author	SHA1	Message	Date
Kirk McKusick	efbf396426	This change is some refactoring of Mark Johnston's changes in r329375 to fix the memory leak that I introduced in r328426. Instead of trying to clear up the possible memory leak in all the clients, I ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures that memory is always freed if it returns an error). The original change in r328426 was a bit sparse in its description. So I am expanding on its description here (thanks cem@ and rgrimes@ for your encouragement for my longer commit messages). In preparation for adding check hashing to superblocks, r328426 is a refactoring of the code to get the reading/writing of the superblock into one place. Unlike the cylinder group reading/writing which ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel and cgget/cgput in libufs), I have the core superblock functions just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is already imported into utilities like fsck_ffs as well as libufs to implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions take a function pointer to do the actual I/O for which there are four variants: ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem; g_use_g_read_data / g_use_g_write_data for kernel geom clients; ufs_use_sa_read for the standalone code (stand/libsa/ufs.c but not stand/libsa/ufsread.c which is size constrained); and use_pread / use_pwrite for libufs. Uses of these interfaces are in the UFS filesystem, geoms journal & label, libsa changes, and libufs. They also permeate out into the filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck, fsirand, fstyp, and quot. Some of these utilities should probably be converted to directly use libufs (like dumpfs was for example), but there does not seem to be much win in doing so. Tested by: Peter Holm (pho@)	2018-03-02 04:34:53 +00:00
Conrad Meyer	d4e6557bae	ffs: softdep_disk_write_complete: Quiesce spurious Coverity warning Coverity cannot determine that handle_written_indirdep() does not access uninitialized 'sbp' when flags argument is zero. So, simply move the initialization slightly sooner to silence the warning. No functional change. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-01 00:29:52 +00:00
Kirk McKusick	528833fae1	Use a more straight-forward approach to relaxing the location restraints when validating one of the backup superblocks.	2018-02-26 00:34:56 +00:00
Kirk McKusick	4cbd996a84	Relax the location restraints when validating one of the backup superblocks.	2018-02-24 03:33:46 +00:00
Kirk McKusick	f686b1710a	Refactor fix in r329600 to do its check once in readsuper() rather than in the two places that call readsuper(). No semantic change intended. Reviewed by: kib	2018-02-21 19:56:19 +00:00
Konstantin Belousov	9f74642385	Do not free(9) uninitialized pointer. Reported and tested by: allanjude Reviewed by: markj Sponsored by: The FreeBSD Foundation	2018-02-19 19:08:25 +00:00
Mark Johnston	16759360d4	Fix a memory leak introduced in r328426. ffs_sbget() may return a superblock buffer even if it fails, so the caller must be prepared to free it in this case. Moreover, when tasting alternate superblock locations in a loop, ffs_sbget()'s readfunc callback must free the previously allocated buffer. Reported and tested by: pho Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D14390	2018-02-16 15:41:03 +00:00
Kirk McKusick	068beacf21	The goal of this change is to prevent accidental foot shooting by folks running filesystems created on check-hash enabled kernels (which I will call "new") on non-check-hash enabled kernels (which I will call "old"). The idea here is to detect when a filesystem is run on an old kernel and flag the filesystem so that when it gets moved back to a new kernel, it will not start getting a slew of check-hash errors. Back when the UFS version 2 filesystem was created, it added a file flag FS_INDEXDIRS that was to be set on any filesystem that kept some sort of on-disk indexing for directories. The idea was precisely to solve the issue we have today. Specifically that a newer kernel that supported indexing would be able to tell that the filesystem had been run on an older non-indexing kernel and that the indexes should not be used until they had been rebuilt. Since we have never implemented on-disk directory indices, the FS_INDEXDIRS flag is cleared every time any UFS version 2 filesystem ever created is mounted for writing. This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH flag. Thus, the FS_METACKHASH is definitively known to have always been cleared. The FS_INDEXDIRS flag has been moved to a new block of flags that will always be cleared starting with this commit (until they get used to implement some future feature which needs to detect that the filesystem was mounted on a kernel that predates the new feature). If a filesystem with check-hashes enabled is mounted on an old kernel the FS_METACKHASH flag is cleared. When that filesystem is mounted on a new kernel it will see that the FS_METACKHASH has been cleared and clears all of the fs_metackhash flags. To get them re-enabled the user must run fsck (in interactive mode without the -y flag) which will ask for each supported check hash whether it should be rebuilt and enabled. 
When fsck is run in its default preen mode, it will just ignore the check hashes so they will remain disabled. The kernel has always disabled any check hash functions that it does not support, so as more types of check hashes are added, we will get a non-surprising result. Specifically if filesystems get moved to kernels supporting fewer of the check hashes, those that are not supported will be disabled. If the filesystem is moved back to a kernel with more of the check-hashes available and fsck is run interactively to rebuild them, then their checking will resume. Otherwise just the smaller subset will be checked. A side effect of this commit is that filesystems running with cylinder-group check hashes will stop having them checked until fsck is run to re-enable them (since none of them currently have the FS_METACKHASH flag set). So, if you want check hashes enabled on your filesystems after booting a kernel with these changes, you need to run fsck to enable them. Any newly created filesystems will have check hashes enabled. If in doubt as to whether you have check hashes enabled, run dumpfs and look at the list of enabled flags at the end of the superblock details.	2018-02-08 23:06:58 +00:00
Kirk McKusick	47806d1b93	Occasional cylinder-group check-hash errors were being reported on systems running with a heavy filesystem load. Tracking down this bug was elusive because there were actually two problems. Sometimes the in-memory check hash was wrong and sometimes the check hash computed when doing the read was wrong. The occurrence of either error caused a check-hash mismatch to be reported. The first error was that the check hash in the in-memory cylinder group was incorrect. This error was caused by the following sequence of events: - We read a cylinder-group buffer and the check hash is valid. - We update its cg_time and cg_old_time which makes the in-memory check-hash value invalid but we do not mark the cylinder group dirty. - We do not make any other changes to the cylinder group, so we never mark it dirty, thus do not write it out, and hence never update the incorrect check hash for the in-memory buffer. - Later, the buffer gets freed, but the page with the old incorrect check hash is still in the VM cache. - Later, we read the cylinder group again, and the first page with the old check hash is still in the VM cache, but some other pages are not, so we have to do a read. - The read does not actually get the first page from disk, but rather from the VM cache, resulting in the old check hash in the buffer. - The value computed after doing the read does not match causing the error to be printed. The fix for this problem is to only set cg_time and cg_old_time as the cylinder group is being written to disk. This keeps the in-memory check-hash valid unless the cylinder group has had other modifications which will require it to be written with a new check hash calculated. It also requires that the check hash be recalculated in the in-memory cylinder group when it is marked clean after doing a background write. 
The second problem was that the check hash computed at the end of the read was incorrect because the calculation of the check hash on completion of the read was being done too soon. - When a read completes we had the following sequence: - bufdone() -- b_ckhashcalc (calculates check hash) -- bufdone_finish() --- vfs_vmio_iodone() (replaces bogus pages with the cached ones) - When we are reading a buffer where one or more pages are already in memory (but not all pages, or we wouldn't be doing the read), the I/O is done with bogus_page mapped in for the pages that exist in the VM cache. This mapping is done to avoid corrupting the cached pages if there is any I/O overrun. The vfs_vmio_iodone() function is responsible for replacing the bogus_page(s) with the cached ones. But we were calculating the check hash before the bogus_page(s) were replaced. Hence, when we were calculating the check hash, we were partly reading from bogus_page, which means we calculated a bad check hash (e.g., because multiple pages have been mapped to bogus_page, so its contents are indeterminate). The second fix is to move the check-hash calculation from bufdone() to bufdone_finish() after the call to vfs_vmio_iodone() so that it computes the check hash over the correct set of pages. With these two changes, the occasional cylinder-group check-hash errors are gone. Submitted by: David Pfitzner <dpfitzner@netflix.com> Reviewed by: kib Tested by: David Pfitzner	2018-02-06 00:19:46 +00:00
Kirk McKusick	0d37a428f0	When reading a cylinder group, break out reporting of check hash errors from other types of errors so that the error is correctly reported.	2018-01-31 23:13:37 +00:00
Kirk McKusick	dffce2150e	Refactoring of reading and writing of the UFS/FFS superblock. Specifically reading is done in ffs_sbget() and writing is done in ffs_sbput(). These functions are exported to libufs via the sbget() and sbput() functions which are then used in the various filesystem utilities. This work is in preparation for adding superblock check hashes. No functional change intended. Reviewed by: kib	2018-01-26 00:58:32 +00:00
Pedro F. Giffuni	a94a2945be	ext2fs|ufs: Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 2 weeks	2018-01-24 17:58:48 +00:00
Pedro F. Giffuni	f9834d101a	Revert r327781, r328093, r328056: ufs|ext2fs: Revert uses of mallocarray(9). These aren't really useful: drop them. Variable unsigning will be brought again later.	2018-01-24 16:44:57 +00:00
Pedro F. Giffuni	90b618f35b	ufs: use mallocarray(9). Basic use of mallocarray to prevent overflows: static analyzers are also likely to perform additional checks. Since mallocarray expects unsigned parameters, unsign some related variables to minimize sign conversions. Reviewed by: mckusick	2018-01-17 18:18:33 +00:00
Konstantin Belousov	147b0c1a3e	Softlink inodes can own buffers with dependencies. At least, softlinks longer than 120 bytes have data fragments. Submitted by: mckusick MFC after: 5 days	2018-01-11 13:37:45 +00:00
Konstantin Belousov	c999b43527	Generalize the fix from r322757 and apply it to several more places. The code accesses bp->b_dep without owning the ufs mount softdep lock, which makes it possible for the dereferenced workitem to be freed in parallel. In particular, the deallocate_dependencies(), softdep_disk_io_initiation() and softdep_disk_write_complete() are affected. Move the code to safely calculate ump from the buffer with dependencies into the helper softdep_bp_to_mp() and use it for all found cases. Tested by: pho (as part of the bigger patch) Reviewed by: mckusick (as part of the bigger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-09 10:51:44 +00:00
Konstantin Belousov	e51e3c7e73	When handling write completion, take SU lock around calls to handle_written_XXX() in case of processing the buffer with an error. Tested by: pho (as part of the bigger patch) Reviewed by: mckusick (as part of the bigger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-09 10:44:17 +00:00
Konstantin Belousov	377f88fb08	Postpone the disassociation of the background write buffer with devvp so that buf_complete() sees fully constructed buffer. This is a NOP right now, but will be needed by the forthcoming SU change. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-09 10:33:11 +00:00
Pedro F. Giffuni	51268c3852	SPDX: Complete License ID tags for UFS.	2017-12-27 19:13:50 +00:00
Eitan Adler	caa7e52f3f	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Alexander Kabaev	5f943cca65	Remove dead initialization of the inode pointer. The pointer gets initialized again later in the code. This also improves code style(9).	2017-12-23 16:24:02 +00:00
Mark Johnston	b6fbf003e1	Provide a sysctl to force synchronous initialization of inode blocks. FFS performs asynchronous inode initialization, using a barrier write to ensure that the inode block is written before the corresponding cylinder group header update. Some GEOMs do not appear to handle BIO_ORDERED correctly, meaning that the barrier write may not work as intended. The sysctl allows one to work around this problem at the cost of expensive file creation on new filesystems. The default behaviour is unchanged. Reviewed by: kib, mckusick MFC after: 1 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13428	2017-12-09 15:44:30 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Konstantin Belousov	4e13cca54e	Improve the message printed when the cylinder group checksum is wrong. Mention the device path and mount point path, handle snapshots. Tested by: imp Sponsored by: The FreeBSD Foundation	2017-11-05 13:28:48 +00:00
Mark Johnston	4770655901	Remove a stale and incorrect comment. MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-10-28 02:51:27 +00:00
Mark Johnston	9cf7abcc1d	Remove workqueue items after updating the workqueue tail pointer. When QUEUE_MACRO_DEBUG_TRASH is configured, the queue linkage fields are trashed upon removal of the item, so be sure to only read them before removing the item. No functional change intended. MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-10-28 02:48:37 +00:00
Mark Johnston	4c52a9993b	Make drain_output() use bufobj_wwait(). No functional change intended. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12790	2017-10-25 17:20:18 +00:00
John Baldwin	800c3e80de	Don't defer wakeup()s for completed journal workitems. Normally wakeups() are performed for completed softupdates work items in workitem_free() before the underlying memory is free()'d. complete_jseg() was clearing the "wakeup needed" flag in work items to defer the wakeup until the end of each loop iteration. However, this resulted in the item being free'd before its address was used with wakeup(). As a result, another part of the kernel could allocate this memory from malloc() and use it as a wait channel for a different "event" with a different lock. This triggered an assertion failure when the lock passed to sleepq_add() did not match the existing lock associated with the sleep queue. Fix this by removing the code to defer the wakeup in complete_jseg() allowing the wakeup to occur slightly earlier in workitem_free() before free() is called. The main reason I can think of for deferring a wakeup() would be to avoid waking up a waiter while holding a lock that the waiter would need. However, no locks are dropped in between the wakeup() in workitem_free() and the end of the loop in complete_jseg() as far as I can tell. In general I think it is not safe to do a wakeup() after free() as one cannot control how other parts of the kernel that might reuse the address for a different wait channel will handle spurious wakeups. Reported by: pho Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12494	2017-09-26 23:24:15 +00:00
Konstantin Belousov	55e5a5c1f4	Fix 32bit build. Reported by: emaste Sponsored by: The FreeBSD Foundation	2017-09-22 16:42:41 +00:00
Kirk McKusick	75e3597abb	Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static function and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. 
sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho)	2017-09-22 12:45:15 +00:00
John Baldwin	4f45713ae2	Add UFS_LINK_MAX for the UFS-specific limit on link counts. ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still limited to 16 bits. This is a nop currently but will matter if LINK_MAX is increased in the future. Reviewed by: kib Sponsored by: Chelsio Communications	2017-09-18 23:30:39 +00:00
Kirk McKusick	855662c611	The new fsck recovery information to enable it to find backup superblocks created in revision 322297 only works on disks with sector sizes up to 4K. This update allows the recovery information to be created by newfs and used by fsck on disks with sector sizes up to 64K. Note that FFS currently limits filesystem to be mounted from disks with up to 8K sectors. Expanding this limitation will be the subject of another commit. Reported by: Peter Holm Reviewed with: kib	2017-09-04 20:19:36 +00:00
Konstantin Belousov	2f9d88c7ae	Protect v_rdev dereference with the vnode interlock instead of the vnode lock. Caller of softdep_count_dependencies() may own a buffer lock, which might conflict with the lock order. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 10 days	2017-08-25 09:51:22 +00:00
Konstantin Belousov	f0d5223230	Avoid dereferencing potentially freed workitem in softdep_count_dependencies(). Buffer's b_dep list is protected by the SU mount lock. Owning the buffer lock is not enough to guarantee the stability of the list. Calculation of the UFS mount owning the workitems from the buffer must be much more careful to not dereference the work item which might be freed meantime. To get to ump, use the pointers chain which does not involve workitems at all. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-08-21 16:23:44 +00:00
Konstantin Belousov	b5f2560d09	Style. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-08-21 16:16:02 +00:00
Kirk McKusick	77b63aa0fc	Since the switch to GPT disk labels, fsck for UFS/FFS has been unable to automatically find alternate superblocks. This checkin places the information needed to find alternate superblocks to the end of the area reserved for the boot block. Filesystems created with a newfs of this vintage or later will create the recovery information. If you have a filesystem created prior to this change and wish to have a recovery block created for your filesystem, you can do so by running fsck in foreground mode (i.e., do not use the -p or -y options). As it starts, fsck will ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should answer yes. Discussed with: kib, imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11589	2017-08-09 05:17:21 +00:00
Kirk McKusick	33bbdde01f	Avoid reading a snapshot block when it is already in the cache. Update the use of the B_CACHE flag (since the May 1999 commit that made it the correct test here). Reported by: Andreas Longwitz <longwitz@incore.de> Reviewed by: kib Tested by: Peter Holm MFC after: 1 week	2017-07-31 20:41:45 +00:00
Konstantin Belousov	5cf14660ae	Improve publication of the newly allocated snapdata. For freshly allocated snapdata, lock sn_lock in advance, so si_snapdata readers see the locked snapdata and do not race. For existing snapdata, if the thread was put to sleep waiting for sn_lock, re-read si_snapdata. This either closes the race or makes the reliance on LK_DRAIN less important. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:42:35 +00:00
Konstantin Belousov	c536471408	Unlock correct lock in ffs_snapblkfree(). It is possible for ffs_snapblkfree() to race and lock snaplock while the devvp snapdata is instantiated, but no snapshots exist. In this case the loop over snapshots in ffs_snapblkfree() is not executed, and the local variable vp is left initialized to NULL. Unlock using &sn->sn_lock and not vp->v_vnlock. For the inodes on the snapshot list, the locks are same. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:36:17 +00:00
Konstantin Belousov	f2e6bf5c05	Account for lock recursion when transferring snaplock to the vnode lock in ffs_snapremove(). Apparently ffs_snapremove() may be called with the snap lock recursed, at least one trace demonstrated this when snapshot vnode was unlinked while synced. It was inactivated from the syncer thread. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-07-21 18:28:27 +00:00
Konstantin Belousov	35bf780921	Remove write-only variable. Tested by: pho Sponsored by: The FreeBSD Foundation	2017-07-16 07:12:04 +00:00
Konstantin Belousov	51a6a15f8c	A followup to r320453, correct removal of the blocks from UFS snapshots. Tested by: pho PR: 220693 Sponsored by: The FreeBSD Foundation	2017-07-16 07:11:29 +00:00
Kirk McKusick	9c4f551e98	Create a new function ffs_getcg() to read in and verify a cylinder group. Change all code points that open-coded this functionality to use the new function. This commit is a refactoring with no change in functionality. In the future this change allows more robust checking of cylinder group reads along the lines discussed in the hardening UFS session at BSDCan (retry I/O, add checksums, etc). For more detail see the session notes at https://wiki.freebsd.org/DevSummit/201706/HardeningUFS Reviewed by: kib	2017-06-28 17:32:09 +00:00
Konstantin Belousov	698f05ab95	Mitigate several problems with the softdep_request_cleanup() on busy host. Problems start appearing when there are several threads all doing operations on a UFS volume and the SU workqueue needs a cleanup. It is possible that each thread calling softdep_request_cleanup() owns the lock for some dirty vnode (e.g. all of them are executing mkdir(2), mknod(2), creat(2) etc) and all vnodes which must be flushed are locked by corresponding thread. Then, we get all the threads simultaneously entering softdep_request_cleanup(). There are two problems: - Several threads execute MNT_VNODE_FOREACH_ALL() loops in parallel. Due to the locking, they quickly start executing 'in phase' with the speed of the slowest thread. - Since each thread already owns the lock for a dirty vnode, other threads' non-blocking attempts to lock the vnode owned by another thread fail, and the loops execute without making progress. Retry logic does not allow the situation to recover. The result is a livelock. Fix these problems by making the following changes: - Allow only one thread to enter MNT_VNODE_FOREACH_ALL() loop per mp. A new flag FLUSH_RC_ACTIVE guards the loop. - If there were failed locking attempts during the loop, abort retry even if there are still work items on the mp work list. An assumption is that the items will be cleaned when another thread either fsyncs its vnode, or unlocks and allows yet another thread to make progress. It is possible now that some calls would get undeserved ENOSPC from ffs_alloc(), because the cleanup is not aggressive enough. But I do not see how we can reliably clean up workitems if calling softdep_request_cleanup() while still owning the vnode lock. I thought about a scheme where ffs_alloc() returns ERESTART and saves the retry counter somewhere in struct thread, to return to the top level, unlock the vnode and retry. But IMO the very rare (and unproven) spurious ENOSPC is not worth the complications. 
Reported and tested by: pho Style and comments by: mckusick Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-06-03 16:18:50 +00:00
Konstantin Belousov	4cbc378c61	Clean possible td_su reference on the struct mount being unmounted as the last step of ffs_unmount(). It is possible that the mount point is recorded for cleanup in AST context while softdep flush is executed during unmount. The workitems are flushed by other means for the unmount, but the stray reference to struct mount blocks destruction of mount. Check for the situation and manually call vfs_rel() before returning from ffs_unmount(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-03 14:15:14 +00:00
Konstantin Belousov	215b29f62c	Remove spl() calls from UFS code. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-05-07 14:59:45 +00:00
Ed Maste	3cf259c390	UFS fs.h: clear warning from use in makefs(1) makefs(1) has a number of signedness warnings (when built with higher WARNS), most of which can be addressed by careful application of casts in makefs itself. There is one case where a signedness warning arises from the blksize macro, so must be addressed in the macro itself. Reviewed by: kib, mckusick MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10589	2017-05-05 15:26:55 +00:00
Gleb Smirnoff	9ed01c32e0	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.	2017-04-17 17:07:00 +00:00

1333 Commits