freebsd-dev

Author	SHA1	Message	Date
Kevin Lo	57d2ac2f90	arc4random() returns 0 to (2**32)−1, use an alternative to initialize i_gen if it's zero rather than a divide by 2. With inputs from delphij, mckusick, rmacklem Reviewed by: mckusick	2016-05-22 14:31:20 +00:00
Konstantin Belousov	5f36cc5bfa	Stop dropping and reacquiring Giant around geom calls in UFS. Sponsored by: The FreeBSD Foundation	2016-05-21 10:13:25 +00:00
Konstantin Belousov	c70b3cd28d	Improve handling of rdev->si_mountpt on mount and unmount of FFS volumes. Treat the field as a semaphore protecting availability of the device for mounting. Do no access devvp->v_rdev without the vnode lock owned. Protect change of the devvp->v_bufobj bo_ops vector with the vnode lock. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-21 09:49:35 +00:00
Konstantin Belousov	3f7ca894de	Ensure that ftruncate(2) is performed synchronously when file is opened in O_SYNC mode, at least for UFS. This also handles truncation, done due to the O_SYNC \| O_TRUNC flags combination to open(2), in synchronous way. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-18 12:03:57 +00:00
Konstantin Belousov	bfe89a8ee1	Do enable io accounting for read-only mounts and mounts which are remounted to writeable after initial read-only. Assign to dev->si_mountpt earlier to account the accesses done at the mount time. Based on submission by: bde MFC after: 1 week	2016-05-17 21:35:35 +00:00
Konstantin Belousov	f30ddc49e6	If IO_SYNC was passed to ffs_truncate(), request synchronous inode update from the final ffs_update(). Noted by: bde MFC after: 1 week	2016-05-17 21:30:58 +00:00
Konstantin Belousov	3d0691acdc	For async UFS mounts, shrink the directory asynchronously, at least do not pass IO_SYNC to ffs_truncate() unneccessary. Submitted by: bde MFC after: 1 week	2016-05-17 21:28:28 +00:00
Konstantin Belousov	98cbffd7bd	Fix comments. Submitted by: bde MFC after: 1 week	2016-05-17 08:24:27 +00:00
Konstantin Belousov	5b6526413f	Fix typo in the message. Submitted by: bde MFC after: 1 week	2016-05-17 08:19:20 +00:00
Pedro F. Giffuni	08e941833d	UFS: spelling fixes on comments. No functional change.	2016-04-29 20:43:51 +00:00
Enji Cooper	028d33c96a	Add FEATURE knob for testing for UFS extended attribute kernel support Support can be verified via `feature_present("ufs_extattr")`, etc. Differential Revision: https://reviews.freebsd.org/D6053 MFC after: 2 weeks Relnotes: yes Reviewed by: asomers, kib Sponsored by: EMC / Isilon Storage Division	2016-04-22 08:09:27 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Pedro F. Giffuni	abafa4db03	ufs: replace 0 with NULL for pointers. While here also do late initialization of the variables we are changing. Found with devel/coccinelle. Reviewed by: mckusick MFC after: 2 weeks	2016-04-10 21:48:11 +00:00
Edward Tomasz Napierala	ae34b6ff96	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
Edward Tomasz Napierala	35030a5dd4	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-29 13:56:59 +00:00
Konstantin Belousov	c79dff0fbe	Split the global taskqueue used to process all UFS trim completions, into per-mount taskqueue with the private taskqueue processing thread. This allows to drain the taskqueue on unmount, to ensure that all TRIMs are finished before mount structures are freed. But just draining the taskqueue where TRIM biodone geom-up completions are processed is not enough, since ffs_blkfree(), called by the task, might result in more writes. Count inflight delayed blkfree's and pause() unmount until the counter drains as well. Reported by: Nick Evans <nevans@talkpoint.com> Tested by: Nick Evans <nevans@talkpoint.com>, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-27 08:21:17 +00:00
Konstantin Belousov	d016708959	Style: wrap long lines. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-27 08:07:12 +00:00
Konstantin Belousov	6336aefc1d	Fix locking mistake in softdep_waitidle(). The surrounding code expects that the loop is always exited with the SU lock owned, even on error. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-03-23 09:58:51 +00:00
Kirk McKusick	24b6748c42	The UFS filesystem requires that the last block of a file always be allocated. When shortening the length of a file in which the new end of the file contains a hole, the hole must have a block allocated. Reported by: Maxim Sobolev Reviewed by: kib Tested by: Peter Holm	2016-02-24 01:58:40 +00:00
Edward Tomasz Napierala	50102a6342	Remove ffs_mountroot() prototype; seems to be long gone. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-01-28 12:21:23 +00:00
Kirk McKusick	abe53f7e91	This fixes a bug in UFS2 exported NFS volumes. An NFS client can crash a server that has exported UFS2 by presenting a filehandle with an inode number that references an uninitialized inode in a cylinder group. The problem is that UFS2 only initializes blocks of inodes as they are first allocated and ffs_fhtovp() does not validate that the inode is in a range of inodes that have been initialized. Attempting to read an uninitialized inode gets random data from the disk. When the kernel tries to interpret it as an inode, panics often arise. Reported by: Christos Zoulas (from NetBSD) Reviewed by: kib	2016-01-27 21:27:05 +00:00
Kirk McKusick	d2a28cb080	The bread() function was inconsistent about whether it would return a buffer pointer in the event of an error (for some errors it would return a buffer pointer and for other errors it would not return a buffer pointer). The cluster_read() function was similarly inconsistent. Clients of these functions were inconsistent in handling errors. Some would assume that no buffer was returned after an error and would thus lose buffers under certain error conditions. Others would assume that brelse() should always be called after an error and would thus panic the system under certain error conditions. To correct both of these problems with minimal code churn, bread() and cluster_write() now always free the buffer when returning an error thus ensuring that buffers will never be lost. The brelse() routine checks for being passed a NULL buffer pointer and silently returns to avoid panics. Thus both approaches to handling error returns from bread() and cluster_read() will work correctly. Future code should be written assuming that bread() and cluster_read() will never return a buffer with an error, so should not attempt to brelse() the buffer when an error is returned. Reviewed by: kib	2016-01-27 21:23:01 +00:00
Konstantin Belousov	44de1ba141	Recheck curthread->td_su after the VFS_SYNC() call, and re-sync if the ast was rescheduled during VFS_SYNC(). It is possible that enough parallel writes or slow/hung volume result in VFS_SYNC() deferring to the ast flushing of workqueue. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-21 11:50:32 +00:00
Konstantin Belousov	dc15d94fd9	Update ctime when atime or birthtime are updated. Cleanup setting of ctime/mtime/birthtime: do not set IN_ACCESS or IN_UPDATE, then clear them with ufs_itimes(), making transient (possibly inconsistent) change to the times, and then copy user-supplied times into the inode. Instead, directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and copy the value into the inode. Minor inconsistency left is that the inode ctime is updated even when birthtime update attempt is performed on a UFS1 volume. Submitted by: bde MFC after: 2 weeks	2015-12-07 12:09:04 +00:00
Kirk McKusick	43a993bb7d	For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm	2015-11-29 21:01:02 +00:00
Konstantin Belousov	fe920528c1	Do not perform read-ahead for BA_CLRBUF request when we are low on memory or when dirty buffer queue is too large. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-10-27 13:44:13 +00:00
Warner Losh	476dfdc3bf	Do not relocate extents to make them contiguous if the underlying drive can do deletions. Ability to do deletions is a strong indication that this optimization will not help performance. It will only generate extra write traffic. These devices are typically flash based and have a limited number of write cycles. In addition, making the file contiguous in LBA space doesn't improve the access times from flash devices because they have no seek time. Reviewed by: mckusick@	2015-10-16 03:06:02 +00:00
Gleb Smirnoff	20e8b32db4	In softdep_setup_freeblocks(): - Move the bread() to the beginning of function. - Return if it fails, otherwise we will panic. Submitted by: mckusick Sponsored by: Netflix	2015-10-07 12:36:28 +00:00
Konstantin Belousov	7a82f35c9d	Do not consume extra reference. This is a bug in r287479. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-05 12:28:18 +00:00
Konstantin Belousov	c65f759837	Declare the writes around the call to VFS_SYNC() in softdep_ast_cleanup_proc(). Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-05 08:48:24 +00:00
Konstantin Belousov	76cae067b8	By doing file extension fast, it is possible to create excess supply of the D_NEWBLK kinds of dependencies (i.e. D_ALLOCDIRECT and D_ALLOCINDIR), which can exhaust kmem. Handle excess of D_NEWBLK in the same way as excess of D_INODEDEP and D_DIRREM, by scheduling ast to flush dependencies, after the thread, which created new dep, left the VFS/FFS innards. For D_NEWBLK, the only way to get rid of them is to do full sync, since items are attached to data blocks of arbitrary vnodes. The check for D_NEWBLK excess in softdep_ast_cleanup_proc() is unlocked. For 32bit arches, reduce the total amount of allowed dependencies by two. It could be considered increasing the limit for 64 bit platforms with direct maps. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-09-01 13:07:27 +00:00
Jeff Roberson	98082691bb	- Make 'struct buf *buf' private to vfs_bio.c. Having a global variable 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-07-29 02:26:57 +00:00
Jeff Roberson	fade8dd714	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division	2015-07-23 19:13:41 +00:00
Mateusz Guzik	f0725a8e1e	Move chdir/chroot-related fdp manipulation to kern_descrip.c Prefix exported functions with pwd_. Deduplicate some code by adding a helper for setting fd_cdir. Reviewed by: kib	2015-07-11 16:19:11 +00:00
Mark Johnston	5f34e93c58	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937	2015-07-05 22:37:33 +00:00
Mark Murray	d1b06863fb	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00
Konstantin Belousov	521987c3e5	Simplify code, no need to test the flag before clearing it. Submitted by: ed MFC after: 12 days	2015-06-29 13:06:24 +00:00
Konstantin Belousov	b2c3df842b	Handle errors from background write of the cylinder group blocks. First, on the write error, bufdone() call from ffs_backgroundwrite() panics because pbrelvp() cleared bp->b_bufobj, while brelse() would try to re-dirty the copy of the cg buffer. Handle this by setting B_INVAL for the case of BIO_ERROR. Second, we must re-dirty the real buffer containing the cylinder group block data when background write failed. Real cg buffer was already marked clean in ffs_bufwrite(). After the BV_BKGRDINPROG flag is cleared on the real cg buffer in ffs_backgroundwrite(), buffer scan may reuse the buffer at any moment. The result is lost write, and if the write error was only transient, we get corrupted bitmaps. We cannot re-dirty the original cg buffer in the ffs_backgroundwritedone(), since the context is not sleepable, preventing us from sleeping for origbp' lock. Add BV_BKGDERR flag (protected by the buffer object lock), which is converted into delayed write by brelse(), bqrelse() and buffer scan. In collaboration with: Conrad Meyer <cse.cem@gmail.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation (kib), EMC/Isilon storage division (Conrad) MFC after: 2 weeks	2015-06-27 09:44:14 +00:00
Konstantin Belousov	1eabd96728	vfs_msync(), called from syncer vnode fsync VOP, only iterates over the active vnode list for the given mount point, with the assumption that vnodes with dirty pages are active. This is enforced by vinactive() doing vm_object_page_clean() pass over the vnode pages. The issue is, if vinactive() cannot be called during vput() due to the vnode being only shared-locked, we might end up with the dirty pages for the vnode on the free list. Such vnode is invisible to syncer, and pages are only cleaned on the vnode reactivation. In other words, the race results in the broken guarantee that user data, written through the mmap(2), is written to the disk not later than in 30 seconds after the write. Fix this by keeping the vnode which is freed but still owing inactivation, on the active list. When syncer loops find such vnode, it is deactivated and cleaned by the final vput() call. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-17 04:46:58 +00:00
Mateusz Guzik	4da8456f0a	Replace struct filedesc argument in getvnode with struct thread This is is a step towards removal of spurious arguments.	2015-06-16 13:09:18 +00:00
Konstantin Belousov	46c3d3ac99	Syncing a directory vnode might drop the vnode lock in the softdep_sync() similarly to the regular vnode sync. Allow retry for both vnode types. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-03 20:48:00 +00:00
Konstantin Belousov	aef373ce38	Remove unused variable. When deallocate_dependencies() is performed, softdep_journal_freeblocks() already called cancel_allocdirect() which should have eliminated direct dependencies for all truncated full blocks. The indirect dependencies are allowed above, since second- and third-level dependencies are only dealt with by the code which frees indirect block, which happens after the inode write. Discussed with: mckusick, jeff Reviewed by: jeff Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-31 15:50:54 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Konstantin Belousov	0bc4fe10ad	After r283600, NODELAY flag to inodedep_lookup() function is unused. Eliminate it, and simplify code by removing the local dflags variable always initialized to DEPALLOC. Noted by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:49:04 +00:00
Konstantin Belousov	1bc93bb7b9	Currently, softupdate code detects overstepping on the workitems limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker' tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitely checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nsfd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:20:42 +00:00
Kirk McKusick	74a87c3804	Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@) MFC after: 2 weeks	2015-04-24 23:27:50 +00:00
Rick Macklem	dda11d4ab9	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks	2015-04-15 20:16:31 +00:00
Konstantin Belousov	9928931186	Fix build (with gcc). Reported by: bz, ian Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-27 15:49:21 +00:00
Konstantin Belousov	4af9f77ecb	Fix the hand after the immediate reboot when the following command sequence is performed on UFS SU+J rootfs: cp -Rp /sbin/init /sbin/init.old mv -f /sbin/init.old /sbin/init Hang occurs on the rootfs unmount. There are two issues: 1. Removed init binary, which is still mapped, creates a reference to the removed vnode. The inodeblock for such vnode must have active inodedep, which is (eventually) linked through the unlinked list. This means that ffs_sync(MNT_SUSPEND) cannot succeed, because number of softdep workitems for the mp is always > 0. FFS is suspended during unmount, so unmount just hangs. 2. As noted above, the inodedep is linked eventually. It is not linked until the superblock is written. But at the vfs_unmountall() time, when the rootfs is unmounted, the call is made to ffs_unmount()->ffs_sync() before vflush(), and ffs_sync() only calls ffs_sbupdate() after all workitems are flushed. It is masked for normal system operations, because syncer works in parallel and eventually flushes superblock. Syncer is stopped when rootfs unmounted, so ffs_sync() must do sb update on its own. Correct the issues listed above. For MNT_SUSPEND, count the number of linked unlinked inodedeps (this is not a typo) and substract the count of such workitems from the total. For the second issue, the ffs_sbupdate() is called right after device sync in ffs_sync() loop. There is third problem, occuring with both SU and SU+J. The softdep_waitidle() loop, which waits for softdep_flush() thread to clear the worklist, only waits 20ms max. It seems that the 1 tick, specified for msleep(9), was a typo. Add fsync(devvp, MNT_WAIT) call to softdep_waitidle(), which seems to significantly help the softdep thread, and change the MNT_LAZY update at the reboot time to MNT_WAIT for similar reasons. Note that userspace cannot create more work while devvp is flushed, since the mount point is always suspended before the call to softdep_waitidle() in unmount or remount path. PR: 195458 In collaboration with: gjb, pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-27 13:55:56 +00:00
Konstantin Belousov	f6f76e1747	Partially revert r277922, avoid sleeping and do flush if we a awaken, instead of waiting for the FLUSH_* flags. Also, when requesting flush, do the wakeups unconditionally even when FLUSH_CLEANUP flag was already set. Reported and tested by: dim, "Lundberg, Johannes" <johannes@brilliantservice.co.jp> Bisected by: dim MFC after: 2 weeks	2015-02-05 13:00:27 +00:00

1 2 3 4 5 ...

1991 Commits