freebsd-skq

Author	SHA1	Message	Date
Ed Maste	15c377c3cc	ANSIfy ffs_subr.c Also renumber license clause to avoid skipping #3	2016-10-31 20:43:43 +00:00
Kirk McKusick	ad544726aa	Avoid possible overflow when calclating malloc size for auxillary data structure sizes when mounting and reloading UFS/FFS filesystems by using a u_long rather than an int for the size. Reported by: Mariusz Zaborski <oshogbo@> MFC after: 1 week	2016-10-28 20:15:19 +00:00
Konstantin Belousov	c39baa7480	Generalize UFS buffer pager to allow it serving other filesystems which also use buffer cache. Most important addition to the code is the handling of filesystems where the block size is less than the machine page size, which might require reading several buffers to validate single page. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:43:59 +00:00
Kirk McKusick	9e7e59125c	The UFS/FFS filesystem checks directory link counts when doing directory create and delete operations. If it ever finds a directory with a link count less than 2, it panics. Thus, an rm -rf that encounters a directory with a link count below 2 causes a kernel panic. The proposed fix is to return the error EINVAL rather than panicing. The effect is that the requested operation is not done, but the system continues to run. At a more convenient later time, the filesystem can be unmounted and cleaned (with fsck or journal run). Once cleaned, the operation can be rerun to successful completion. This fix takes that approach. The panic message has been converted into a uprintf(9) to provide the user with the inode number and filesystem mount point of the offending directory and EINVAL is returned for the operation. The long (three year) delay in fixing this problem occurred because the bug was misclassified when originally assigned and only this week was found during a sweep of old unresolved bug reports. PR: 180894 Reviewed by: kib MFC after: 2 weeks	2016-10-26 20:28:23 +00:00
Marcel Moolenaar	d67ec19c3e	Include <sys/types.h> explicitly instead of depending on that header being included by <sys/param.h>. When compiled as part of makefs(8) and on macOS or Linux, <sys/param.h> is not our own.	2016-10-24 18:12:57 +00:00
Konstantin Belousov	895219834e	Add FFS pager, which uses buffer cache read operation to validate pages. See the comments for more detailed description of the algorithm. The pager is used unconditionally when the block size of the underlying device is larger than the machine page size, since local vnode pager cannot handle the configuration [1]. Otherwise, the vfs.ffs.use_buf_pager sysctl allows to switch to the local pager. Measurements demonstrated no regression in the ever-important buildworld benchmark, and small (~5%) throughput improvements in the special microbenchmark configuration for dbench over swap-backed md(4). Code can be generalized and reused for other filesystems which use buffer cache. Reported by: Anton Yuzhaninov <citrin@citrin.ru> [1] Tested by: pho Benchmarked by: mjg, pho Reviewed by: alc, markj, mckusick (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8198	2016-10-19 11:09:29 +00:00
Mateusz Guzik	8660b707ff	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib	2016-09-30 17:11:03 +00:00
Konstantin Belousov	bf9c87c813	Be more strict when selecting between snapshot/regular mount. Reclaimed vnode type is VBAD, so succesful comparision like devvp->v_type != VREG does not imply that the devvp references snapshot, it might be due to a reclaimed vnode. Explicitely check the vnode type. In the the most important case of ffs_blkfree(), the devfs vnode is locked and its type is stable. In other cases, if the vnode is reclaimed right after the check, hopefully the buffer methods return right error codes. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-19 15:58:33 +00:00
Konstantin Belousov	6bd8ddcf0c	Fix libprocstat build after r305902. - Use _Bool to not require userspace to include stdbool.h. - Make extattr.h usable without vnode_if.h. - Follow i_ump to get cdev pointer. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-17 18:14:31 +00:00
Konstantin Belousov	e1db68971e	Reduce size of ufs inode. Remove redunand i_dev and i_fs pointers, which are available as ip->i_ump->um_dev and ip->i_ump->um_fs, and reorder members by size to reduce padding. To compensate added derefences, the most often i_ump access to differentiate between UFS1 and UFS2 dinode layout is removed, by addition of the new i_flag IN_UFS2. Overall, this actually reduces the amount of memory dereferences. On 64bit machine, original struct inode size is 176, reduced to 152 bytes with the change. Tested by: pho (previous version) Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-17 16:47:34 +00:00
Bruce Evans	0c01bcb9ff	Sprinkle DOINGASYNC() checks so as to do delayed writes for async mounts in almost all cases instead of in most cases. Don't override DOINGASYNC() by any condition except IO_SYNC. Fix previous sprinking of DOINGASYNC() checks. Don't override IO_SYNC by DOINGASYNC(). In ffs_write() and ffs_extwrite(), there were intentional overrides that just broke O_SYNC of data. In ffs_truncate(), there are 5 calls to ffs_update(), 4 with apparently-unintentional overrides and 1 without; this had no effect due to the main async mount hack descibed below. Fix 1 place in ffs_truncate() where the caller's IO_ASYNC was overridden for the soft updates case too (to do a delayed write instead of a sync write). This is supposed to be the only change that affects anything except async mounts. In ffs_update(), remove the 19 year old efficiency hack of ignoring the waitfor flag for async mounts, so that fsync() almost works for async mounts. All callers are supposed to be fixed to not ask for a sync update unless they are for fsync() or [I]O_SYNC operations. fsync() now almost works for async mounts. It used to sync the data but not the most important metdata (the inode). It still doesn't sync associated directories. This gave 10-20% fewer writes for my makeworld benchmark with async mounted tmp and obj directories from an already small number. Style fixes: - in ffs_balloc.c, remove rotted quadruplicated comments about the simplest part of the DOING() decisions and rearrange the nearly- quadruplicated code to be more nearly so. - in ufs_vnops.c, use a consistent style with less negative logic and no manual "optimization" of \|\| to \| in DOING() expressions. Reviewed by: kib (previous version)	2016-09-08 17:40:40 +00:00
Konstantin Belousov	63876b3ba2	On rename, do not perform truncation of dirhash if the vnode truncation failed. Doing so resulted in inconsistent state of the ufs dirhash with regard to the actual directory inode state, and could lead to spurious ENOENT errors for lookups of existing files in production kernels, or assertion failures in the debugging kernels. Change the logic of calling ufsdirhash_dirtrunc() to be same as in ufs_direnter(). Execute UFS_TRUNCATE() first, log error, and only do dirtrunc() if UFS_TRUNCATE() succeeded. Note that the problem was exacerbated by the bug in the flush_newblk_dep() function (see r305599), which caused in the spurios errors from ffs_sync() and then ffs_truncate(). In collaboration with: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:09:34 +00:00
Konstantin Belousov	7b05b8a29c	Do not leak transient ENOLCK error from flush_newblk_dep() loop. The buffer lock is retried on failed LK_SLEEPFAIL attempt, and error from the failed attempt is irrelevant. But since there is path after retry which does not clear error, it is possible to return spurious error from the function. The issue resulted in a spurious failure of softdep_sync_buf(), causing further spurious failure of ffs_sync(). In collaboration with: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:08:54 +00:00
Konstantin Belousov	76db05eb14	When logging unlikely UFS_TRUNCATE() failure in ufs_direnter(), include error code. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:08:08 +00:00
Konstantin Belousov	ea16af59a1	When externding directory inode in ufs_direnter(), adjust i_endoff. This change is formally not needed, since i_endoff not used in all code paths after the call to ufs_direnter(), and i_endoff is recalculated by the next lookup. But having the value correct makes the reasoning about code simpler. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:07:25 +00:00
Konstantin Belousov	e599d951e3	In dqsync(), when called from quotactl(), um_quotas entry might appear cleared since nothing prevents completion of the parallel quotaoff. There is nothing to sync in this case, and no reason to panic. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:06:43 +00:00
Konstantin Belousov	60f1c000f3	In softdep_prealloc(), return early not only for snapshots, but for the quota files as well. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:05:13 +00:00
Konstantin Belousov	ccb19123e5	There is no need to upgrade the last dvp lock on lookups for modifying operations. Instead of upgrading, assert that the lock is exclusive. Explain the cause in comments. This effectively reverts r209367. Tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:04:45 +00:00
Konstantin Belousov	df4265774e	Partially lift suspension when ffs_reload() finished with cgs and going to re-read inodes. Secondary write initiators, e.g. ufs_inactive(), might need to start a write while owning the vnode lock. Since the suspended state established by /dev/ufssuspend prevents them from entering vn_start_secondary_write(), we get deadlock otherwise. Note that it is arguably not very useful to re-read inodes after /dev/ufssuspend suspension, because the suspension does not block readers, and other threads might read existing files in parallel with suspension owner (for now, only growfs(8)) operations. This effectively means that suspension owner cannot safely modify existing inodes, and then there is no sense in re-reading. But keep the code enabled for now. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-08 12:01:28 +00:00
Warner Losh	949b56ba66	Renumber the advertising clause.	2016-09-06 15:17:35 +00:00
Kirk McKusick	988fd417a0	Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel. The problem that has been pointed out is that when there is a transient write error on certain metadata blocks, specifically directory blocks (PAGEDEP), inode blocks (INODEDEP), indirect pointer blocks (INDIRDEPS), and cylinder group (BMSAFEMAP, but only when journaling is enabled), we get a panic in one of the routines called by softdep_disk_io_initiation that the I/O is "already started" when we retry the write. These dependency types potentially need to do roll-backs when called by softdep_disk_io_initiation before doing a write and then a roll-forward when called by softdep_disk_write_complete after the I/O completes. The panic happens when there is a transient error. At the top of softdep_disk_write_complete we check to see if the write had an error and if an error occurred we just return. This return is correct most of the time because the main role of the routines called by softdep_disk_write_complete is to process the now-completed dependencies so that the next I/O steps can happen. But for the four types listed above, they do not get to do their rollback operations. This causes the panic when softdep_disk_io_initiation gets called on the second attempt to do the write and the roll-back routines find that the roll-backs have already been done. As an aside I note that there is also the problem that the buffer will have been unlocked and thus made visible to the filesystem and to user applications with the roll-backs in place. The way to resolve the problem is to add a flag to the routines called by softdep_disk_write_complete for the four dependency types noted that indicates whether the write was successful (WRITESUCCEEDED). If the write does not succeed, they do just the roll-backs and then return. If the write was successful they also do their usual processing of the now-completed dependencies. The fix was tested by selectively injecting write errors for buffers holding dependencies of each of the four types noted above and then verifying that the kernel no longer paniced and that following the successful retry of the write that the filesystem could be unmounted and successfully checked cleanly. PR: 211013 Reviewed by: kib	2016-08-16 21:02:30 +00:00
Konstantin Belousov	948137db10	In UFS_BALLOC(), invalidate pages of indirect buffers on failed block allocation unwinding. Dandling buffers are released on UFS_BALLOC() failure to ensure that later attempt to allocate blocks in close range do not find the blocks with invalid content, since possible partial block allocations are unwound. As such, it is not enough to just release the buffers, the pages must also invalidated and removed from the vnode vm_object queue. Otherwise the pages might be found later and used to reconstruct indirect buffers when doing allocations at offset close to the failure point, and their stale content compromise the filesystem integrity. Note that just marking the buffer as B_INVAL is not enough, B_NOCACHE is required. To be sure, clear the B_CACHE flag as well. This complements the r174973, which started releasing buffers. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-16 17:30:58 +00:00
Konstantin Belousov	2da4bcb82b	On unwind after failed block allocation in ffs_balloc_ufs{1,2}, assert that recorded allocated blocks numbers match the physical block numbers of dandling buffers which are released. When finally freeing the blocks during unwind, assert that dandling buffers where not re-allocated. They shouldn't, because the vnode lock is owned exclusive. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-16 17:18:38 +00:00
Konstantin Belousov	39f7cbe9ab	When looking up dandling buffers for unwing after failing block allocation in UFS_BALLOC(), there is no need to map them. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-16 17:05:15 +00:00
Konstantin Belousov	554248587e	When block allocation fails in UFS_BALLOC(), and the volume does not have SU enabled, there is no point in calling softdep_request_cleanup(). The call cannot produce free blocks, but we unecessarily lock ufsmount and do inode block write. Usual point of not doing optimizations for the corner case of the full volume is not applicable there, the work is easily avoidable, and the addition CPU and disk io load do not lead to succeeding retry. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-16 16:50:48 +00:00
Konstantin Belousov	f1503c5a49	In ffs_balloc_ufs{1,2} routines, assert that unwind records do not overflow local arrays. This is not immediately obvious from the static code inspection, due to retry logic. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-16 16:49:56 +00:00
Konstantin Belousov	2f514f92cf	Implement VOP_FDATASYNC() for UFS. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471	2016-08-15 19:22:23 +00:00
Edward Tomasz Napierala	411455a8fb	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
Konstantin Belousov	e73b16ae7a	Ensure that the UFS directory vnode' vm_object is properly sized before UFS_BALLOC() is called. I do not believe that this caused any real issue on FreeBSD because the exclusive vnode lock is held over the balloc/resize, the change is to make formally correct KPI use. Based on: the Matthew Dillon' patch from DragonFly BSD PR: 93942 Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-20 14:40:56 +00:00
Kevin Lo	57d2ac2f90	arc4random() returns 0 to (2**32)−1, use an alternative to initialize i_gen if it's zero rather than a divide by 2. With inputs from delphij, mckusick, rmacklem Reviewed by: mckusick	2016-05-22 14:31:20 +00:00
Konstantin Belousov	5f36cc5bfa	Stop dropping and reacquiring Giant around geom calls in UFS. Sponsored by: The FreeBSD Foundation	2016-05-21 10:13:25 +00:00
Konstantin Belousov	c70b3cd28d	Improve handling of rdev->si_mountpt on mount and unmount of FFS volumes. Treat the field as a semaphore protecting availability of the device for mounting. Do no access devvp->v_rdev without the vnode lock owned. Protect change of the devvp->v_bufobj bo_ops vector with the vnode lock. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-21 09:49:35 +00:00
Konstantin Belousov	3f7ca894de	Ensure that ftruncate(2) is performed synchronously when file is opened in O_SYNC mode, at least for UFS. This also handles truncation, done due to the O_SYNC \| O_TRUNC flags combination to open(2), in synchronous way. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-18 12:03:57 +00:00
Konstantin Belousov	bfe89a8ee1	Do enable io accounting for read-only mounts and mounts which are remounted to writeable after initial read-only. Assign to dev->si_mountpt earlier to account the accesses done at the mount time. Based on submission by: bde MFC after: 1 week	2016-05-17 21:35:35 +00:00
Konstantin Belousov	f30ddc49e6	If IO_SYNC was passed to ffs_truncate(), request synchronous inode update from the final ffs_update(). Noted by: bde MFC after: 1 week	2016-05-17 21:30:58 +00:00
Konstantin Belousov	3d0691acdc	For async UFS mounts, shrink the directory asynchronously, at least do not pass IO_SYNC to ffs_truncate() unneccessary. Submitted by: bde MFC after: 1 week	2016-05-17 21:28:28 +00:00
Konstantin Belousov	98cbffd7bd	Fix comments. Submitted by: bde MFC after: 1 week	2016-05-17 08:24:27 +00:00
Konstantin Belousov	5b6526413f	Fix typo in the message. Submitted by: bde MFC after: 1 week	2016-05-17 08:19:20 +00:00
Pedro F. Giffuni	08e941833d	UFS: spelling fixes on comments. No functional change.	2016-04-29 20:43:51 +00:00
Enji Cooper	028d33c96a	Add FEATURE knob for testing for UFS extended attribute kernel support Support can be verified via `feature_present("ufs_extattr")`, etc. Differential Revision: https://reviews.freebsd.org/D6053 MFC after: 2 weeks Relnotes: yes Reviewed by: asomers, kib Sponsored by: EMC / Isilon Storage Division	2016-04-22 08:09:27 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Pedro F. Giffuni	abafa4db03	ufs: replace 0 with NULL for pointers. While here also do late initialization of the variables we are changing. Found with devel/coccinelle. Reviewed by: mckusick MFC after: 2 weeks	2016-04-10 21:48:11 +00:00
Edward Tomasz Napierala	ae34b6ff96	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
Edward Tomasz Napierala	35030a5dd4	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-29 13:56:59 +00:00
Konstantin Belousov	c79dff0fbe	Split the global taskqueue used to process all UFS trim completions, into per-mount taskqueue with the private taskqueue processing thread. This allows to drain the taskqueue on unmount, to ensure that all TRIMs are finished before mount structures are freed. But just draining the taskqueue where TRIM biodone geom-up completions are processed is not enough, since ffs_blkfree(), called by the task, might result in more writes. Count inflight delayed blkfree's and pause() unmount until the counter drains as well. Reported by: Nick Evans <nevans@talkpoint.com> Tested by: Nick Evans <nevans@talkpoint.com>, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-27 08:21:17 +00:00
Konstantin Belousov	d016708959	Style: wrap long lines. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-27 08:07:12 +00:00
Konstantin Belousov	6336aefc1d	Fix locking mistake in softdep_waitidle(). The surrounding code expects that the loop is always exited with the SU lock owned, even on error. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-03-23 09:58:51 +00:00
Kirk McKusick	24b6748c42	The UFS filesystem requires that the last block of a file always be allocated. When shortening the length of a file in which the new end of the file contains a hole, the hole must have a block allocated. Reported by: Maxim Sobolev Reviewed by: kib Tested by: Peter Holm	2016-02-24 01:58:40 +00:00
Edward Tomasz Napierala	50102a6342	Remove ffs_mountroot() prototype; seems to be long gone. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-01-28 12:21:23 +00:00
Kirk McKusick	abe53f7e91	This fixes a bug in UFS2 exported NFS volumes. An NFS client can crash a server that has exported UFS2 by presenting a filehandle with an inode number that references an uninitialized inode in a cylinder group. The problem is that UFS2 only initializes blocks of inodes as they are first allocated and ffs_fhtovp() does not validate that the inode is in a range of inodes that have been initialized. Attempting to read an uninitialized inode gets random data from the disk. When the kernel tries to interpret it as an inode, panics often arise. Reported by: Christos Zoulas (from NetBSD) Reviewed by: kib	2016-01-27 21:27:05 +00:00

1 2 3 4 5 ...

2020 Commits