freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Kirk McKusick	1a75045196	After the unlink() of one name of a file with multiple links, a stat() of one of the remaining names of the file does not show an updated ctime (inode modification time) until several seconds after the unlink() completes. The problem only occurs when the filesystem is running with soft updates enabled. When running with soft updates, the ctime is not updated until the soft updates background process has settled all the needed I/O operations. This commit causes the ctime to be updated immediately during the unlink(). A side effect of this change is that the ctime is updated again when soft updates has finished its processing because that is the time that is correct from the perspective of programs that look at the disk (like dump). This change does not cause any extra I/O to be done, it just ensures that stat() updates the ctime before handing it back. PR: 241373 Reported by: Alan Somers Tested by: Alan Somers MFC after: 3 days Sponsored by: Netflix	2019-10-24 21:28:37 +00:00
Kirk McKusick	7792f70137	Soft updates needs to keep an on-disk linked list of inodes that have been unlinked, but are still referenced by open file descriptors. These inodes cannot be freed until the final file descriptor reference has been closed. If the system crashes while they are still being referenced, these inodes and their referenced blocks need to be freed by fsck. By having them on a linked list with the head pointer in the superblock, fsck can quickly find and process them rather than having to check every inode in the filesystem to see if it is unreferenced. When updating the head pointer of this list of unlinked inodes in the superblock, the superblock check-hash was not getting updated. If the system crashed with the incorrect superblock check-hash, the superblock would appear to be corrupted. This patch ensures that the superblock check-hash is updated when updating the head pointer of the unlinked inodes list. There is no need to MFC as superblock check hashes first appeared in 13.0. Tested by: Peter Holm Sponsored by: Netflix	2019-10-24 19:47:18 +00:00
Mark Johnston	c456a0a1a6	Abbreviate softdep lock names. The softdep lock names were unusually long and tended to stick out in lock profiling reports. Abbreviate them and make them consistent with our conventional style for lock names. Reviewed by: mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22042	2019-10-18 17:01:27 +00:00
Mateusz Guzik	e35cd9e38f	ufs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:18:03 +00:00
Eric van Gyzen	fdd888dee3	Add CTLFLAG_STATS to several debug.softdep sysctl OIDs Refer to r353111. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2019-10-04 21:44:52 +00:00
Kirk McKusick	44d37182ce	Update ffs_getcg() function to accept a flags parameter to be passed to breadn_flags() in preparation for later need when doing forcible unmount when disk dies or is removed. No functional change. Sponsored by: Netflix	2019-10-04 05:28:36 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Mateusz Guzik	e87f3f72f1	vfs: manage mnt_writeopcount with atomics See r352424. Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21575	2019-09-16 21:33:16 +00:00
Konstantin Belousov	d89ac450a7	Remove some unneeded vfs_busy() calls in SU code. When softdep_fsync() is running, a caller must have already started a write for the mount point. Since unmount or remount to ro suspends the mount point, it cannot run in parallel with softdep_fsync(), which makes the vfs_busy() call there unnecessary. Doing a blocking vfs_busy() there effectively causes a lock order reversal between vn_start_write() and setting MNTK_UNMOUNT, because vfs_busy(mp, 0) sleeps waiting for MNTK_UNMOUNT becoming clear, while unmount sets the flag and starts the suspension. Note that all other uses of vfs_busy() in SU code are non-blocking. Reported by: chs by mckusick Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-09 11:22:38 +00:00
Konstantin Belousov	f923be6b9a	Properly check for writers when fetching quotas for writeable vnodes in UFS quotaon(). Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560	2019-09-07 15:57:23 +00:00
Conrad Meyer	f3cf622523	ufs: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:08:33 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it is prepared to handle the vm object destruction for a live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier; as a result, all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is a valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Konstantin Belousov	1604022248	UFS: stop reusing the vnode for reallocated inode. In ffs_valloc(), force reclaim existing vnode on inode reuse, instead of trying to re-initialize the same vnode for new purposes. This is done in preparation of changes to the vp->v_object lifecycle handling. A new FFSV_REPLACE flag to ffs_vgetf() directs the function to vgone(9) the vnode if found in vfs hash, instead of returning it. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:45:23 +00:00
Konstantin Belousov	e671edac06	Decommission the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Kirk McKusick	5a0d467f5f	Clarify comment that describes how the FS_METACKHASH is managed. MFC after: 3 days	2019-08-13 20:56:44 +00:00
Kirk McKusick	9454b4fd78	A race condition existed between the time a UFS/FFS superblock check hash was computed and the time that the superblock was copied to a buffer to be written to disk. The result was a failed superblock check hash the next time that the superblock was read. The fix is to compute the check hash after the superblock has been copied to a buffer to be written. PR: 236504 Reported by: Peter Holm Tested by: Peter Holm Sponsored by: Netflix	2019-08-06 18:10:34 +00:00
Kirk McKusick	90381b1ca9	When updating the user or group disk quotas for the return of inodes or disk blocks, set the FORCE flag in the call to chkiq() or chkdq() since the user is always allowed to return resources and hence there is no need to check the user's credential. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-1-UFS-1: Denial Of Service in mount (prison_priv_check) Discussed with: kib MFC: 1 week Sponsored by: Netflix	2019-07-31 22:44:58 +00:00
Rick Macklem	b4c9955e41	Lock the vnode before calling ufs_bmap_seekdata(). r346932 replaced a call to vn_bmap_seekhole() with a call to ufs_bmap_seekdata(). Although vn_bmap_seekhole() locks the vnode, ufs_bmap_seekdata() assumes it is already locked. This patch adds locking of the vnode before the ufs_bmap_seekdata() call. If the vn_lock() call fails, it returns EBADF since that is the normal error returned when a file system is forced dismounted and is already listed as an error return in the lseek(2) man page. Discussed with: markj Reviewed by: kib	2019-07-27 01:52:34 +00:00
Kirk McKusick	fdf34aa3a5	The error reported in FS-14-UFS-3 can only happen on UFS/FFS filesystems that have block pointers that are out-of-range for their filesystem. These out-of-range block pointers are corrected by fsck(8) so are only encountered when an unchecked filesystem is mounted. A new "untrusted" flag has been added to the generic mount interface that can be set when mounting media of unknown provenance or integrity. For example, a daemon that automounts a filesystem on a flash drive when it is plugged into a system. This commit adds a test to UFS/FFS that validates all block numbers before using them. Because checking for out-of-range blocks adds unnecessary overhead to normal operation, the tests are only done when the filesystem is mounted as an "untrusted" filesystem. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg) Reviewed by: kib Sponsored by: Netflix	2019-07-17 22:07:43 +00:00
Kirk McKusick	ba554157a3	Style. No change intended.	2019-07-16 23:39:39 +00:00
Kirk McKusick	1fd136ec5e	When a process attempts to allocate space on a full filesystem, a filesystem full message is sent to the offending process or the kernel log if the offending process cannot be identified. To prevent an explosion of messages, the kernel ppsratecheck() function is used to limit the messages to one per second. This revision changes the variable that tracks the rate of these messages from a systemwide limit to a per-filesystem limit by moving it from a global variable to a variable in the ufsmount structure. Suggested by: kib Reviewed by: kib Sponsored by: Netflix	2019-07-16 23:12:27 +00:00
Kirk McKusick	daba4da81d	Add a new "untrusted" option to the mount command. Its purpose is to notify the kernel that the file system is untrusted and it should use more extensive checks on the file-system's metadata before using it. This option is intended to be used when mounting file systems from untrusted media such as USB memory sticks or other externally-provided media. It will initially be used by the UFS/FFS file system, but should likely be expanded to be used by other file systems that may appear on external media like msdosfs, exfat, and ext2fs. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20786	2019-07-01 23:22:26 +00:00
Mark Johnston	6137883ff3	Remove references to splbio in ffs_softdep.c. Assert that the per-mountpoint softdep mutex is held in modified functions that do not already have this assertion. No functional change intended. Reviewed by: kib, mckusick (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20741	2019-06-26 16:28:42 +00:00
Alan Somers	d49b446bfb	Add FIOBMAP2 ioctl This ioctl exposes VOP_BMAP information to userland. It can be used by programs like fragmentation analyzers and optimized cp implementations. But I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD and Linux. FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block number instead of 32-bit, and it also returns runp and runb. Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20705	2019-06-20 14:13:10 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Kirk McKusick	e94828443c	Add a missing brelse() in seldom-used error return.	2019-05-28 17:31:35 +00:00
Kirk McKusick	af6aeacb3e	Convert use of UFS-specific #ifdef DEBUG to DIAGNOSTIC or INVARIANTS as appropriate. No functional change intended. Suggested-by: markj	2019-05-28 16:32:04 +00:00
Kirk McKusick	298184acb8	Add function name and line number debugging information to softupdates worklist structures to help track their movement between work lists. No functional change to the operation of soft updates intended.	2019-05-27 06:22:43 +00:00
Alan Somers	65417f5e27	Remove "struct ucred" argument from vtruncbuf vtruncbuf takes a "struct ucred" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377	2019-05-24 20:27:50 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Mark Johnston	9e56947ffc	Ensure that error is initialized in ufs_bmap_seekdata(). Reported and tested by: jhibbits MFC with: r346932 Sponsored by: The FreeBSD Foundation	2019-05-05 16:57:03 +00:00
Konstantin Belousov	78022527bb	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
Kirk McKusick	44b193b09e	Zero out the file directory entry metadata to reduce disk scavenging disclosure. Submitted by: David G. Lawrence <dg@dglawrence.com> MFC after: 1 week	2019-05-04 18:00:57 +00:00
Kirk McKusick	0061238fb0	This update eliminates a kernel stack disclosure bug in UFS/FFS directory entries that is caused by uninitialized directory entry padding written to the disk. It can be viewed by any user with read access to that directory. Up to 3 bytes of kernel stack are disclosed per file entry, depending on the amount of padding the kernel needs to pad out the entry to a 32 bit boundary. The offset in the kernel stack that is disclosed is a function of the filename size. Furthermore, if the user can create files in a directory, this 3 byte window can be expanded 3 bytes at a time to a 254 byte window with 75% of the data in that window exposed. The additional exposure is done by removing the entry, creating a new entry with a 4-byte longer name, extracting 3 more bytes by reading the directory, and repeating until a 252 byte name is created. This exploit works in part because the area of the kernel stack that is being disclosed is in an area that typically doesn't change that often (perhaps a few times a second on a lightly loaded system), and these file creates and unlinks themselves don't overwrite the area of kernel stack being disclosed. It appears that this bug originated with the creation of the Fast File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and is likely present in every Unix or Unix-like system that uses UFS/FFS. Amazingly, nobody noticed until now. This update also adds the -z flag to fsck_ffs to have it scrub the leaked information in the name padding of existing directories. It only needs to be run once on each UFS/FFS filesystem after a patched kernel is installed and running. Submitted by: David G. Lawrence <dg@dglawrence.com> Reviewed by: kib MFC after: 1 week	2019-05-03 21:54:14 +00:00
Kirk McKusick	ab2214d400	Simplify calculation of DIRECTSIZ. No functional change intended. Suggested by: kib MFC after: 1 week	2019-05-03 21:46:25 +00:00
Mark Johnston	cc2c33dfb1	Optimize lseek(SEEK_DATA) on UFS. This version fixes the problems identified in r345244. Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19598	2019-04-29 22:05:26 +00:00
Konstantin Belousov	5ffc99e2e4	Handle races when remounting UFS volume from ro to rw. In particular, ensure that writers are not unleashed before SU structures are initialized. Also, correctly handle MNT_ASYNC before this. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-04-08 15:20:05 +00:00
Mariusz Zaborski	a1304030b8	Introduce funlinkat syscall that allows us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567	2019-04-06 09:34:26 +00:00
Kirk McKusick	69166928c7	This is an additional and hopefully final fix for bug report 230962. This bug was introduced with the change to use softdep_bp_to_mp() in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp() function failed to include VSOCK as one of the valid cases. Although local-domain sockets do not allocate blocks in the filesystem, they will allocate blocks if they use extended attributes (such as ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount pointer when presented with a socket vnode so that the soft updates write complete will properly process the soft updates structures associated with the extended attribute blocks. It was the failure to process these soft updates structures, thus leaving them hanging off the buffer, which led to the "panic: softdep_deallocate_dependencies: dangling deps" when trying to clean up the buffer after it was written. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-03-20 23:11:05 +00:00
Mark Johnston	783efeb544	Revert r345244 for now. The code which advances the block number is simplistic and is not correct when the starting offset is non-zero. Revert the change until this is fixed.	2019-03-18 05:03:55 +00:00
Mark Johnston	1a7f456a4b	Fix the gcc build (-Wstrict-prototypes) after r345244. Reported by: jenkins MFC with: r345244	2019-03-17 18:06:13 +00:00
Mark Johnston	c676692c61	Optimize lseek(SEEK_DATA) on UFS. The old implementation, at the VFS layer, would map the entire range of logical blocks between the starting offset and the first data block following that offset. With large sparse files this is very inefficient. The VFS currently doesn't provide an interface to improve upon the current implementation in a generic way. Add ufs_bmap_seekdata(), which uses the obvious algorithm of scanning indirect blocks to look for data blocks. Use it instead of vn_bmap_seekhole() to implement SEEK_DATA. Reviewed by: kib, mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19598	2019-03-17 17:34:06 +00:00
Kirk McKusick	42a5a356a8	Add KASSERT to the softdep_disk_write_complete() function in the soft dependency code to ensure that it will be able to avoid a dangling dependency. Sponsored by: Netflix	2019-03-12 00:10:31 +00:00
Kirk McKusick	3532718257	Give more complete information in INVARIANTS panic messages at end of the ffs_truncate() function. Sponsored by: Netflix	2019-03-11 23:53:56 +00:00
Kirk McKusick	a9f59cc029	Augment the UFS filesystem specific print function (called by the kernel vn_printf() routine when printing out vnodes associated with a UFS filesystem) to also include the inode's link count, effective link count, generation number, owner, group, flags, size, and for UFS2 filesystems, the extent size. Sponsored by: Netflix	2019-03-11 22:05:34 +00:00
Simon J. Gerraty	f5fdf82d82	Add _PC_ACL_* to vop_stdpathconf This avoids EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512	2019-03-11 20:40:56 +00:00
Jason A. Harmening	4775b07ebd	FFS: allow sendfile(2) to work with block sizes greater than the page size Implement ffs_getpages_async(), which when possible calls the asynchronous flavor of the generic pager's getpages function. When the underlying block size is larger than the system page size, however, it will invoke the (synchronous) buffer cache pager, followed by a call to the client completion routine. This retains true asynchronous completion in the most common (block size <= page size) case, which is important for the performance of the new sendfile(2). The behavior in the larger block size case mirrors the default implementation of VOP_GETPAGES_ASYNC, which most other filesystems use anyway as they do not override the getpages_async method. PR: 235708 Reported by: pho Reviewed by: kib, glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19340	2019-02-26 04:56:10 +00:00
Kirk McKusick	ac4b20a0a7	After a crash, a file that extends into indirect blocks may end up shorter than its size resulting in a hole as its final block (which is a violation of the invariants of the UFS filesystem). Soft updates will always ensure that the file size is correct when writing inodes to disk for files that contain only direct block pointers. However soft updates does not roll back sizes for files with indirect blocks that it has set to unallocated because their contents have not yet been written to disk. Hence, the file can appear to have a hole at its end because the block pointer has been rolled back to zero when its inode was written to disk. Thus, fsck_ffs calculates the last allocated block in the file. For files that extend into indirect blocks, fsck_ffs checks for a size past the last allocated block of the file and if that is found, shortens the file to reference the last allocated block thus avoiding having it reference a hole at its end. Submitted by: Chuck Silvers <chs@netflix.com> Tested by: Chuck Silvers <chs@netflix.com> MFC after: 1 week Sponsored by: Netflix	2019-02-25 21:58:19 +00:00
Kirk McKusick	baba6af702	This bug was introduced with the change to use softdep_bp_to_mp() in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp() function failed to include VFIFO as one of the valid cases. Although fifos do not allocate blocks in the filesystem, they will allocate blocks if they use extended attributes (such as ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount pointer when presented with a fifo vnode so that the soft updates write complete will properly process the soft updates structures associated with the extended attribute blocks. It was the failure to process these soft updates structures, thus leaving them hanging off the buffer, which led to the "panic: softdep_deallocate_dependencies: dangling deps" when trying to clean up the buffer after it was written. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-01-28 21:36:45 +00:00
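
Note on commit 0061238fb0 (directory entry padding): the "up to 3 bytes per entry" figure follows from rounding each entry up to a 32-bit boundary. The sketch below is an illustration only, not the kernel code. It assumes an 8-byte fixed header (inode number, record length, type, name length) followed by the name and a terminating NUL. The names `HEADER`, `direct_size`, `padding_bytes`, and `pack_entry` are hypothetical. Real entries may also carry a larger record length to cover free space, which this sketch ignores.

```python
import struct

# Assumed fixed header of a directory entry: inode number (4 bytes),
# record length (2), type (1), name length (1). Little-endian is an
# arbitrary choice for this illustration.
HEADER = struct.Struct("<IHBB")


def direct_size(namlen: int) -> int:
    """Header + name + NUL, rounded up to a 4-byte boundary."""
    raw = HEADER.size + namlen + 1
    return (raw + 3) & ~3


def padding_bytes(namlen: int) -> int:
    """Bytes after the terminating NUL that the name does not cover."""
    return direct_size(namlen) - (HEADER.size + namlen + 1)


def pack_entry(ino: int, dtype: int, name: bytes) -> bytes:
    """Build an entry whose padding is explicitly zero."""
    size = direct_size(len(name))
    buf = bytearray(size)  # bytearray(n) is zero-filled, so padding is clean
    HEADER.pack_into(buf, 0, ino, size, dtype, len(name))
    buf[HEADER.size:HEADER.size + len(name)] = name
    return bytes(buf)
```

Under these assumptions a 4-character name leaves 3 padding bytes and a 7-character name leaves none. Starting from a zero-filled buffer is one way to keep stale memory out of that padding.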

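Note on commit 1fd136ec5e (per-filesystem "filesystem full" rate limit): the change moves the rate-tracking state from a single global into each mount's own structure. The sketch below shows why that matters. It is a user-space illustration with hypothetical names (`RateLimiter`, `Mount`, `report_full`) and a simple one-second window. It does not reproduce the algorithm of the kernel's ppsratecheck().

```python
class RateLimiter:
    """Allow at most max_per_sec events per one-second window (illustrative)."""

    def __init__(self, max_per_sec: int = 1) -> None:
        self.max_per_sec = max_per_sec
        self.window_start = None
        self.count = 0

    def allow(self, now: float) -> bool:
        # Start a new window on first use or once a full second has passed.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max_per_sec


class Mount:
    """Stand-in for a mounted filesystem; each one owns its limiter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fsfull_limiter = RateLimiter(1)


def report_full(mount: Mount, now: float, log: list) -> None:
    """Append a message unless this mount already reported in this window."""
    if mount.fsfull_limiter.allow(now):
        log.append(f"{mount.name}: filesystem full")
```

With the limiter stored per mount, a burst of messages from one full filesystem does not suppress the first message from a different one. A single shared limiter would.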
2187 Commits