freebsd-skq

Author	SHA1	Message	Date
Chuck Silvers	8b88330ed6	ufs: restore uniqueness of st_dev as returned by ufs_stat(). Switch ufs_stat() to use the same value for st_dev as was used by the previous ufs_getattr() stat path. Submitted by: gallatin Reviewed by: mjg, imp, kib, mckusick Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26596	2020-10-05 18:17:50 +00:00
Konstantin Belousov	3c484f325e	Convert page cache read to VOP. There are several negative side-effects of not calling into the VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows the filesystem to make a more fine-grained decision to refuse a read from the page cache. Keep the VIRF_PGREAD flag around; it is still useful for nullfs and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:06:36 +00:00
Konstantin Belousov	96474d2a3f	Do not copy vp into f_data for DTYPE_VNODE files. The pointer to the vnode is already stored in f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide a finit_vnode() helper to initialize files of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 21:55:21 +00:00
Mateusz Guzik	d90f2c3617	ufs: clean up empty lines in .c and .h files	2020-09-01 21:23:00 +00:00
Mateusz Guzik	39f8815070	cache: add cache_rename, a dedicated helper to use for renames. While here, make both tmpfs and ufs use it. No functional changes.	2020-08-20 10:05:46 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error. Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Konstantin Belousov	779ad2acf1	VMIO reads: enable for UFS. Move v_object creation earlier, so that VIRF_PGREAD is never set if v_object is NULL. There is not much harm from instantiating v_object when a later check for append-only flags disallows open. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 21:07:19 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget. It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mateusz Guzik	03337743db	vfs: clean MNTK_FPLOOKUP if MNT_UNION is set. Elides checking it during lookup.	2020-08-10 11:51:21 +00:00
Mateusz Guzik	76dc5d3224	ufs: add VOP_STAT handler	2020-08-07 23:08:17 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess. This brings the argument count down to 6, which can be passed in registers, without the stack, on amd64.	2020-08-05 09:27:03 +00:00
Mateusz Guzik	e5e10c82ec	ufs: only pass LK_ADAPTIVE if LK_NODDLKTREAT is set. This restores the pre-adaptive spinning state for SU, which livelocks otherwise. Note this is a bug in SU. Reported by: pho	2020-08-04 23:09:15 +00:00
Mateusz Guzik	9d5a594f0b	ufs: add support for lockless lookup. ACLs are not supported, meaning their presence will force the use of the old lookup. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25579	2020-07-25 10:38:05 +00:00
Mateusz Guzik	31ad4050fe	lockmgr: add adaptive spinning. It is very conservative: it only spins when LK_ADAPTIVE is passed, only on exclusive locks, and never when any waiters are present. The buffer cache remains non-spinning. This reduces total sleep times during buildworld etc., but it does not shorten total real time (culprits are contention in the vm subsystem along with slock + upgrade, which is not covered). For microbenchmarks: open3_processes -t 52 (open/close of the same file for writing) ops/s: before: 258845 after: 801638 Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D25753	2020-07-22 12:30:31 +00:00
Kirk McKusick	93440bbefd	The binary representation of the superblock (the fs structure) is written out verbatim to the disk: see ffs_sbput() in sys/ufs/ffs/ffs_subr.c. It contains a pointer to the fs_summary_info structure. This pointer value inadvertently causes garbage to be stored. It is garbage because the pointer to the fs_summary_info structure is an address in the then-current stack or heap. Although a mere pointer does not reveal anything useful (like a part of a private key) to an attacker, garbage output deteriorates reproducibility. This commit zeros out the pointer to the fs_summary_info structure before writing out the superblock. Reviewed by: kib Tested by: Peter Holm PR: 246983 Sponsored by: Netflix	2020-06-19 01:04:25 +00:00
Kirk McKusick	34816cb9ae	Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix	2020-06-19 01:02:53 +00:00
Chuck Silvers	d9a8abf6c2	Move all of the functions in ffs_subr.c that are only used by the ufs kernel module from that file into ffs_vfsops.c. This fixes the build for kernel configs that don't include FFS. PR: 247256 Submitted by: glebius Reviewed by: mckusick (earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25285	2020-06-17 23:39:52 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that it is 64 bits, the same as mnt_flags. Since mnt_flags was upgraded to 64 bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32 bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low-order 32 bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64 bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deletes the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Kirk McKusick	513274c79c	Clear the IN_SIZEMOD and IN_IBLKDATA flags only when doing a synchronous inode update. The IN_SIZEMOD and IN_IBLKDATA flags indicate changes to the file size and block pointer fields in the inode. When these fields have been changed, the fsync() and fdatasync() system calls must write the inode to ensure their semantics that the file is on stable store. The IN_SIZEMOD and IN_IBLKDATA flags cannot be cleared until a synchronous write of the inode is done. If they are cleared on an asynchronous write, then the inode may not yet have been written to the disk when an fsync() or fdatasync() call is done. Absent these flags, these calls would not know that they needed to write the inode. Thus, these flags can only be cleared on synchronous writes of the inode. Since the inode will be locked for the duration of the I/O that writes it to disk, no fsync() or fdatasync() will be able to run before the on-disk inode is complete. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-06 20:17:56 +00:00
Kirk McKusick	52488b5148	Further evaluation of the POSIX spec for fdatasync() shows that it requires that new data on growing files be accessible. Thus, the fdatasync() system call must update the on-disk inode when the size of the file has changed. This commit adds another inode update flag, IN_SIZEMOD, that gets set any time that the file size changes. If either the IN_IBLKDATA or the IN_SIZEMOD flag is set when fdatasync() is called, the associated inode is synchronously written to disk. We could have overloaded the IN_IBLKDATA flag to also track size changes since the only (current) use case for these flags is fdatasync(), but it does seem useful for possible future uses to separately track the file size changes and the inode block pointer changes. Reviewed by: kib MFC with: -r361785 Differential revision: https://reviews.freebsd.org/D25072	2020-06-05 01:00:55 +00:00
Stefan Eßer	23e84cf153	Fix obvious typo: IN_BLKDATA should be IN_IBLKDATA	2020-06-04 19:54:25 +00:00
Kirk McKusick	30296c428a	Two additional places that need to identify IN_IBLKDATA. Reviewed by: kib MFC with: -r361785 Differential Revision: https://reviews.freebsd.org/D25072	2020-06-04 18:35:21 +00:00
Konstantin Belousov	7428630b75	UFS: write inode block for fdatasync(2) if pointers in inode were allocated. The fdatasync() description in POSIX specifies that "all I/O operations shall be completed as defined for synchronized I/O data integrity completion", and then the explanation of Synchronized I/O Data Integrity Completion says "The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred." For UFS this means that all pointers must be on disk. Indirect pointers already contribute to the list of dirty data blocks, so only direct blocks and root pointers to indirect blocks, both of which reside in the inode block, should be taken care of. In ffs_balloc(), mark the inode with the new flag IN_IBLKDATA that specifies that ffs_syncvnode(DATA_ONLY) needs a call to ffs_update() to flush the inode block. Reviewed by: mckusick Discussed with: tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D25072	2020-06-04 12:23:15 +00:00
Chuck Silvers	d79ff54b5c	This commit enables a UFS filesystem to do a forcible unmount when the underlying media fails or becomes inaccessible, for example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just as is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing some dependencies requires a read. For example, if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer, thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Co-authored by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088	2020-05-25 23:47:31 +00:00
John Baldwin	71d11ee322	Update the name in the description of vfs.ffs.setsize in a comment. Previously it used the name 'adjsize' instead of 'setsize'.	2020-05-22 17:23:43 +00:00
John Baldwin	f2620e9ceb	Retire two unused background fsck sysctls. These two sysctls were added to support UFS softupdates journalling with snapshots. However, the changes to fsck to use them were never committed and there have never been any in-tree uses of these sysctls. More details from Kirk: When journalling got added to soft updates, its journal rollback freed blocks that it thought were no longer in use. But it does not take snapshots into account (i.e., if a snapshot is still using it, then it cannot be freed). So I added the needed logic to fsck by having the free go through the kernel's blkfree code so it could grab blocks that were still needed by snapshots. That is done using the setbufoutput hack. I never got that code working reliably, so it is still sitting in my work directory, which also explains why you still cannot take snapshots on filesystems running with journalling... In looking over my use of this feature, and in particular the troubles I was having with it, I conclude that it may be better to extract the code from the kernel that handles freeing blocks claimed by snapshots and put it into fsck directly. My original intent was that it is complex and at the time changing, so only having to maintain it in one place was appealing. But at this point it has not changed in years and the hacks like setinode and setbufoutput to be able to use the kernel code are sufficiently ugly that I am leaning towards just extracting it. Reviewed by: mckusick MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24484	2020-04-21 17:42:32 +00:00
Konstantin Belousov	71f2642988	ufs: apply suspension for non-forced rw unmounts. Forced rw unmounts and remounts from rw to ro already suspend the filesystem, which closes races with writers instantiating new vnodes while unmount flushes the queue. The original intent of not including non-forced unmounts in this regime was to allow such unmounts to fail if a writer was active, but this did not work well. A similar change, but covering all unmounts, even those involving only ro filesystems, was proposed in D24088, but I believe that suspending ro is undesirable, and definitely spends CPU time. Reported by: markj Discussed with: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-04-10 01:24:16 +00:00
Kirk McKusick	621a274820	Fixing the soft update macros in -r359612 triggered a previously hidden bug in the file truncation code. Until that bug is tracked down and fixed, revert to the old behavior. Reported by: Peter Holm Reviewed by: kib, Chuck Silvers	2020-04-09 23:51:18 +00:00
Kirk McKusick	c79f5a4328	Revert -r359612 as it can cause other panics. An updated version will be made when the issue has been resolved. Reported by: Peter Holm	2020-04-06 20:23:47 +00:00
Kirk McKusick	2baca88584	When shrinking the size of a directory it is sometimes necessary to sync it to disk before shrinking it. Complete the sync before getting the buffer for the block to be updated to do the shrink to avoid panicking with a recursive lock on one of the directory's buffers. Reviewed by: Chuck Silvers (chs) MFC after: 3 days Sponsored by: Netflix	2020-04-03 20:43:25 +00:00
Kirk McKusick	aedb9cc662	Convert DOINGSOFTDEP, MOUNTEDSOFTDEP, DOINGSUJ, and MOUNTEDSUJ to being boolean expressions so that their values are not lost when assigned to `bool' or `int' variables. Reviewed by: Chuck Silvers (chs) MFC after: 3 days Sponsored by: Netflix	2020-04-03 20:30:45 +00:00
Konstantin Belousov	abfdf76791	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
Kirk McKusick	95ca762da8	When mounting a UFS filesystem, return EINTEGRITY rather than EIO when a superblock check-hash error is detected. This change distinguishes a mount that failed due to media hardware failures (EIO) from a mount that failed due to media errors (EINTEGRITY) that can be corrected by running fsck(8). Sponsored by: Netflix	2020-03-11 21:00:40 +00:00
Chuck Silvers	69b3fdfa0b	Use the devfs vnode rather than the mntfs vnode for permissions checks. I missed this one in r358714. Reported by: pho Reviewed by: mckusick Approved by: imp (mentor) Sponsored by: Netflix	2020-03-09 15:55:13 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd. This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Mateusz Guzik	8d03b99b9d	fd: move vnodes out of filedesc into a dedicated structure. The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting, this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where the jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884	2020-03-01 21:53:46 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many). r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Kirk McKusick	98b6844690	Additional KASSERTs to ensure the consistency of the soft updates indirdep structure. No functional change. Tested by: Peter Holm (as part of a larger patch) Sponsored by: Netflix	2020-02-18 23:56:23 +00:00
Scott Long	1353215314	Add rudimentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complementary support to the CAM periphs that support it. This is a redo of r357710.	2020-02-16 23:10:59 +00:00
Mateusz Guzik	4d51e175f9	ufs: use faster lockmgr entry points in ffs_lock	2020-02-15 21:48:48 +00:00
Scott Long	85eb41f751	Revert r357710 and r357711 until they can be debugged	2020-02-10 14:27:28 +00:00
Scott Long	9ce150463c	Missed a file in r357710, add it here.	2020-02-10 00:26:41 +00:00
Scott Long	7d99bda79e	Add rudimentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complementary support to the CAM periphs that support it.	2020-02-10 00:23:20 +00:00
Chuck Silvers	62612737d6	With INVARIANTS, track all softdep dependency structures centrally so that we can find them in dumps. Approved by: mckusick (mentor) Sponsored by: Netflix	2020-02-03 17:47:14 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Mateusz Guzik	643656cfaf	vfs: replace VOP_MARKATIME with VOP_MMAPPED. The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422	2020-02-01 06:46:55 +00:00
Mateusz Guzik	0a09292188	ufs: drop ufs_markatime from ufs_fifoops. The routine is only called on mmap and exec, both of which are invalid for this type. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23421	2020-02-01 06:41:44 +00:00
Mateusz Guzik	901b05fbd2	ufs: add the missing vn_need_pageq_flush call to ufs_need_inactive	2020-01-30 05:37:35 +00:00
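
Commits 7428630b75, 52488b5148 and 513274c79c together define when fdatasync() must write the inode: IN_IBLKDATA marks newly allocated pointers that live in the inode itself, IN_SIZEMOD marks a file size change, and both may be cleared only by a synchronous inode write. The following is a minimal Python model of that rule under stated assumptions, not FreeBSD code: the ToyInode class, its methods and the numeric flag values are hypothetical, and the flushing of data blocks is omitted.

```python
# Hypothetical flag values; only the flag names come from the commit log.
IN_SIZEMOD = 0x1    # file size changed since the last synchronous inode write
IN_IBLKDATA = 0x2   # a block pointer stored in the inode itself was allocated
NEEDS_INODE_WRITE = IN_SIZEMOD | IN_IBLKDATA


class ToyInode:
    """Toy model of the fdatasync() bookkeeping; not FreeBSD's struct inode."""

    def __init__(self):
        self.flags = 0
        self.size = 0
        self.sync_inode_writes = 0

    def write(self, offset, length, allocates_inode_pointer):
        """Record a data write; the caller says whether it allocated a direct
        block or a root indirect pointer (both reside in the inode)."""
        end = offset + length
        if end > self.size:
            self.size = end
            self.flags |= IN_SIZEMOD
        if allocates_inode_pointer:
            self.flags |= IN_IBLKDATA

    def update(self, waitfor):
        """Write the inode; only a synchronous write may clear the flags,
        because an asynchronous write may not have reached the disk yet."""
        if waitfor:
            self.sync_inode_writes += 1
            self.flags &= ~NEEDS_INODE_WRITE

    def fdatasync(self):
        """Return True if the inode had to be written synchronously."""
        if self.flags & NEEDS_INODE_WRITE:
            self.update(waitfor=True)
            return True
        return False


if __name__ == "__main__":
    ino = ToyInode()
    ino.write(0, 4096, allocates_inode_pointer=True)
    ino.update(waitfor=False)        # asynchronous: the flags survive
    print(ino.fdatasync())           # True: size and inode pointers changed
    ino.write(0, 512, allocates_inode_pointer=False)
    print(ino.fdatasync())           # False: overwrite inside the file
```

Keeping two flags rather than overloading one follows the reasoning given in 52488b5148: size changes and pointer changes stay distinguishable for possible future uses, even though fdatasync() treats them the same way.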

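Commit 93440bbefd zeros the fs_summary_info pointer before the superblock is written, so the on-disk bytes no longer depend on where that structure happened to live in memory. The sketch below illustrates the reproducibility effect in Python with the standard struct module; the three-field layout, the field names and the values are invented for illustration and are not the real struct fs. It zeros the pointer in a copy so the in-memory object keeps its pointer; the log does not say exactly how ffs_sbput() arranges this.

```python
import struct
from dataclasses import dataclass, replace

# Hypothetical little-endian layout: magic, block size, pointer-sized slot.
SB_FORMAT = "<IIQ"


@dataclass(frozen=True)
class ToySuperblock:
    magic: int
    bsize: int
    si_addr: int  # in-memory address of the summary-info structure


def pack_verbatim(sb):
    """Serialize every field as-is, so the memory address leaks to disk."""
    return struct.pack(SB_FORMAT, sb.magic, sb.bsize, sb.si_addr)


def pack_reproducible(sb):
    """Serialize a copy whose pointer slot has been zeroed first."""
    return pack_verbatim(replace(sb, si_addr=0))


if __name__ == "__main__":
    a = ToySuperblock(magic=0x1234, bsize=32768, si_addr=0xFFFFF80001234000)
    b = replace(a, si_addr=0xFFFFF80009ABC000)  # same fs, different address
    print(pack_verbatim(a) == pack_verbatim(b))          # False
    print(pack_reproducible(a) == pack_reproducible(b))  # True
```
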

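Commit d79ff54b5c describes a policy for disk I/O errors under soft updates: ENXIO and EIO are handled differently at first, but in both cases the error is cleared, a forced unmount is started, reads needed to clear dependencies get a fabricated zero-filled buffer, and fsync(2) returns ENXIO from then on. The following is a minimal Python model of that decision logic only, under stated assumptions; ToyMount and all of its attributes are hypothetical stand-ins (reject_all_io plays the role of the geom_vfs layer rejecting I/O), and the i_nlink restoration detail is not modeled.

```python
import errno


class ToyMount:
    """Toy model of the soft-updates I/O-error policy; not kernel code."""

    def __init__(self):
        self.reject_all_io = False        # stands in for geom_vfs rejecting I/O
        self.ignoring_errors = False      # set once any dependency write fails
        self.forced_unmount_pending = False

    def write_done(self, error):
        """Completion of a write for a buffer involved in a dependency."""
        if error == 0:
            return 0
        if error != errno.ENXIO:
            # EIO and most other errors: later I/O might still succeed, so
            # arrange for all future I/O to fail with ENXIO instead.
            self.reject_all_io = True
        # Either way the on-disk state can no longer change: clear the error
        # and schedule an asynchronous forced unmount.
        self.ignoring_errors = True
        self.forced_unmount_pending = True
        return 0

    def read_for_dependency(self, size, device_read):
        """Read a block needed to clear a dependency, e.g. an inode block."""
        if self.reject_all_io:
            error, data = errno.ENXIO, None
        else:
            error, data = device_read(size)
        if error and self.ignoring_errors:
            return 0, bytes(size)         # pretend the read returned zeroes
        return error, data

    def fsync(self):
        """Applications must learn that their data never reached the disk."""
        return errno.ENXIO if self.ignoring_errors else 0


if __name__ == "__main__":
    mp = ToyMount()
    mp.write_done(errno.EIO)
    print(mp.reject_all_io, mp.forced_unmount_pending)  # True True
    print(mp.fsync() == errno.ENXIO)                    # True
```
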
2261 Commits