freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Kevin Lo	f7a3729c91	Use NULL instead of 0 for pointers	2012-07-22 15:40:31 +00:00
Konstantin Belousov	c5c1199c83	Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks	2012-07-02 21:01:03 +00:00
Konstantin Belousov	7aac7bc18a	Fix unbounded-length malloc, controlled from usermode. The added check is performed before exact size of the buffer is calculated, but the buffer cannot have size greater then the total space allocated for extended attributes. The existing check is executing with precise size, but it is too late, since buffer needs to be allocated in advance. Also, adapt to uio_resid being of ssize_t type. Use lblktosize instead of multiplying by fs block size by hand as well. Reported and tested by: pho MFC after: 1 week	2012-06-21 09:20:07 +00:00
Kirk McKusick	aa445c9d7c	In softdep_setup_inomapdep() we may have to allocate both inodedep and bmsafemap dependency structures in inodedep_lookup() and bmsafemap_lookup() respectively. The setup of these structures must be done while holding the soft-dependency mutex. If the inodedep is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the bmsafemap. If the bmsafemap is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the inodedep. To resolve this problem, bmsafemap_lookup has had a parameter added that allows a pre-malloc'ed bmsafemap to be passed in so that it does not need to release the mutex to create a new bmsafemap. The softdep_setup_inomapdep() routine pre-malloc's a bmsafemap dependency before acquiring the mutex and starting to build the inodedep with a call to inodedep_lookup(). The subsequent call to bmsafemap_lookup() is passed this pre-allocated bmsafemap entry so that it need not release the mutex if it needs to create a new one. Reported by: Peter Holm Tested by: Peter Holm MFC after: 1 week	2012-06-11 23:07:21 +00:00
Konstantin Belousov	b569050a78	Enable vn_io_fault() lock avoidance for UFS. Tested by: pho MFC after: 2 months	2012-05-30 16:45:41 +00:00
Konstantin Belousov	6ee10a96c0	Implement SEEK_HOLE/SEEK_DATA for UFS. MFC after: 2 weeks	2012-05-26 05:29:53 +00:00
Kirk McKusick	8b6207110d	Add missing `continue' statement at end of case. Found by: Kevin Lo (kevlo@) MFC after: 1 week	2012-05-18 15:20:21 +00:00
Edward Tomasz Napierala	06c65b6b41	Remove unused thread argument from ufs_extattr_uepm_lock()/ufs_extattr_uepm_unlock().	2012-04-23 17:56:35 +00:00
Edward Tomasz Napierala	05cc75de83	Fix build.	2012-04-23 17:54:49 +00:00
Edward Tomasz Napierala	26621e1f06	Remove unused thread argument from clear_inodeps() and clear_remove().	2012-04-23 14:44:18 +00:00
Edward Tomasz Napierala	af6e6b87ad	Remove unused thread argument to vrecycle(). Reviewed by: kib	2012-04-23 14:10:34 +00:00
Edward Tomasz Napierala	c52fd858ae	Remove unused thread argument from vtruncbuf(). Reviewed by: kib	2012-04-23 13:21:28 +00:00
Edward Tomasz Napierala	72b8ff1c74	Fix use-after-free introduced in r234036. Reviewed by: mckusick Tested by: pho	2012-04-21 10:45:46 +00:00
Kirk McKusick	dca5e0ec50	This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-20 07:00:28 +00:00
Jaakko Heinonen	808dd116cb	The part about exec atime no longer applies in the comment. Pointed out by: bde	2012-04-18 15:19:00 +00:00
Kirk McKusick	71469bb38f	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-17 16:28:22 +00:00
Kirk McKusick	ecb6e528c5	Export vinactive() from kern/vfs_subr.c (e.g., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks	2012-04-11 23:01:11 +00:00
Jaakko Heinonen	fce74feae1	- Return EPERM from ufs_setattr() when an user without PRIV_VFS_SYSFLAGS privilege attempts to toggle SF_SETTABLE flags. - Use the '^' operator in the SF_SNAPSHOT anti-toggling check. Flags are now stored to ip->i_flags in one place after all checks. Submitted by: bde	2012-04-10 15:59:37 +00:00
Edward Tomasz Napierala	2b028c25d3	Fix panic in ffs_reload(), which may happen when read-only filesystem gets resized and then reloaded. Reviewed by: kib, mckusick (earlier version) Sponsored by: The FreeBSD Foundation	2012-04-08 13:44:55 +00:00
Kirk McKusick	b73ffa31d4	Drop an unnecessary setting of si_mountpt when updating a UFS mount point. Clearly it must have been set when the mount was done. Reviewed by: kib	2012-04-08 06:14:49 +00:00
Jaakko Heinonen	63dfd849a1	Add a check for unsupported file flags to ufs_setattr(). Discussed with: bde MFC after: 2 weeks	2012-04-04 14:50:21 +00:00
Kirk McKusick	23d6e518da	A file cannot be deallocated until its last name has been removed and it is no longer referenced by a user process. The inode for a file whose name has been removed, but is still referenced at the time of a crash will still be allocated in the filesystem, but will have no references (e.g., they will have no names referencing them from any directory). With traditional soft updates these unreferenced inodes will be found and reclaimed when the background fsck is run. When using journaled soft updates, the kernel must keep track of these inodes so that it can find and reclaim them during the cleanup process. Their existence cannot be stored in the journal as the journal only handles short-term events, and they may persist for days. So, they are tracked by keeping them in a linked list whose head pointer is stored in the superblock. The journal tracks them only until their linked list pointers have been commited to disk. Part of the cleanup process involves traversing the list of unreferenced inodes and reclaiming them. This bug was triggered when confusion arose in the commit steps of keeping the unreferenced-inode linked list coherent on disk. Notably, a race between the link() system call adding a link-count to a file and the unlink() system call removing a link-count to the file. Here if the unlink() ran after link() had looked up the file but before link() had incremented the link-count of the file, the file's link-count would drop to zero before the link() incremented it back up to one. If the file was referenced by a user process, the first transition through zero made it appear that it should be added to the unreferenced-inode list when in fact it should not have been added. If the new name created by link() was deleted within a few seconds (with the file still referenced by a user process) it would legitimately be a candidate for addition to the unreferenced-inode list. The result was that there were two attempts to add the same inode to the unreferenced-inode list which scrambled the unreferenced-inode list's pointers leading to a panic. The fix is to detect and avoid the false attempt at adding it to the unreferenced-inode list by having the link() system call check to see if the link count is zero before it increments it. If it is, the link() fails with ENOENT (showing that it has failed the link()/unlink() race). While tracking down this bug, we have added additional assertions to detect the problem sooner and also simplified some of the code. Reported by: Kirk Russell Fix submitted by: Jeff Roberson Tested by: Peter Holm PR: kern/159971 MFC (to 9 only): 2 weeks	2012-04-02 21:58:37 +00:00
Jaakko Heinonen	fccf286d24	- Use more natural ip->i_flags instead of vap->va_flags in the final flags check. - Add a comment for the immutable/append check done after handling of the flags. - Style improvements. No functional change intended. Submitted by: bde MFC after: 2 weeks	2012-04-02 16:33:21 +00:00
Kirk McKusick	6c09f4a27c	A refinement of change 232351 to avoid a race with a forcible unmount. While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately. This change will be included along with 232351 when it is MFC'ed to 9. Spotted by: kib Reviewed by: kib	2012-03-28 21:21:19 +00:00
Kirk McKusick	1faacf5d09	Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib	2012-03-28 20:49:11 +00:00
Konstantin Belousov	ea573a50b3	Do trivial reformatting of the comment to record the missed commit message for r233609: Restore the writes of atimes, quotas and superblock from syncer vnode. Noted by: rdivacky	2012-03-28 14:16:15 +00:00
Konstantin Belousov	a988a5c609	Reviewed by: bde, mckusick Tested by: pho MFC after: 2 weeks	2012-03-28 14:06:47 +00:00
Konstantin Belousov	64c8ead942	Microoptimize: in qsync loop over mount vnodes, only unlock mount interlock after we committed to try to vget() the vnode. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 1 week	2012-03-28 13:56:18 +00:00
Konstantin Belousov	e0c1740853	Update comment. MFC after: 3 days	2012-03-28 13:47:07 +00:00
Kirk McKusick	75a5838904	Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib	2012-03-25 00:02:37 +00:00
Konstantin Belousov	064f517d2b	Supply boolean as the second argument to ffs_update(), and not a MNT_[NO]WAIT constants, which in fact always caused sync operation. Based on the submission by: bde Reviewed by: mckusick MFC after: 2 weeks	2012-03-13 22:04:27 +00:00
Konstantin Belousov	92ccae0399	Remove superfluous brackets. Submitted by: alc MFC after: 2 weeks	2012-03-11 21:25:42 +00:00
Konstantin Belousov	dd522d76dc	Do schedule delayed writes for async mounts. While there, make some style adjustments, like missed () around return values. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks	2012-03-11 20:26:19 +00:00
Konstantin Belousov	2fd2c0b1e3	Do not fall back to slow synchronous i/o when low on memory or buffers. The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks	2012-03-11 20:23:46 +00:00
Konstantin Belousov	4cd74eecda	In ffs_syncvnode(), pass boolean false as second argument of ffs_update(). Synchronous inode block update is not needed for MNT_LAZY callers (syncer), and since waitfor values are not zero, code did unneccessary synchronous update. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks	2012-03-11 20:18:14 +00:00
Konstantin Belousov	18ef3670e5	Remove not needed ARGSUSED lint command. Submitted by: bde MFC after: 3 days	2012-03-11 20:15:12 +00:00
Konstantin Belousov	b80dcb55aa	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni	2012-03-11 12:19:58 +00:00
Peter Holm	e521b5288a	Revert r232692 as the correct place to fix this is at the syscall level.	2012-03-09 17:19:50 +00:00
Konstantin Belousov	38ddb5725b	Decomission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week	2012-03-09 00:12:05 +00:00
John Baldwin	b47f624183	Add KTR_VFS traces to track modifications to a vnode's writecount.	2012-03-08 20:27:20 +00:00
Peter Holm	80042581a5	syscall() fuzzing can trigger this panic. Return EINVAL instead. MFC after: 1 week	2012-03-08 12:49:08 +00:00
John Baldwin	58d65e8031	Similar to the fixes in 226967 and 226987, purge any name cache entries associated with the previous vnode (if any) associated with the target of a rename(). Otherwise, a lookup of the target pathname concurrent with a rename() could re-add a name cache entry after the namei(RENAME) lookup in kern_renameat() had purged the target pathname. MFC after: 2 weeks	2012-03-02 18:55:19 +00:00
Kirk McKusick	35338e6091	This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates. The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot. The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot. A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot. Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks	2012-03-01 18:45:25 +00:00
Konstantin Belousov	2b19466836	Properly lock DQREF() with dqhlock. Missed locking caused counter corruption. Assert that the dq reference value is sane before decrementing it. Reported and tested by: pho MFC after: 1 week	2012-02-22 20:03:51 +00:00
Konstantin Belousov	526d0bd547	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
Kirk McKusick	e8e848ef8e	Missing conditions in checking whether an inode has been written. Found and tested by: Peter Holm MFC after: 2 weeks (to 9 only)	2012-02-13 01:33:39 +00:00
Kirk McKusick	5ecba76999	Historically when an application wrote an entire block of a file, the kernel allocated a buffer but did not zero it as it was about to be completely filled by a uiomove() from the user's buffer. However, if the uiomove() failed, the old contents of the buffer could be exposed especially if the file was being mmap'ed. The fix was to always zero the buffer when it was allocated. This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it. Reviewed by: kib Tested by: scottl MFC after: 2 weeks (to 9 only)	2012-02-09 22:34:16 +00:00
Kirk McKusick	19c87af0fd	In the original days of BSD, a sync was issued on every filesystem every 30 seconds. This spike in I/O caused the system to pause every 30 seconds which was quite annoying. So, the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2). The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted). Somewhere in its path to integration with FreeBSD the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of this change is to return to the old filesystem-wide flush every 30-seconds behavior and makes the whole 30-second delay per vnode useless. This change goes back to the originally intended trickle out sync behavior. Key to ensuring that all the intended semantics are preserved (e.g., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122 which ensures that all of these updates are being made. Reviewed by: kib Tested by: scottl MFC after: 2 weeks	2012-02-07 20:43:28 +00:00
Konstantin Belousov	36fc415d5d	Sprinkle missed calls to asynchronous UFS_UPDATE() in attempt to guarantee that all UFS inode metadata changes results in the dirtiness of the inodeblock. Due to missed inodeblock updates, syncer was required to fsync each mount point' vnode to guarantee periodic metadata flush. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks	2012-02-07 09:51:41 +00:00

1 2 3 4 5 ...

1823 Commits