freebsd-skq

Author	SHA1	Message	Date
Brooks Davis	44ca4575ea	vmapbuf: don't smuggle address or length in buf Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784	2020-10-21 16:00:15 +00:00
Bryan Drewery	c2c6fb90e0	Use unlocked page lookup for inmem() to avoid object lock contention Reviewed By: kib, markj Submitted by: mlaier Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D26653	2020-10-09 23:49:42 +00:00
Bryan Drewery	9ceba22462	Revert r366340. CR wasn't finished and it breaks the build.	2020-10-01 20:08:27 +00:00
Bryan Drewery	2398cd1103	Use unlocked page lookup for inmem() to avoid object lock contention Reviewed By: kib, markj Sponsored by: Dell EMC Isilon Submitted by: mlaier Differential Revision: https://reviews.freebsd.org/D26597	2020-10-01 19:17:03 +00:00
Mateusz Guzik	6fed89b179	kern: clean up empty lines in .c and .h files	2020-09-01 22:12:32 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Conrad Meyer	9da903e5d3	Unlocked getblk: Fix new false-positive assertion A free buf's lock may be held (temporarily) due to unlocked lookup, so buf_alloc() must acquire it without LK_NOWAIT. The unlocked getblk path should unlock it promptly once it realizes the identity does not match the buffer it was searching for. Reported by: gallatin Reviewed by: kib Tested by: pho X-MFC-With: r363482 Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25914	2020-08-02 16:34:27 +00:00
Conrad Meyer	d6a75d39e9	getblk: Remove a non-sensical LK_NOWAIT | LK_SLEEPFAIL No functional change. LK_SLEEPFAIL implies a behavior that is only possible if the lock operation can sleep. LK_NOWAIT prevents the lock operation from sleeping. Discussed with: kib	2020-07-31 00:13:40 +00:00
Conrad Meyer	59d13f6154	getblk: Avoid sleeping on wrong buf in lockless path If the buffer identity changed during lookup, sleeping could introduce a lock order reversal. Since we do not know if the identity changed until we get the lock, we must try-lock (LK_NOWAIT) only. EINTR and ERESTART error handling becomes irrelevant, as we no longer sleep. Reported by: kib Reviewed by: kib X-MFC-With: r363482 Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25898	2020-07-31 00:07:01 +00:00
Conrad Meyer	81dc6c2c61	Use gbincore_unlocked for unprotected incore() Reviewed by: markj Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D25790	2020-07-24 17:34:44 +00:00
Conrad Meyer	68ee1dda06	Add unlocked/SMR fast path to getblk() Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked() API wrapping this functionality. Use it for a fast path in getblkx(), falling back to locked lookup if we raced a thread changing the buf's identity. Reported by: Attilio Reviewed by: kib, markj Testing: pho (in progress) Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D25782	2020-07-24 17:34:04 +00:00
Mateusz Guzik	422f38d8ea	vfs: fix trivial whitespace issues which don't interfere with blame .. even without the -w switch	2020-07-10 09:01:36 +00:00
Chuck Silvers	d79ff54b5c	This commit enables a UFS filesystem to do a forcible unmount when the underlying media fails or becomes inaccessible. For example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just like is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing of some dependencies require a read. For example if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. 
But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Coauthored by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088	2020-05-25 23:47:31 +00:00
Konstantin Belousov	bbca4bd7cd	buffer pager: skip bogus pages. We cannot validate bogus page by reading a buffer. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:42:46 +00:00
Konstantin Belousov	695e0701a0	buffer pager: deref ucred immediately after read. Ucred is passed to bread(9) so that non-local filesystems use proper credentials. But, since clean buffer might be cached unless buf_pager_relbuf is not enabled, it makes credentials to have extra reference until buffer is reclaimed. Ucred reference would prevent jail from destroying if creds are jailed. Dereferencing the read credentials on the valid buffer avoid that, and should be fine because the buffer is valid and does not need re-read. PR: 238032 Reported by: bz Reproduced and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23775	2020-03-05 15:52:34 +00:00
Jeff Roberson	6be21eb778	Provide a lock free alternative to resolve bogus pages. This is not likely to be much of a perf win, just a nice code simplification. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23866	2020-02-28 21:42:48 +00:00
Jeff Roberson	7aaf252c96	Convert a few trivial consumers to the new unlocked grab API. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23847	2020-02-28 20:34:30 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Mateusz Guzik	3ff65f71cb	Remove duplicated empty lines from kern/*.c No functional changes.	2020-01-30 20:05:05 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Jeff Roberson	686bcb5c14	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778	2019-12-15 21:26:50 +00:00
Edward Tomasz Napierala	d6fee74a0c	Add kern_sync(9), and make kernel code call it instead of going via sys_sync(2). Minor cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19366	2019-12-12 18:45:31 +00:00
Alexander Motin	61322a0a8a	Mark some more hot global variables with __read_mostly. MFC after: 1 week	2019-12-04 21:26:03 +00:00
Kirk McKusick	d00066a5f9	Currently the breadn_flags() and getblkx() interfaces are passed the vnode, logical block number, and size of data block that is being requested. They then use the VOP_BMAP function to calculate the mapping from logical block number to physical block number from which to access the data. This change expands the interface to also pass the physical block number in cases where the VOP_BMAP function may no longer work, for example when a file is being truncated. No functional change. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2019-12-03 23:07:09 +00:00
Jeff Roberson	6ee653cfeb	Drop the object lock in vfs_bio and cluster where it is now safe to do so. Recent changes to busy/valid/dirty have enabled page based synchronization and the object lock is no longer required in many cases. Reviewed by: kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21597	2019-10-29 20:37:59 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Jeff Roberson	63e9755548	(1/6) Replace busy checks with acquires where it is trivial to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548	2019-10-15 03:35:11 +00:00
Eric van Gyzen	e61e783b83	Add CTLFLAG_STATS to some vfs sysctl OIDs Add CTLFLAG_STATS to the following OIDs: vfs.altbufferflushes vfs.recursiveflushes vfs.barrierwrites vfs.flushwithdeps vfs.reassignbufcalls Refer to r353111. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2019-10-04 21:43:43 +00:00
Hans Petter Selasky	11b57401e6	Use REFCOUNT_COUNT() to obtain refcount where appropriate. Refcount waiting will set some flag bits in the refcount value. Make sure these bits get cleared by using the REFCOUNT_COUNT() macro to obtain the actual refcount. Differential Revision: https://reviews.freebsd.org/D21620 Reviewed by: kib@, markj@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-09-12 16:26:59 +00:00
Conrad Meyer	aaa3852435	buf: Add B_INVALONERR flag to discard data Setting the B_INVALONERR flag before a synchronous write causes the buf cache to forcibly invalidate contents if the write fails (BIO_ERROR). This is intended to be used to allow layers above the buffer cache to make more informed decisions about when discarding dirty buffers without successful write is acceptable. As a proof of concept, use in msdosfs to handle failures to mark the on-disk 'dirty' bit during rw mount or ro->rw update. Extending this to other filesystems is left as future work. PR: 210316 Reviewed by: kib (with objections) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21539	2019-09-11 21:24:14 +00:00
Jeff Roberson	4cdea4a853	Use the sleepq lock rather than the page lock to protect against wakeup races with page busy state. The object lock is still used as an interlock to ensure that the identity stays valid. Most callers should use vm_page_sleep_if_busy() to handle the locking particulars. Reviewed by: alc, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21255	2019-09-10 18:27:45 +00:00
Conrad Meyer	a6935d085c	Remove long-dead BUF_ASSERT_{,UN}HELD assertions These were fully neutered in r177676 (2008), but not removed at the time for unclear reasons. They're totally dead code, so go ahead and yank them now. No functional change.	2019-09-05 21:43:33 +00:00
Mark Johnston	772dd133c6	Avoid direct accesses of the vm_page wire_count field. No functional change intended. Sponsored by: Netflix	2019-08-28 18:01:54 +00:00
Mark Johnston	98549e2dc6	Centralize the logic in vfs_vmio_unwire() and sendfile_free_page(). Both of these functions atomically unwire a page, optionally attempt to free the page, and enqueue or requeue the page. Add functions vm_page_release() and vm_page_release_locked() to perform the same task. The latter must be called with the page's object lock held. As a side effect of this refactoring, the buffer cache will no longer attempt to free mapped pages when completing direct I/O. This is consistent with the handling of pages by sendfile(SF_NOCACHE). Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20986	2019-07-29 22:01:28 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Mark Johnston	8e7130a8a7	Stop checking TD_IDLETHREAD() in buffer cache routines. These predicates are vestigial and cannot be true today. For example, idle threads are not allowed to acquire locks. Also cache curthread in breada(). No functional change intended. Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20066	2019-04-29 13:23:32 +00:00
Alan Somers	f841e638fb	[skip ci] fix typo in comment from r59840 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-26 15:00:59 +00:00
Kirk McKusick	3193b25a5a	This is an additional fix for bug report 230962. When using extended attributes, the kernel can panic with either "ffs_truncate3" or with "softdep_deallocate_dependencies: dangling deps". The problem arises because the flushbuflist() function which is called to clear out buffers is passed either the V_NORMAL flag to indicate that it should flush buffer associated with the contents of the file or the V_ALT flag to indicate that it should flush the buffers associated with the extended attribute data. The buffers containing the extended attribute data are identified by having their BX_ALTDATA flag set in the buffer's b_xflags field. The BX_ALTDATA flag is set on the buffer when the extended attribute block is first allocated or when its contents are read in from the disk. On a busy system, a buffer may be reused for another purpose, but the contents of the block that it contained continues to be held in the main page cache. Each physical page is identified as holding the contents of a logical block within a specified file (identified by a vnode). When a request is made to read a file, the kernel first looks for the block in the existing buffers. If it is not found there, it checks the page cache to see if it is still there. If it is found in the page cache, then it is remapped into a new buffer thus avoiding the need to read it in from the disk. The bug is that when a buffer request made for an extended attribute is fulfilled by reconstituting a buffer from the page cache rather than reading it in from disk, the BX_ALTDATA flag was not being set. Thus the flushbuflist() function would never clear it out and the "ffs_truncate3" panic would occur because the vnode being cleared still had buffers on its clean-buffer list. If the extended attribute was being updated, it is first read, then updated, and finally written. 
If the read is fulfilled by reconstituting the buffer from the page cache the BX_ALTDATA flag was not set and thus the dirty buffer would never be flushed by flushbuflist(). Eventually the buffer would be recycled. Since it was never written it would have an unfinished dependency which would trigger the "softdep_deallocate_dependencies: dangling deps" panic. The fix is to ensure that the BX_ALTDATA flag is set when a buffer has been reconstituted from the page cache. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-03-12 19:08:41 +00:00
Kirk McKusick	93fa5ae7f1	Augment DDB "show buffer" command to print the buffer's referenced vnode pointer (b_vp). The value of b_vp can be used by "show vnode" to print the vnode and "show vnodebufs" to print all the clean and dirty buffers associated with the vnode (which should include this buffer). Sponsored by: Netflix	2019-03-11 21:49:44 +00:00
Kirk McKusick	dab83bd1e8	Add printing of b_ioflags to DDB `show buffer' command. Sponsored by: Netflix	2019-01-25 21:24:09 +00:00
Gleb Smirnoff	d1bb5d7d50	Fix mistake in r343030: move nswbuf calculation back to kern_vfs_bio_buffer_alloc(), because in init_param2() nbuf isn't really initialized yet. Pointed out by: bde	2019-01-16 20:20:38 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Konstantin Belousov	200bf72793	Correct accuracy of the barrier writes accounting. Discussed with: mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-12-02 12:53:39 +00:00
Mark Johnston	f71ef9b686	Use plain atomic_{add,subtract} when that's sufficient. CID: 1386920 MFC after: 2 weeks	2018-11-06 17:32:25 +00:00
Mark Johnston	3fb14f61e1	Avoid completing I/O when dumping core after a panic. Filesystem or pager completion callbacks are generally non-functional after a panic and may trigger deadlocks if invoked in this context (e.g., by attempting to destroy a buffer mapping). To avoid this situation, short-circuit I/O completion in biodone(). Reviewed by: imp Discussed with: mav MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D15592	2018-06-01 23:49:32 +00:00
Matt Macy	84482abd21	vfs: annotate variables only used by debug builds as __unused	2018-05-19 04:59:39 +00:00
Konstantin Belousov	2ebc882927	Detect and optimize reads from the hole on UFS. - Create getblkx(9) variant of getblk(9) which can return error. - Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP was performed before the buffer is created, and EJUSTRETURN returned in case the requested block does not exist. - Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and allocating the pages for it), copying from zero_region instead. The end result is less page allocations and buffer recycling when a hole is read, which is important for some benchmarks. Requested and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14917	2018-05-13 09:47:28 +00:00
Mark Johnston	1b5c869d64	Fix some races introduced in r332974. With r332974, when performing a synchronized access of a page's "queue" field, one must first check whether the page is logically dequeued. If so, then the page lock does not prevent the page from being removed from its page queue. Introduce vm_page_queue(), which returns the page's logical queue index. In some cases, direct access to the "queue" field is still required, but such accesses should be confined to sys/vm. Reported and tested by: pho Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15280	2018-05-04 17:17:30 +00:00
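
Several entries in this log (68ee1dda06, 59d13f6154, 9da903e5d3) describe the same pattern: look a buffer up without holding the lock, try-lock it without sleeping, then re-check its identity and fall back to the locked path on any mismatch. The following is only a rough userspace sketch of that idea, not the kernel code. It uses POSIX mutexes in place of the kernel's buffer locks, and `struct xbuf`, `xbuf_init()`, and `xbuf_trylock_checked()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; this is NOT the kernel's struct buf. */
struct xbuf {
	pthread_mutex_t lock;
	const void *owner;	/* identity: object the buffer belongs to */
	long blkno;		/* identity: logical block number */
};

/* Initialize a buffer with a given identity (illustrative helper). */
static void
xbuf_init(struct xbuf *bp, const void *owner, long blkno)
{
	assert(bp != NULL);
	pthread_mutex_init(&bp->lock, NULL);
	bp->owner = owner;
	bp->blkno = blkno;
}

/*
 * Lock a buffer that was found by an unlocked lookup.
 *
 * The lock attempt never sleeps: the buffer may have been reused for a
 * different identity since the lookup, and sleeping on the wrong buffer
 * is what the commit messages above warn against.
 *
 * Returns 0 with the lock held when the identity still matches, or
 * EAGAIN (lock not held) when the caller should retry via a locked lookup.
 */
static int
xbuf_trylock_checked(struct xbuf *bp, const void *owner, long blkno)
{
	if (pthread_mutex_trylock(&bp->lock) != 0)
		return (EAGAIN);	/* contended: do not wait */
	if (bp->owner != owner || bp->blkno != blkno) {
		/* Identity changed under us: release promptly. */
		pthread_mutex_unlock(&bp->lock);
		return (EAGAIN);
	}
	return (0);
}
```

A caller would treat `EAGAIN` as "take the slow, locked lookup path" rather than as an error. The prompt unlock on mismatch mirrors the note in 9da903e5d3 that the unlocked path should release a buffer as soon as it sees the wrong identity.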

802 Commits