freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	41cf41fdfd	Extract the general-purpose code from tmpfs to perform uiomove from the page queue of some vm object. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:23:24 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Attilio Rao	be99683637	Revert r253939: We cannot busy a page before doing pagefaults. Infact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
Attilio Rao	3b6714cacb	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
Attilio Rao	878a788734	Remove unnecessary soft busy of the page before to do vn_rdwr() in kern_sendfile() which is unnecessary. The page is already wired so it will not be subjected to pagefault. The content cannot be effectively protected as it is full of races already. Multiple accesses to the same indexes are serialized through vn_rdwr(). Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: pho	2013-08-04 15:56:19 +00:00
Alan Cox	f50b6721e1	Add missing VM object unlocks in an error case. Reviewed by: kib	2013-06-07 19:42:00 +00:00
Alan Cox	27a18d6a23	Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2013-06-06 06:17:20 +00:00
Alan Cox	ba887a9b33	Eliminate unnecessary vm object locking from tmpfs_nocacheread().	2013-06-04 15:40:45 +00:00
Konstantin Belousov	3fa456b35d	Avoid deactivating the page if it is already on a queue, only requeue the page. This both reduces the number of queues locking and avoids moving the active page to inactive list just because the page was read or written. Based on the suggestion by: alc Reviewed by: alc Tested by: pho	2013-05-06 21:04:42 +00:00
Konstantin Belousov	df6b240b6f	Fix the v_object leak for non-regular tmpfs vnodes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:46:31 +00:00
Konstantin Belousov	158cc900bb	For the new regular tmpfs vnode, v_object is initialized before insmntque() is called. The standard insmntque destructor resets the vop vector to deadfs one, and calls vgone() on the vnode. As result, v_object is kept unchanged, which triggers an assertion in the reclaim code, on instmntque() failure. Also, in this case, OBJ_TMPFS flag on the backed vm object is not cleared. Provide the tmpfs insmntque() destructor which properly clears OBJ_TMPFS flag and resets v_object. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:44:31 +00:00
Konstantin Belousov	bdefcb6959	The page read or written could be wired. Do not requeue if the page is not on a queue. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:36:52 +00:00
Konstantin Belousov	6f2af3fcf3	Rework the handling of the tmpfs node backing swap object and tmpfs vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object. Besides reducing memory use up to 2x times for situation of mapping files from tmpfs, it also makes tmpfs read and write operations copy twice bytes less. VM subsystem was already slightly adapted to tolerate OBJT_SWAP object as v_object. Now the vm_object_deallocate() is modified to not reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle VV_TEXT flag on the last dereference of the tmpfs backing object. Reviewed by: alc Tested by: pho, bf MFC after: 1 month	2013-04-28 19:38:59 +00:00
Pawel Jakub Dawidek	051a23d4e8	- Constify local path variable for chflagsat(). - Use correct format characters (%lx) for u_long. This fixes the build broken in r248599.	2013-03-22 07:40:34 +00:00
Konstantin Belousov	0d3bb4afa8	Remove negative name cache entry pointing to the target name, which could be instantiated while tdvp was unlocked. Reported by: Rick Miller <vmiller at hostileadmin com> Tested by: pho MFC after: 1 week	2013-03-17 15:11:37 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Attilio Rao	5e60cb948e	Remove a racy checks on resident and cached pages for tmpfs_mapped{read, write}() functions: - tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(), which check before-hand to work only on valid VREG vnodes. Also the vnode is locked for the duration of the work, making vnode reclaiming impossible, during the operation. Hence, vobj can never be NULL. - Currently check on resident pages and cached pages without vm object lock held is racy and can do even more harm than good, as a page could be transitioning between these 2 pools and then be skipped entirely. Skip the checks as lookups on empty splay trees are very cheap. Discussed with: alc Tested by: flo MFC after: 2 weeks	2013-02-10 01:04:10 +00:00
Gleb Kurtsou	4fd5efe79e	tmpfs: Replace directory entry linked list with RB-Tree. Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP and VOP_READDIR operations utilize same tree for search. Directory entry offset (cookie) is either file name hash or incremental id in case of hash collisions (duplicate-cookies). Keep sorted per directory list of duplicate-cookie entries to facilitate cookie number allocation. Don't fail if previous VOP_READDIR() offset is no longer valid, start with next dirent instead. Other file system handle it similarly. Workaround race prone tn_readdir_last[pn] fields update. Add tmpfs_dir_destroy() to free all dirents. Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from tmpfs_dir_getdents() instead of hard coded -1. Mark directory traversal routines static as they are no longer used outside of tmpfs_subr.c	2013-01-06 22:15:44 +00:00
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Edward Tomasz Napierala	af6e6b87ad	Remove unused thread argument to vrecycle(). Reviewed by: kib	2012-04-23 14:10:34 +00:00
Attilio Rao	a0f2c37b6f	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
Gleb Kurtsou	db94ad126a	Prevent tmpfs_rename() deadlock in a way similar to UFS Unlock vnodes and try to lock them one by one. Relookup fvp and tvp. Approved by: mdf (mentor)	2012-03-14 09:15:50 +00:00
Konstantin Belousov	b80dcb55aa	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni	2012-03-11 12:19:58 +00:00
John Baldwin	58d65e8031	Similar to the fixes in 226967 and 226987, purge any name cache entries associated with the previous vnode (if any) associated with the target of a rename(). Otherwise, a lookup of the target pathname concurrent with a rename() could re-add a name cache entry after the namei(RENAME) lookup in kern_renameat() had purged the target pathname. MFC after: 2 weeks	2012-03-02 18:55:19 +00:00
Tijl Coosemans	0662ee9826	Replace PRIdMAX with "jd" in a printf call. Cast the corresponding value to intmax_t instead of uintmax_t, because the original type is off_t.	2012-02-14 11:24:24 +00:00
Alan Cox	0b05cac3d2	When tmpfs_write() resets an extended file to its original size after an error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally update the file's size. Reviewed by: kib MFC after: 3 weeks	2012-01-16 00:26:49 +00:00
Alan Cox	93431cb74c	Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call vm_object_pip_{add,subtract}() on the swap object because the swap object can't be destroyed while the vnode is exclusively locked. Moreover, even if the swap object could have been destroyed during tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken because vm_object_pip_subtract() does not wake up the sleeping thread that is trying to destroy the swap object. Free invalid pages after an I/O error. There is no virtue in keeping them around in the swap object creating more work for the page daemon. (I believe that any non-busy page in the swap object will now always be valid.) vm_pager_get_pages() does not return a standard errno, so its return value should not be returned by tmpfs without translation to an errno value. There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to occur with the swap object locked. Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite(). (The swap pager already spam your console if data corruption is imminent.) Reviewed by: kib MFC after: 3 weeks	2012-01-14 23:04:27 +00:00
Alan Cox	04f883d798	Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and tmpfs_nocacheread(). It is both unnecessary and a pessimization. It results in either the page being zeroed twice or zeroed first and then overwritten by an I/O operation. MFC after: 3 weeks	2012-01-03 03:29:01 +00:00
Ivan Voras	6e92aee4e2	Avoid panics from recursive rename operations. Not a perfect patch but good enough for now. PR: kern/159418 Submitted by: Gleb Kurtsou Reviewed by: kib MFC after: 1 month	2011-11-22 16:18:12 +00:00
Peter Holm	948fa27d49	Added missing cache purge of from argument for rename(). Reported by: Anton Yuzhaninov <citrin citrin ru> In collaboration with: kib MFC after: 1 week	2011-11-01 12:33:06 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Alan Cox	4d2f3d2cde	Eliminate two dubious attempts at optimizing the implementation of a file's last accessed, modified, and changed times: TMPFS_NODE_ACCESSED and TMPFS_NODE_CHANGED should be set unconditionally in tmpfs_remove() without regard to the number of hard links to the file. Otherwise, after the last directory entry for a file has been removed, a process that still has the file open could read stale values for the last accessed and changed times with fstat(2). Similarly, tmpfs_close() should update the time-related fields even if all directory entries for a file have been removed. In this case, the effect is that the time-related fields will have values that are later than expected. They will correspond to the time at which fstat(2) is called. In collaboration with: kib MFC after: 1 week	2011-02-22 14:47:10 +00:00
Alan Cox	7ded42ba28	tmpfs_remove() isn't modifying the file's data, so it shouldn't set TMPFS_NODE_MODIFIED on the node. PR: 152488 Submitted by: Anton Yuzhaninov Reviewed by: kib MFC after: 1 week	2011-02-19 21:04:36 +00:00
Konstantin Belousov	9fb9c623a6	In tmpfs_readdir(), normalize handling of the directory entries that either overflow the supplied buffer, or cause uiomove fail. Do not advance cached de when directory entry was not copied out. Do not return EOF when no entries could be copied due to first entry too large for supplied buffer, signal EINVAL instead. Reported by: Beat G?tzi <beat chruetertee ch> MFC after: 1 week	2011-01-20 09:39:16 +00:00
Andriy Gapon	e07b64c567	tmpfs + sendfile: do not produce partially valid pages for vnode's tail See r213730 for details of analogous change in ZFS. MFC after: 3 days	2010-10-12 17:16:51 +00:00
Andriy Gapon	21bd3e2576	tmpfs, zfs + sendfile: mark page bits as valid after populating it with data Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY). PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks	2010-09-15 10:31:27 +00:00
Ivan Voras	b2143ecb99	Avoid "Entry can disappear before we lock fdvp" panic. PR: 150143 Submitted by: Gleb Kurtsou <gk at FreeBSD.org> Pretty sure it won't blow up: mckusick MFC after: 2 weeks	2010-09-07 22:40:45 +00:00
Ed Schouten	99d57a6bd8	Add support for whiteouts on tmpfs. Right now unionfs only allows filesystems to be mounted on top of another if it supports whiteouts. Even though I have sent a patch to daichi@ to let unionfs work without it, we'd better also add support for whiteouts to tmpfs. This patch implements .vop_whiteout and makes necessary changes to lookup() and readdir() to take them into account. We must also make sure that when adding or removing a file, we honour the componentname's DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames. MFC after: 1 month	2010-08-22 05:36:06 +00:00
Alan Cox	8393d186b9	Eliminate unnecessary page queues locking.	2010-06-16 00:41:21 +00:00
Edward Tomasz Napierala	307d88b787	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
Edward Tomasz Napierala	b5f770bd86	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Alan Cox	c5a648516e	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
Alan Cox	b88b6c9d80	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
Xin LI	82cf92d483	Add locking around access to parent node, and bail out when the parent node is already freed rather than panicking the system. PR: kern/122038 Submitted by: gk Tested by: pho MFC after: 1 week	2009-10-11 07:03:56 +00:00
Xin LI	3fa0694aaa	Add a special workaround to handle UIO_NOCOPY case. This fixes data corruption observed when sendfile() is being used. PR: kern/127213 Submitted by: gk MFC after: 2 weeks	2009-10-07 23:17:15 +00:00
Xin LI	7441ac4618	Fix a bug that causes the fsx test case of mmap'ed page being out of sync of read/write, inspired by ZFS's counterpart. PR: kern/139312 Submitted by: gk@ MFC after: 1 week	2009-10-04 10:38:04 +00:00
Alan Cox	47f11d9a46	Eliminate unnecessary variables.	2009-06-13 20:21:08 +00:00
Alan Cox	e2d0be0172	Eliminate redundant setting of a page's valid bits and pointless clearing of the same page's dirty bits.	2009-05-27 18:12:10 +00:00
Attilio Rao	dfd233edd5	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00

1 2

78 Commits