freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	8d6fbbb867	Replace many instances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
Matt Joras	ba19246e07	Move clear_unrhdr to tmpfs_free_tmp. Clearing the unr in tmpfs_unmount is not correct. In the case of multiple references to the tmpfs mount (e.g. when there are lookup threads using it) it will not be the one to finish tmpfs_free_tmp. In those cases tmpfs_free_node_locked will be the final one to execute tmpfs_free_tmp, and until then the unr must be valid. Reported by: pho Approved/reviewed by: rstone (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12749	2017-10-23 15:43:38 +00:00
Matt Joras	9aaf913e13	When unmounting a tmpfs, do not call free_unr. tmpfs uses unr(9) to allocate inodes. Previously when unmounting it would individually free the units when it freed each vnode. This is unnecessary as we can use the newly-added unrhdr_clear function to clear out the unr in one go. This measurably reduces the time to unmount a tmpfs with many files. Reviewed by: cem, lidl Approved by: rstone (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12591	2017-10-11 21:53:53 +00:00
John Baldwin	9d1d1d1900	Return 64 for pathconf(_PC_FILESIZEBITS) on tmpfs. Sponsored by: Chelsio Communications	2017-10-02 23:23:12 +00:00
Mateusz Guzik	e3e10c39f1	tmpfs: skip zero-sized page count updates. Such updates constituted the vast majority of modifications, especially in tmpfs_reg_resize. For the case where the page count did not change and the size grew, we only need to update tn_size. Use this fact to avoid vm object lock/relock. MFC after: 1 week	2017-09-30 18:23:45 +00:00
John Baldwin	5b01ccb01e	Use tmpfs_print for tmpfs FIFOs. Reviewed by: kib (part of a larger patch)	2017-09-25 20:26:16 +00:00
John Baldwin	15a88f8158	Consistently use vop_stdpathconf() for default pathconf values. Update filesystems not currently using vop_stdpathconf() in pathconf VOPs to use vop_stdpathconf() for any configuration variables that do not have filesystem-specific values. vop_stdpathconf() is used for variables that have system-wide settings as well as providing default values for some values based on system limits. Filesystems can still explicitly override individual settings. PR: 219851 Reported by: cem Reviewed by: cem, kib, ngie MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D11541	2017-07-11 21:55:20 +00:00
Konstantin Belousov	538ee0d74e	Remove mistakenly merged field. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 20:03:26 +00:00
Konstantin Belousov	00ac6a98d8	Add mount option for tmpfs(5) to not use namecache. The option "nonc" disables use of the namecache for the created mount; by default the namecache is used. The rationale for the option is that namecache duplicates the information which is already kept in memory by tmpfs. Since it is believed that namecache scales better than tmpfs, or will scale better, do not enable the option by default. On the other hand, smaller machines may benefit from lesser namecache pressure. Discussed with: mjg Tested by: pho (as part of larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-19 19:46:49 +00:00
Konstantin Belousov	08c053e71c	Implement VOP_VPTOCNP() for tmpfs. For directories, node->tn_spec.tn_dir.tn_parent pointer to the parent is used. For non-directories, the implementation is naive, all directory nodes are scanned to find a dirent linking the specified node. This can be significantly improved by maintaining tn_parent for all nodes, later. Tested by: pho (as part of larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-19 19:29:13 +00:00
Konstantin Belousov	b4ba3b6459	VNON nodes cannot exist. Tested by: pho (as part of larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-19 19:25:42 +00:00
Konstantin Belousov	64c250439f	Refcount tmpfs nodes and mount structures. On dotdot lookup and fhtovp operations, it is possible for the file represented by tmpfs node to be removed after the thread calculated the pointer. In this case, tmpfs_alloc_vp() accesses freed memory. Introduce the reference count on the nodes. The allnodes list from tmpfs mount owns 1 reference, and threads performing unlocked operations on the node, add one transient reference. Similarly, since struct tmpfs_mount maintains the list where nodes are enlisted, refcount it by one reference from struct mount and one reference from each node on the list. Both nodes and tmpfs_mounts are removed when refcount goes to zero. Note that this means that nodes and tmpfs_mounts might survive some time after the node is deleted or tmpfs_unmount() finished. The tmpfs_alloc_vp() in these cases returns error either due to node removal (tn_nlinks == 0) or because of insmntque1(9) error. Tested by: pho (as part of larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-19 19:15:21 +00:00
Konstantin Belousov	1c07d69bc2	Make tmpfs directory cursor available outside tmpfs_subr.c. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 18:38:58 +00:00
Konstantin Belousov	280ffa5ed7	Rename tmpfs_mount member allnode_lock to include namespace prefix. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 16:01:36 +00:00
Konstantin Belousov	4960d0d453	Protect macro argument. Requested by: hselasky Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 15:06:18 +00:00
Konstantin Belousov	e7e6c82067	Rework some tmpfs lock assertions. Remove TMPFS_ASSERT_ELOCKED(). Its claims are already stated by other asserts nearby and by VFS guarantees. Change TMPFS_ASSERT_LOCKED() and one inlined place to use ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 14:49:55 +00:00
Konstantin Belousov	bba7ed2054	Style fixes and comment updates. Edit comments which explain no longer relevant details, and add locking annotations to the struct tmpfs_node members. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 14:27:37 +00:00
Konstantin Belousov	9e3ff5c594	Remove unused union member, fifos on tmpfs are implemented in common code. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-19 13:35:14 +00:00
Mateusz Guzik	ed2159c92c	tmpfs: manage tm_pages_used with atomics Reviewed by: kib (previous version)	2017-01-14 06:20:36 +00:00
Mateusz Guzik	3b622fc857	tmpfs: perform a lockless check in tmpfs_itimes Most of the time the status is 0 as the function is repeatedly called from tmpfs_getattr.	2017-01-06 19:58:20 +00:00
Mateusz Guzik	31e73fd434	tmpfs: enabled MNTK_EXTENDED_SHARED Discussed with: kib	2017-01-06 18:01:46 +00:00
Konstantin Belousov	5dc1128656	Lock tmpfs node tn_status updates done under the shared vnode lock. If tmpfs vnode is only shared locked, tn_status field still needs updates to note the access time modification. Use the same locking scheme as for UFS, protect tn_status with the node interlock + shared vnode lock. Fix nearby style. Noted and reviewed by: mjg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 17:43:36 +00:00
Konstantin Belousov	305b422966	Use vnode lock assertion expression, and upgrade it to assert the required exclusive state of the vnode lock in tmpfs chflags, chmod, chown, chsize, chtimes operations. Fix nearby style. Reviewed by: mjg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 17:32:44 +00:00
Konstantin Belousov	9a4d5dbbac	Remove dead code. Fifos overwrite file ops vector, and fifo VOP_KQFILTER is VOP_PANIC(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 17:03:08 +00:00
Konstantin Belousov	1c32456953	Use type-independent formats for printing nlink_t and ino_t. Extracted from: ino64 work by gleb, mckusick Discussed with: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 16:59:33 +00:00
Alan Cox	2d612d2dd2	When tmpfs and POSIX shm pagein a page for the sole purpose of performing truncation, immediately queue the page for asynchronous laundering rather than making the page pass through inactive queue first. Reviewed by: kib, markj	2016-12-11 19:24:41 +00:00
Alan Cox	bba39b9ae3	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Konstantin Belousov	15ad3e51c5	Convert another tmpfs assert into runtime check. The offset of the directory file, passed to getdirentries(2) syscall, is user-controllable. The value of the offset must not be asserted, instead the invalid value should be checked and rejected if invalid. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:50:21 +00:00
Pedro F. Giffuni	b3a15ddd5b	sys/fs: spelling fixes in comments. No functional change.	2016-04-29 20:51:24 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Mark Johnston	785eb42adf	Clear the cookie pointer on error in tmpfs_readdir(). It is otherwise left dangling, and callers that request cookies always free the cookie buffer, even when VOP_READDIR(9) returns an error. This results in a double free if tmpfs_readdir() returns an error to the NFS server or the Linux getdents(2) emulation code. Reported by: pho MFC after: 1 week Security: double free of malloc(9)-backed memory Sponsored by: EMC / Isilon Storage Division	2016-02-12 20:43:53 +00:00
Gleb Smirnoff	b0cd20172d	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strengthening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
Christian Brueffer	382353e2e8	In tmpfs_chtimes(), remove checks on the nanosecond level when determining whether a node changed. Other filesystems, e.g., UFS, only check on seconds, when determining whether something changed. This also corrects the birthtime case, where we checked tv_nsec twice, instead of tv_sec and tv_nsec (PR). PR: 201284 Submitted by: David Binderman Patch suggested by: kib Reviewed by: kib MFC after: 2 weeks Committed from: Essen FreeBSD Hackathon	2015-07-26 08:33:46 +00:00
Mark Johnston	5f34e93c58	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937	2015-07-05 22:37:33 +00:00
Mark Murray	d1b06863fb	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. 
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct start_cache'. - Tighten up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. 
Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00
Konstantin Belousov	8551285097	Restore the td_cookie value for the tmpfs directory entry which was a dup entry, upon detach from the parent directory. If the node is renamed, the entry is re-attached at the different directory, and invalid cookie value triggers assert (or corrupts directory rb tree, it seems). Reported by: clusteradm (gjb, antoine) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-19 07:25:15 +00:00
Gleb Smirnoff	093c7f396d	Make KPI of vm_pager_get_pages() more strict: if a pager changes a page in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-06-12 11:32:20 +00:00
Will Andrews	677c3c0c66	tmpfs_getattr(): Return more correct allocated byte counts. For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE) for the tmpfs node's anonymous VM object that stores actual file contents. For all other vnodes, return the tmpfs_node's tn_size, which should not be rounded to a page. This change allows using stat(2) to identify a sparse file on tmpfs. Reviewed by: kib MFC after: 1 week	2015-04-10 19:04:39 +00:00
Konstantin Belousov	bf5fce2bee	Remove duplicated assignment. CID: 1267988 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-02-03 12:09:48 +00:00
Konstantin Belousov	e0a60ae16a	Update directory times immediately after an entry is created or removed. Postponing it until tmpfs_getattr() is called causes discordant values reported for file times vs. directory times. Reported and tested by: madpilot Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-31 21:31:53 +00:00
Konstantin Belousov	f1a90a7bac	Remove single-use boolean. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-31 12:58:04 +00:00
Konstantin Belousov	311d39f2ee	POSIX states that write(2) "shall mark for update the last data modification and last file status change timestamps of the file". Currently, tmpfs only modifies ctime when file was extended. Since r277828 followed tmpfs_write(), mmaped writes also do not modify ctime. Fix this, by updating both ctime and mtime for writes to tmpfs files. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-31 12:27:18 +00:00
Konstantin Belousov	f40cb1c645	Update mtime for tmpfs files modified through memory mapping. Similar to UFS, perform updates during syncer scans, which in particular means that tmpfs now performs scan on sync. Also, this means that a mtime update may be delayed up to 30 seconds after the write. The vm_object's OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that object could have been dirtied. Adapt fast page fault handler and vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as OBJT_VNODE. Reported by: Ronald Klop <ronald-lists@klop.ws> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-28 10:37:23 +00:00
Konstantin Belousov	3544b0f68f	tmpfs does not use UVM on FreeBSD. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-01-28 10:25:35 +00:00
Konstantin Belousov	789bdfdbc6	Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some fs, e.g. smbfs, already did it. Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-21 13:29:33 +00:00
Konstantin Belousov	6c21f6edb8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache trashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitly knows about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantic is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
Mateusz Guzik	12e2a30ef9	tmpfs: allow shared file lookups Tested by: pho	2014-10-21 21:27:13 +00:00
Mateusz Guzik	4fce16e4c9	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks	2014-10-20 18:00:50 +00:00
Mateusz Guzik	020b8f17a0	Provide vfs suspension support only for filesystems which need it. Need is expressed by providing vfs_susp_clean function in vfsops. Differential Revision: D952 Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-19 06:59:33 +00:00
Konstantin Belousov	22bdc15a57	Do not ignore error from tmpfs_alloc_vp(). It results in access to the random memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-16 14:08:01 +00:00
Konstantin Belousov	de75292a5b	Remove unused header. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-16 14:06:16 +00:00
Konstantin Belousov	65589a29f4	Check for the cross-device cross-link attempt in the VFS, instead of forcing filesystem VOP_LINK() methods to repeat the code. In tmpfs_link(), remove redundant check for the type of the source, already done by VFS. Note that NFS server already performs this check before calling VOP_LINK(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-16 14:04:46 +00:00
Konstantin Belousov	4cda7f7ece	Rework the tmpfs unmount. - Suspend filesystem for unmount. This prevents new tmpfs nodes from instantiating, and also ensures that only unmount thread can destroy nodes. - Do not start tmpfs node deletion until all vnodes are reclaimed, which guarantees that no thread can access tmpfs data. For this, call vflush() in the loop, until the mnt_nvnodelistsize is non-zero. Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks insertion of a vnode germ into the mount list of vnodes. - Fail node allocation when the filesystem is being unmounted. This is race-free due to the vflush() call in loop. This is mostly cosmetic, avoiding some more work which might be done until suspension in unmount is started. Note that there is currently no way to prevent new vnode instantiation from readers during the unmount. Due to this, forced unmount might live-lock if vflush() loop cannot get to the zero vnode count due to races with readers. The unmount would proceed after the load is lifted. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:52:33 +00:00
Konstantin Belousov	b5b3326191	Change forgotten in r268615. Set the OBJ_TMPFS_NODE flag for vm_object of VREG tmpfs node. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:35:14 +00:00
Konstantin Belousov	eb2c06b63a	Use tmpfs_vn_get_ino_gen() to handle the races with reclaim in tmpfs dotdot lookup. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:16:55 +00:00
Konstantin Belousov	fd63693dcf	Style. Add comment about lock mode. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:13:56 +00:00
Konstantin Belousov	7a41bc2f41	In tmpfs_alloc_file(), code after the 'out' label does only 'return error;'. Replace goto's with the return. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:02:40 +00:00
Konstantin Belousov	d2ca06cdd2	Add convenience macro to assert tmpfs node lock. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:59:25 +00:00
Konstantin Belousov	55781cb922	Add some assertions for the code handling vm_object for tmpfs vnode. In particular, vnode must be exclusively locked when the tmpfs vnode and object are divorced. When the vnode is opened, the object must be still alive, since only live vnode can be opened, and the tmpfs node owns a reference on the object. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:55:02 +00:00
Konstantin Belousov	706f80801d	The tmpfs_link() must not dereference the filesystem-specific data for a vnode until it is verified that the vnode indeed belongs to tmpfs mount. Otherwise, it might access random memory, at least in the debug kernel. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:45:29 +00:00
Konstantin Belousov	fca015d301	Remove code separator lines which do not conform to style(9). Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:17:11 +00:00
Konstantin Belousov	7b81a399a4	In msdosfs_setattr(), add a check for result of the utimes(2) permissions test, forgotten in r164033. Refactor the permission checks for utimes(2) into vnode helper function vn_utimes_perm(9), and simplify its code comparing with the UFS origin, by writing the call to VOP_ACCESSX only once. Use the helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5). Reported by: bde Reviewed by: bde, trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-17 07:11:00 +00:00
Konstantin Belousov	60c5c866aa	Allow shared locking for the tmpfs vnodes. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-04 15:30:49 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Bryan Drewery	504bde017a	Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and '..'. Reviewed by: Russell Cattelan Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:58:02 +00:00
Bryan Drewery	ac09d109ca	Rename cnt to maxcookies and change its use as the condition for when to lookup cookies to be less obscure. No functional change. Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep it for the MPASS() for now though. Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:55:48 +00:00
Bryan Drewery	62dca316da	Cleanup redundant logic and add some comments to help explain how it works in lieu of potentially less clear code. Sponsored by: EMC / Isilon Storage Division Discussed with: Russell Cattelan	2014-03-14 02:10:30 +00:00
Bryan Drewery	0742ebc98f	Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used. Discussed with: kib MFC after: 2 weeks	2014-03-14 01:43:55 +00:00
Kenneth D. Merry	3b5f179d2a	Support storing 7 additional file flags in tmpfs: UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY, and UF_HIDDEN. Sort the file flags tmpfs supports alphabetically. tmpfs now supports the same flags as UFS, with the exception of SF_SNAPSHOT. Reported by: bdrewery, antoine Sponsored by: Spectra Logic	2013-08-28 22:12:56 +00:00
Xin LI	2454886e05	Allow tmpfs to be mounted inside a jail.	2013-08-23 22:52:20 +00:00
Konstantin Belousov	41cf41fdfd	Extract the general-purpose code from tmpfs to perform uiomove from the page queue of some vm object. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:23:24 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concepts into a real, minimal, sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavily changed; this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Konstantin Belousov	8239a7a878	The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node, in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not specified in the lkflags, the lookup is supposed to return an alive vnode whenever the underlying node is valid. Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached to node exists and is being reclaimed. This causes spurious ENOENT errors from lookup on tmpfs and corresponding random 'No such file' failures from syscalls working with tmpfs files. Fix this by waiting for the doomed vnode to be detached from the tmpfs node if sleepable allocation is requested. Note that filesystems which use vfs_hash.c, correctly handle the case due to vfs_hash_get() looping when vget() returns ENOENT for sleepable requests. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-08-05 18:53:59 +00:00
Attilio Rao	be99683637	Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
Attilio Rao	3b6714cacb	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
Attilio Rao	878a788734	Remove the unnecessary soft busy of the page before calling vn_rdwr() in kern_sendfile(). The page is already wired so it will not be subjected to pagefault. The content cannot be effectively protected as it is full of races already. Multiple accesses to the same indexes are serialized through vn_rdwr(). Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: pho	2013-08-04 15:56:19 +00:00
Nathan Whitehorn	59169d9156	tmpfs works perfectly fine with -o union -- there is no reason to exclude it from the list of options.	2013-07-23 14:48:37 +00:00
Alan Cox	f50b6721e1	Add missing VM object unlocks in an error case. Reviewed by: kib	2013-06-07 19:42:00 +00:00
Alan Cox	27a18d6a23	Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2013-06-06 06:17:20 +00:00
Alan Cox	ba887a9b33	Eliminate unnecessary vm object locking from tmpfs_nocacheread().	2013-06-04 15:40:45 +00:00
Konstantin Belousov	67b4ed4b88	Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is cleared when the tmpfs node is going away. Tested by: bdrewery, pho	2013-05-30 19:51:33 +00:00
Konstantin Belousov	3fa456b35d	Avoid deactivating the page if it is already on a queue, only requeue the page. This both reduces the number of queues locking and avoids moving the active page to inactive list just because the page was read or written. Based on the suggestion by: alc Reviewed by: alc Tested by: pho	2013-05-06 21:04:42 +00:00
Konstantin Belousov	df6b240b6f	Fix the v_object leak for non-regular tmpfs vnodes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:46:31 +00:00
Konstantin Belousov	158cc900bb	For the new regular tmpfs vnode, v_object is initialized before insmntque() is called. The standard insmntque destructor resets the vop vector to deadfs one, and calls vgone() on the vnode. As a result, v_object is kept unchanged, which triggers an assertion in the reclaim code on insmntque() failure. Also, in this case, OBJ_TMPFS flag on the backed vm object is not cleared. Provide the tmpfs insmntque() destructor which properly clears OBJ_TMPFS flag and resets v_object. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:44:31 +00:00
Konstantin Belousov	bdefcb6959	The page read or written could be wired. Do not requeue if the page is not on a queue. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:36:52 +00:00
Konstantin Belousov	6f2af3fcf3	Rework the handling of the tmpfs node backing swap object and tmpfs vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object. Besides reducing memory use up to 2x times for situation of mapping files from tmpfs, it also makes tmpfs read and write operations copy twice bytes less. VM subsystem was already slightly adapted to tolerate OBJT_SWAP object as v_object. Now the vm_object_deallocate() is modified to not reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle VV_TEXT flag on the last dereference of the tmpfs backing object. Reviewed by: alc Tested by: pho, bf MFC after: 1 month	2013-04-28 19:38:59 +00:00
Pawel Jakub Dawidek	051a23d4e8	- Constify local path variable for chflagsat(). - Use correct format characters (%lx) for u_long. This fixes the build broken in r248599.	2013-03-22 07:40:34 +00:00
Pawel Jakub Dawidek	b4b2596b97	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
Konstantin Belousov	0d3bb4afa8	Remove negative name cache entry pointing to the target name, which could be instantiated while tdvp was unlocked. Reported by: Rick Miller <vmiller at hostileadmin com> Tested by: pho MFC after: 1 week	2013-03-17 15:11:37 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Attilio Rao	5e60cb948e	Remove racy checks on resident and cached pages for tmpfs_mapped{read, write}() functions: - tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(), which check before-hand to work only on valid VREG vnodes. Also the vnode is locked for the duration of the work, making vnode reclaiming impossible, during the operation. Hence, vobj can never be NULL. - Currently check on resident pages and cached pages without vm object lock held is racy and can do even more harm than good, as a page could be transitioning between these 2 pools and then be skipped entirely. Skip the checks as lookups on empty splay trees are very cheap. Discussed with: alc Tested by: flo MFC after: 2 weeks	2013-02-10 01:04:10 +00:00
Gleb Kurtsou	4fd5efe79e	tmpfs: Replace directory entry linked list with RB-Tree. Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP and VOP_READDIR operations utilize same tree for search. Directory entry offset (cookie) is either file name hash or incremental id in case of hash collisions (duplicate-cookies). Keep sorted per directory list of duplicate-cookie entries to facilitate cookie number allocation. Don't fail if previous VOP_READDIR() offset is no longer valid, start with next dirent instead. Other file system handle it similarly. Workaround race prone tn_readdir_last[pn] fields update. Add tmpfs_dir_destroy() to free all dirents. Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from tmpfs_dir_getdents() instead of hard coded -1. Mark directory traversal routines static as they are no longer used outside of tmpfs_subr.c	2013-01-06 22:15:44 +00:00
Attilio Rao	bc2258da88	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.	2012-11-09 18:02:25 +00:00
Matthew D Fleming	fc8fdae0df	Fix up kernel sources to be ready for a 64-bit ino_t. Original code by: Gleb Kurtsou	2012-09-27 23:30:49 +00:00
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Edward Tomasz Napierala	af6e6b87ad	Remove unused thread argument to vrecycle(). Reviewed by: kib	2012-04-23 14:10:34 +00:00
Jaakko Heinonen	fd1062ce4c	Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because tmpfs doesn't support snapshots. Suggested by: bde	2012-04-18 15:22:08 +00:00
Jaakko Heinonen	587fdb536f	Sync tmpfs_chflags() with the recent changes to UFS: - Add a check for unsupported file flags. - Return EPERM when an user without PRIV_VFS_SYSFLAGS privilege attempts to toggle SF_SETTABLE flags.	2012-04-16 18:10:34 +00:00
Jaakko Heinonen	c5ab5ce345	tmpfs: Allow update mounts only for certain options. Since r230208 update mounts were allowed if the list of mount options contained the "export" option. This is not correct as tmpfs doesn't really support updating all options. Reviewed by: kevlo, trociny	2012-04-16 18:07:42 +00:00
Gleb Kurtsou	f8439900d6	Provide better description for vfs.tmpfs.memory_reserved sysctl. Suggested by: Anton Yuzhaninov <citrin@citrin.ru>	2012-04-15 21:59:28 +00:00
Attilio Rao	a0f2c37b6f	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
Gleb Kurtsou	9295c62814	tmpfs supports only INT_MAX nodes due to limitations of unit number allocator. Replace UINT32_MAX checks with INT_MAX. Keeping more than 2^31 nodes in memory is not likely to become possible in the foreseeable future and would require a new unit number allocator. Discussed with: delphij MFC after: 2 weeks	2012-04-07 15:30:46 +00:00
Gleb Kurtsou	0ff93c48da	Add vfs_getopt_size. Support human readable file system options in tmpfs. Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs. Discussed with: delphij MFC after: 2 weeks	2012-04-07 15:27:34 +00:00
Gleb Kurtsou	da7aa2778e	Add reserved memory limit sysctl to tmpfs. Cleanup available and used memory functions. Check if free pages available before allocating new node. Discussed with: delphij	2012-04-07 15:23:51 +00:00
Gleb Kurtsou	db94ad126a	Prevent tmpfs_rename() deadlock in a way similar to UFS Unlock vnodes and try to lock them one by one. Relookup fvp and tvp. Approved by: mdf (mentor)	2012-03-14 09:15:50 +00:00
Gleb Kurtsou	ca846258e2	Don't enforce LK_RETRY to get existing vnode in tmpfs_alloc_vp() Doomed vnode is hardly of any use here, besides all callers handle error case. vfs_hash_get() does the same. Don't mess with vnode holdcount, vget() takes care of it already. Approved by: mdf (mentor)	2012-03-14 08:29:21 +00:00
Konstantin Belousov	b80dcb55aa	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni	2012-03-11 12:19:58 +00:00
John Baldwin	58d65e8031	Similar to the fixes in 226967 and 226987, purge any name cache entries associated with the previous vnode (if any) associated with the target of a rename(). Otherwise, a lookup of the target pathname concurrent with a rename() could re-add a name cache entry after the namei(RENAME) lookup in kern_renameat() had purged the target pathname. MFC after: 2 weeks	2012-03-02 18:55:19 +00:00
Tijl Coosemans	0662ee9826	Replace PRIdMAX with "jd" in a printf call. Cast the corresponding value to intmax_t instead of uintmax_t, because the original type is off_t.	2012-02-14 11:24:24 +00:00
Kevin Lo	e0d3195bd6	Return EOPNOTSUPP since we only support update mounts for NFS export. Spotted by: trociny	2012-01-17 01:25:53 +00:00
Kevin Lo	57eb5548c9	Add nfs export support to tmpfs(5) Reviewed by: kib	2012-01-16 10:25:22 +00:00
Alan Cox	0b05cac3d2	When tmpfs_write() resets an extended file to its original size after an error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally update the file's size. Reviewed by: kib MFC after: 3 weeks	2012-01-16 00:26:49 +00:00
Alan Cox	93431cb74c	Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call vm_object_pip_{add,subtract}() on the swap object because the swap object can't be destroyed while the vnode is exclusively locked. Moreover, even if the swap object could have been destroyed during tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken because vm_object_pip_subtract() does not wake up the sleeping thread that is trying to destroy the swap object. Free invalid pages after an I/O error. There is no virtue in keeping them around in the swap object creating more work for the page daemon. (I believe that any non-busy page in the swap object will now always be valid.) vm_pager_get_pages() does not return a standard errno, so its return value should not be returned by tmpfs without translation to an errno value. There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to occur with the swap object locked. Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite(). (The swap pager already spams your console if data corruption is imminent.) Reviewed by: kib MFC after: 3 weeks	2012-01-14 23:04:27 +00:00
Alan Cox	2971897d51	Correct an error of omission in the implementation of the truncation operation on POSIX shared memory objects and tmpfs. Previously, neither of these modules correctly handled the case in which the new size of the object or file was not a multiple of the page size. Specifically, they did not handle partial page truncation of data stored on swap. As a result, stale data might later be returned to an application. Interestingly, a data inconsistency was less likely to occur under tmpfs than POSIX shared memory objects. The reason being that a different mistake by the tmpfs truncation operation helped avoid a data inconsistency. If the data was still resident in memory in a PG_CACHED page, then the tmpfs truncation operation would reactivate that page, zero the truncated portion, and leave the page pinned in memory. More precisely, the benevolent error was that the truncation operation didn't add the reactivated page to any of the paging queues, effectively pinning the page. This page would remain pinned until the file was destroyed or the page was read or written. With this change, the page is now added to the inactive queue. Discussed with: jhb Reviewed by: kib (an earlier version) MFC after: 3 weeks	2012-01-08 20:09:26 +00:00
Alan Cox	04f883d798	Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and tmpfs_nocacheread(). It is both unnecessary and a pessimization. It results in either the page being zeroed twice or zeroed first and then overwritten by an I/O operation. MFC after: 3 weeks	2012-01-03 03:29:01 +00:00
Ivan Voras	6e92aee4e2	Avoid panics from recursive rename operations. Not a perfect patch but good enough for now. PR: kern/159418 Submitted by: Gleb Kurtsou Reviewed by: kib MFC after: 1 month	2011-11-22 16:18:12 +00:00
Xin LI	296a25a245	Improve the way to calculate available pages in tmpfs: - Don't deduct wired pages from total usable counts because it does not make any sense. To make things worse, on systems where swap size is smaller than physical memory and use a lot of wired pages (e.g. ZFS), tmpfs can suddenly have free space of 0 because of this; - Count cached pages as available; [1] - Don't count inactive pages as available, technically we could but that might be too aggressive; [1] [1] Suggested by kib@ MFC after: 1 week	2011-11-21 20:26:22 +00:00
Marcel Moolenaar	82543c5928	Don asbestos garment and remove the warning about TMPFS being experimental -- highly experimental even. So far the closest to a bug in TMPFS that people have gotten to relates to how ZFS can take away from the memory that TMPFS needs. One can argue that such is not a bug in TMPFS. Irrespective, even if there is a bug here and there in TMPFS, it's not to our own advantage to scare people away from using TMPFS. I for one have been using it, even with ZFS, very successfully.	2011-11-07 16:21:50 +00:00
Peter Holm	948fa27d49	Added missing cache purge of from argument for rename(). Reported by: Anton Yuzhaninov <citrin citrin ru> In collaboration with: kib MFC after: 1 week	2011-11-01 12:33:06 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Rick Macklem	694a586a43	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib	2011-05-22 01:07:54 +00:00
Alan Cox	4d2f3d2cde	Eliminate two dubious attempts at optimizing the implementation of a file's last accessed, modified, and changed times: TMPFS_NODE_ACCESSED and TMPFS_NODE_CHANGED should be set unconditionally in tmpfs_remove() without regard to the number of hard links to the file. Otherwise, after the last directory entry for a file has been removed, a process that still has the file open could read stale values for the last accessed and changed times with fstat(2). Similarly, tmpfs_close() should update the time-related fields even if all directory entries for a file have been removed. In this case, the effect is that the time-related fields will have values that are later than expected. They will correspond to the time at which fstat(2) is called. In collaboration with: kib MFC after: 1 week	2011-02-22 14:47:10 +00:00
Alan Cox	7ded42ba28	tmpfs_remove() isn't modifying the file's data, so it shouldn't set TMPFS_NODE_MODIFIED on the node. PR: 152488 Submitted by: Anton Yuzhaninov Reviewed by: kib MFC after: 1 week	2011-02-19 21:04:36 +00:00
Alan Cox	4673c751f8	Further simplify tmpfs_reg_resize(). Also, update its comments, including style fixes.	2011-02-14 15:36:38 +00:00
Alan Cox	b10d1d5d60	Eliminate tn_reg.tn_aobj_pages. Instead, correctly maintain the vm object's size field. Previously, that field was always zero, even when the object tn_reg.tn_aobj contained numerous pages. Apply style fixes to tmpfs_reg_resize(). In collaboration with: kib	2011-02-13 14:46:39 +00:00
Konstantin Belousov	9fb9c623a6	In tmpfs_readdir(), normalize handling of the directory entries that either overflow the supplied buffer, or cause uiomove fail. Do not advance cached de when directory entry was not copied out. Do not return EOF when no entries could be copied due to first entry too large for supplied buffer, signal EINVAL instead. Reported by: Beat Gätzi <beat chruetertee ch> MFC after: 1 week	2011-01-20 09:39:16 +00:00
Andriy Gapon	e07b64c567	tmpfs + sendfile: do not produce partially valid pages for vnode's tail See r213730 for details of analogous change in ZFS. MFC after: 3 days	2010-10-12 17:16:51 +00:00
Andriy Gapon	21bd3e2576	tmpfs, zfs + sendfile: mark page bits as valid after populating it with data Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY). PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks	2010-09-15 10:31:27 +00:00
Ivan Voras	b2143ecb99	Avoid "Entry can disappear before we lock fdvp" panic. PR: 150143 Submitted by: Gleb Kurtsou <gk at FreeBSD.org> Pretty sure it won't blow up: mckusick MFC after: 2 weeks	2010-09-07 22:40:45 +00:00
Ed Schouten	99d57a6bd8	Add support for whiteouts on tmpfs. Right now unionfs only allows filesystems to be mounted on top of another if it supports whiteouts. Even though I have sent a patch to daichi@ to let unionfs work without it, we'd better also add support for whiteouts to tmpfs. This patch implements .vop_whiteout and makes necessary changes to lookup() and readdir() to take them into account. We must also make sure that when adding or removing a file, we honour the componentname's DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames. MFC after: 1 month	2010-08-22 05:36:06 +00:00
Alan Cox	8393d186b9	Eliminate unnecessary page queues locking.	2010-06-16 00:41:21 +00:00
Edward Tomasz Napierala	307d88b787	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
Edward Tomasz Napierala	b5f770bd86	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Alan Cox	c5a648516e	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
Alan Cox	b88b6c9d80	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
Jaakko Heinonen	dec3772ee4	Add "maxfilesize" mount option for tmpfs to allow specifying the maximum file size limit. Default is UINT64_MAX when the option is not specified. It was useless to set the limit to the total amount of memory and swap in the system. Use tmpfs_mem_info() rather than get_swpgtotal() in tmpfs_mount() to check if there is enough memory available. Remove now unused get_swpgtotal(). Reviewed by: Gleb Kurtsou Approved by: trasz (mentor)	2010-01-29 12:09:14 +00:00
Jaakko Heinonen	189ee6be40	- Change the type of nodes_max to u_int and use "%u" format string to convert its value. [1] - Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more reasonable than the old four nodes per page (with page size 4096) because non-empty regular files always use at least one page. This fixes possible overflow in the calculation. [2] - Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node(). PR: kern/138367 Suggested by: bde [1], Gleb Kurtsou [2] Approved by: trasz (mentor)	2010-01-20 16:56:20 +00:00
Jaakko Heinonen	5364a38dba	- Fix some style bugs in tmpfs_mount(). [1] - Remove a stale comment about tmpfs_mem_info() 'total' argument. Reported by: bde [1]	2010-01-13 14:17:21 +00:00
Jaakko Heinonen	720c50b339	- Change the type of size_max to u_quad_t because its value is converted with vfs_scanopt(9) using the "%qu" format string. - Limit the maximum value of size_max to (SIZE_MAX - PAGE_SIZE) to prevent overflow in howmany() macro. PR: kern/141194 Approved by: trasz (mentor) MFC after: 2 weeks	2010-01-08 07:57:43 +00:00
Alan Cox	4afcae9ba3	There is no need to "busy" a page when the object is locked for the duration of the operation.	2009-10-26 18:02:05 +00:00
Xin LI	82cf92d483	Add locking around access to parent node, and bail out when the parent node is already freed rather than panicking the system. PR: kern/122038 Submitted by: gk Tested by: pho MFC after: 1 week	2009-10-11 07:03:56 +00:00
Xin LI	3fa0694aaa	Add a special workaround to handle UIO_NOCOPY case. This fixes data corruption observed when sendfile() is being used. PR: kern/127213 Submitted by: gk MFC after: 2 weeks	2009-10-07 23:17:15 +00:00
Xin LI	7441ac4618	Fix a bug that causes the fsx test case of mmap'ed page being out of sync of read/write, inspired by ZFS's counterpart. PR: kern/139312 Submitted by: gk@ MFC after: 1 week	2009-10-04 10:38:04 +00:00
Konstantin Belousov	3364c323e6	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation is performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
Alan Cox	47f11d9a46	Eliminate unnecessary variables.	2009-06-13 20:21:08 +00:00
Alan Cox	e2d0be0172	Eliminate redundant setting of a page's valid bits and pointless clearing of the same page's dirty bits.	2009-05-27 18:12:10 +00:00
Attilio Rao	dfd233edd5	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00

295 Commits