Commit Graph

140 Commits

Author SHA1 Message Date
Mateusz Guzik
f4aa64528e tmpfs: save on relocking the allnode lock in tmpfs_free_node_locked 2021-05-31 23:21:15 +00:00
Mateusz Guzik
439d942b9e tmpfs: drop a redundant NULL check in tmpfs_alloc_vp 2021-05-29 22:04:10 +00:00
Mateusz Guzik
7fbeaf33b8 tmpfs: drop useless parent locking from tmpfs_dir_getdotdotdent
The id field is immutable until the node gets freed.
2021-05-29 22:04:10 +00:00
Mateusz Guzik
eec2e4ef7f tmpfs: reimplement the mtime scan to use the lazy list
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30065
2021-05-15 20:48:45 +00:00
Konstantin Belousov
28bc23ab92 tmpfs: dynamically register tmpfs pager
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:13:34 +03:00
Konstantin Belousov
4b8365d752 Add OBJT_SWAP_TMPFS pager
This is OBJT_SWAP pager, specialized for tmpfs.  Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways.  Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Mateusz Guzik
c09f799271 tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics 2021-01-25 22:40:15 +00:00
Mateusz Guzik
cc96f92a57 atomic: make atomic_store_ptr type-aware 2021-01-25 22:40:15 +00:00
Mateusz Guzik
618029af50 tmpfs: add support for lockless symlink lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Mark Johnston
90f580b954 Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry.  Several filesystems were not initializing
the field, which ends up being copied out to userland.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27792
2021-01-03 11:50:31 -05:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Mateusz Guzik
7c58c37ebb tmpfs: change tmpfs dirent zone into a malloc type
It is 64 bytes.
2020-10-30 14:07:25 +00:00
Eric van Gyzen
f9cc8410e1 vm_ooffset_t is now unsigned
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by:	Coverity
Reviewed by:	kib markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26214
2020-09-18 16:48:08 +00:00
Konstantin Belousov
016b7c7e39 tmpfs: restore atime updates for reads from page cache.
Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated
atomically without locks or (locked) atomics.

tn_update_getattr() change also contains unrelated bug fix.

Reported by:	lwhsu
PR:	249362
Reviewed by:	markj (previous version)
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26451
2020-09-16 21:28:18 +00:00
Konstantin Belousov
23f9071466 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-09-16 21:24:34 +00:00
Konstantin Belousov
081e36e760 Add tmpfs page cache read support.
Or it could be explained as lockless (for vnode lock) reads.  Reads
are performed from the node tn_obj object.  Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as far as ref_count > 0.

Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.

Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock.  Also, we can handle
holes in tmpfs node without additional efforts, and do not have
limitation of the transfer size.

Reviewed by:	markj
Discussed with and benchmarked by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:19:16 +00:00
Konstantin Belousov
4601f5f5ee Microoptimize tmpfs node ref/unref by using atomics.
Avoid tmpfs mount and node locks when ref count is greater than zero,
which is the case until node is being destroyed by unlink or unmount.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:13:21 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Mateusz Guzik
1abe36567f tmpfs: use vget_prep/vget_finish instead of vget + vnode 2020-08-16 17:19:23 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik
172ffe702c tmpfs: add support for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25580
2020-07-25 10:38:44 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
2abdae33b1 tmpfs: inline tmpfs_update
It was generated to be just a jumping off point to tmpfs_itimes.

While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.
2020-02-03 17:06:21 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
2a829749d3 tmpfs: add missing CLTFLAG_MPSAFE annotation 2020-01-15 01:32:11 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Konstantin Belousov
a51c8071a3 Stop using per-mount tmpfs zones.
Requested and reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22643
2019-12-05 00:03:17 +00:00
Jeff Roberson
639676877b Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory.  Introduce vm_object_allocate_anon() to create these
objects.  DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by:	alc, dougm, kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22119
2019-11-19 23:19:43 +00:00
Jeff Roberson
67d0e29304 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Jeff Roberson
63e9755548 (1/6) Replace busy checks with acquires where it is trival to do so.
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21548
2019-10-15 03:35:11 +00:00
Konstantin Belousov
c5dac63c15 tmpfs_readdir(): unlock the locked node.
During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename.  Read tn_parent
only once, then use the cached value.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-03 19:55:05 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
Konstantin Belousov
e1cdc30faa tmpfs: ignore tmpfs_set_status() if mount point is read-only.
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:49:32 +00:00
Konstantin Belousov
ae26575394 Block creation of the new nodes for read-only tmpfs mounts.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:41:26 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Mark Johnston
6d2e2df764 Ensure that directory entry padding bytes are zeroed.
Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace.  With the ino64 work there
are also some explicit pad fields in struct dirent.  Add a subroutine
to clear these bytes and use it in the in-tree filesystems.  The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-23 22:24:59 +00:00
Mateusz Guzik
30e0cf499f tmpfs: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-20 15:14:30 +00:00
Brooks Davis
1493c2ee62 Make vop_symlink take a const target path.
This will enable callers to take const paths as part of syscall
decleration improvements.

Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17805
2018-11-02 14:42:36 +00:00
Mark Murray
19fa89e938 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
John Baldwin
35b1a3abd3 Update tmpfs link count handling for ino64.
Add a new TMPFS_LINK_MAX to use in place of LINK_MAX for link overflow
checks and pathconf() reporting.  Rather than storing a full 64-bit
link count, just use a plain int and use INT_MAX as TMPFS_LINK_MAX.

Discussed with:	bde
Reviewed by:	kib (part of a larger patch)
Sponsored by:	Chelsio Communications
2017-12-19 20:19:07 +00:00
Gleb Smirnoff
135beaf60e Reduce pollution via tmpfs.h. 2017-12-06 00:42:08 +00:00
Pedro F. Giffuni
d63027b668 sys/fs: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:15:37 +00:00