Commit Graph

163 Commits

Author SHA1 Message Date
Gleb Smirnoff
b0cd20172d A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
Christian Brueffer
382353e2e8 In tmpfs_chtimes(), remove checks on the nanosecond level when
determining whether a node changed.

Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.

This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).

PR:			201284
Submitted by:		David Binderman
Patch suggested by:	kib
Reviewed by:		kib
MFC after:		2 weeks
Committed from:		Essen FreeBSD Hackathon
2015-07-26 08:33:46 +00:00
Mark Johnston
5f34e93c58 Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D2937
2015-07-05 22:37:33 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Konstantin Belousov
8551285097 Restore the td_cookie value for the tmpfs directory entry which was a
dup entry, upon detach from the parent directory.  If the node is
renamed, the entry is re-attached at the different directory, and
invalud cookie value triggers assert (or corrupts directory rb tree,
it seems).

Reported by:	clusteradm (gjb, antoine)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-19 07:25:15 +00:00
Gleb Smirnoff
093c7f396d Make KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-06-12 11:32:20 +00:00
Will Andrews
677c3c0c66 tmpfs_getattr(): Return more correct allocated byte counts.
For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.

For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.

This change allows using stat(2) to identify a sparse file on tmpfs.

Reviewed by:	kib
MFC after:	1 week
2015-04-10 19:04:39 +00:00
Konstantin Belousov
bf5fce2bee Remove duplicated assignment.
CID:	1267988
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-03 12:09:48 +00:00
Konstantin Belousov
e0a60ae16a Update directory times immediately after an entry is created or
removed.  Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-31 21:31:53 +00:00
Konstantin Belousov
f1a90a7bac Remove single-use boolean.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:58:04 +00:00
Konstantin Belousov
311d39f2ee POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended.  Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:27:18 +00:00
Konstantin Belousov
f40cb1c645 Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync.  Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied.  Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by:	Ronald Klop <ronald-lists@klop.ws>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-28 10:37:23 +00:00
Konstantin Belousov
3544b0f68f tmpfs does not use UVM on FreeBSD.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-01-28 10:25:35 +00:00
Konstantin Belousov
789bdfdbc6 Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some
fs, e.g. smbfs, already did it.

Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:29:33 +00:00
Konstantin Belousov
6c21f6edb8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
Mateusz Guzik
12e2a30ef9 tmpfs: allow shared file lookups
Tested by: pho
2014-10-21 21:27:13 +00:00
Mateusz Guzik
4fce16e4c9 Provide vfs suspension support only for filesystems which need it, take
two.

nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.

This is a fixup for 273271.

No strong objections from: kib
Pointy hat to: mjg
MFC after:	2 weeks
2014-10-20 18:00:50 +00:00
Mateusz Guzik
020b8f17a0 Provide vfs suspension support only for filesystems which need it.
Need is expressed by providing vfs_susp_clean function in vfsops.

Differential Revision:	D952
Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-19 06:59:33 +00:00
Konstantin Belousov
22bdc15a57 Do not ignore error from tmpfs_alloc_vp(). It results in access to
the random memory.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:08:01 +00:00
Konstantin Belousov
de75292a5b Remove unused header.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:06:16 +00:00
Konstantin Belousov
65589a29f4 Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code.  In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:04:46 +00:00
Konstantin Belousov
4cda7f7ece Rework the tmpfs unmount.
- Suspend filesystem for unmount.  This prevents new tmpfs nodes from
  instantiating, and also ensures that only unmount thread can destroy
  nodes.

- Do not start tmpfs node deletion until all vnodes are reclaimed,
  which guarantees that no thread can access tmpfs data.  For this,
  call vflush() in the loop, until the mnt_nvnodelistsize is non-zero.
  Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
  insertion of a vnode germ into the mount list of vnodes.

- Fail node allocation when the filesystem is being unmounted.  This
  is race-free due to the vflush() call in loop.  This is mostly
  cosmetic, avoiding some more work which might be done until
  suspension in unmount is started.

Note that there is currently no way to prevent new vnode instantiation
from readers during the unmount.  Due to this, forced unmount might
live-lock if vflush() loop cannot get to the zero vnode count due to
races with readers.  The unmount would proceed after the load is
lifted.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:52:33 +00:00
Konstantin Belousov
b5b3326191 Change forgotten in r268615. Set the OBJ_TMPFS_NODE flag for
vm_object of VREG tmpfs node.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:35:14 +00:00
Konstantin Belousov
eb2c06b63a Use tmpfs_vn_get_ino_gen() to handle the races with reclaim in tmpfs
dotdot lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:16:55 +00:00
Konstantin Belousov
fd63693dcf Style. Add comment about lock mode.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:13:56 +00:00
Konstantin Belousov
7a41bc2f41 In tmpfs_alloc_file(), code after the 'out' label does only 'return
error;'.  Replace goto's with the return.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:02:40 +00:00
Konstantin Belousov
d2ca06cdd2 Add convenience macro to assert tmpfs node lock.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:59:25 +00:00
Konstantin Belousov
55781cb922 Add some assertions for the code handling vm_object for tmpfs vnode.
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced.  When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:55:02 +00:00
Konstantin Belousov
706f80801d The tmpfs_link() must not dereference the filesystem-specific data for
a vnode until it is verified that the vnode indeed belongs to tmpfs
mount.  Otherwise, it might access random memory, at least in the
debug kernel.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:45:29 +00:00
Konstantin Belousov
fca015d301 Remove code separator lines which do not conform to style(9).
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:17:11 +00:00
Konstantin Belousov
7b81a399a4 In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once.  Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by:	bde
Reviewed by:	bde, trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-17 07:11:00 +00:00
Konstantin Belousov
60c5c866aa Allow shared locking for the tmpfs vnodes.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-04 15:30:49 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Bryan Drewery
504bde017a Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and
'..'.

Reviewed by:	Russell Cattelan
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-03-14 13:58:02 +00:00
Bryan Drewery
ac09d109ca Rename cnt to maxcookies and change its use as the condition for when to
lookup cookies to be less obscure.

No functional change.

Since r245115, cnt has not really been needed in tmpfs_dir_getdents().  Keep
it for the MPASS() for now though.

Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-03-14 13:55:48 +00:00
Bryan Drewery
62dca316da Cleanup redundant logic and add some comments to help explain how
it works in lieu of potentially less clear code.

Sponsored by:	EMC / Isilon Storage Division
Discussed with:	Russell Cattelan
2014-03-14 02:10:30 +00:00
Bryan Drewery
0742ebc98f Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used.
Discussed with:	kib
MFC after:	2 weeks
2014-03-14 01:43:55 +00:00
Kenneth D. Merry
3b5f179d2a Support storing 7 additional file flags in tmpfs:
UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY,
and UF_HIDDEN.

Sort the file flags tmpfs supports alphabetically.  tmpfs now
supports the same flags as UFS, with the exception of SF_SNAPSHOT.

Reported by:	bdrewery, antoine
Sponsored by:	Spectra Logic
2013-08-28 22:12:56 +00:00
Xin LI
2454886e05 Allow tmpfs be mounted inside jail. 2013-08-23 22:52:20 +00:00
Konstantin Belousov
41cf41fdfd Extract the general-purpose code from tmpfs to perform uiomove from
the page queue of some vm object.

Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:23:24 +00:00
Attilio Rao
c7aebda8a1 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
Konstantin Belousov
8239a7a878 The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node,
in particular, from the tmpfs_lookup VOP method.  If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return an alive
vnode whenever the underlying node is valid.

Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed.  This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.

Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.

Note that filesystems which use vfs_hash.c, correctly handle the case
due to vfs_hash_get() looping when vget() returns ENOENT for sleepable
requests.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-08-05 18:53:59 +00:00
Attilio Rao
be99683637 Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by:	EMC / Isilon storage division
Reported by:	kib
2013-08-05 08:55:35 +00:00
Attilio Rao
3b6714cacb The page hold mechanism is fast but it has couple of fallouts:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).

Fixing such primitive can bring to complete removal of the page hold
mechanism.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff
Tested by:	pho
2013-08-04 21:07:24 +00:00
Attilio Rao
878a788734 Remove unnecessary soft busy of the page before to do vn_rdwr() in
kern_sendfile() which is unnecessary.
The page is already wired so it will not be subjected to pagefault.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc, jeff
Tested by:	pho
2013-08-04 15:56:19 +00:00
Nathan Whitehorn
59169d9156 tmpfs works perfectly fine with -o union -- there is no reason to exclude it
from the list of options.
2013-07-23 14:48:37 +00:00
Alan Cox
f50b6721e1 Add missing VM object unlocks in an error case.
Reviewed by:	kib
2013-06-07 19:42:00 +00:00
Alan Cox
27a18d6a23 Don't busy the page unless we are likely to release the object lock.
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2013-06-06 06:17:20 +00:00
Alan Cox
ba887a9b33 Eliminate unnecessary vm object locking from tmpfs_nocacheread(). 2013-06-04 15:40:45 +00:00
Konstantin Belousov
67b4ed4b88 Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is
cleared when the tmpfs node is going away.

Tested by:	bdrewery, pho
2013-05-30 19:51:33 +00:00