3521 Commits

Author SHA1 Message Date
kib
6e2a82f4df Remove mistakenly merged field.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 20:03:26 +00:00
kib
49b653be7a Add mount option for tmpfs(5) to not use namecache.
The option "nonc" disables use of the namecache for the created mount;
by default the namecache is used.  The rationale for the option is that
the namecache duplicates information which is already kept in memory
by tmpfs.  Since it is believed that the namecache scales better than
tmpfs, or will scale better, the option is not enabled by default.  On
the other hand, smaller machines may benefit from less namecache
pressure.

Discussed with:	mjg
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:46:49 +00:00
kib
c30dfec101 Implement VOP_VPTOCNP() for tmpfs.
For directories, the node->tn_spec.tn_dir.tn_parent pointer to the parent
is used.  For non-directories, the implementation is naive: all
directory nodes are scanned to find a dirent linking the specified
node.  This can be significantly improved later by maintaining tn_parent
for all nodes.

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:29:13 +00:00
kib
4b4202e092 VNON nodes cannot exist.
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:25:42 +00:00
kib
65a8e18fa7 Refcount tmpfs nodes and mount structures.
On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer.  In this case, tmpfs_alloc_vp() accesses freed memory.

Introduce the reference count on the nodes.  The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference.  Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list.  Both nodes and tmpfs_mounts are removed when
refcount goes to zero.

Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished.  The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:15:21 +00:00
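
The reference counting described in 65a8e18fa7 can be sketched roughly as follows. This is a single-threaded illustration only, not the tmpfs code: the `sketch_*` names and the `nodes_freed` and `destroyed` fields are hypothetical, and the real implementation needs locking or atomic operations that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct tmpfs_mount and tmpfs_node. */
struct sketch_mount {
	unsigned refcount;	/* 1 from struct mount + 1 per listed node */
	unsigned nodes_freed;	/* bookkeeping for this example only */
	bool destroyed;		/* set where real code would free the mount */
};

struct sketch_node {
	struct sketch_mount *mount;
	unsigned refcount;	/* 1 from the allnodes list + transient refs */
	unsigned nlinks;	/* 0 once the file has been removed */
};

void
sketch_mount_rele(struct sketch_mount *mp)
{
	assert(mp->refcount > 0);
	if (--mp->refcount == 0)
		mp->destroyed = true;
}

struct sketch_node *
sketch_node_alloc(struct sketch_mount *mp)
{
	struct sketch_node *np;

	np = malloc(sizeof(*np));
	if (np == NULL)
		return (NULL);
	np->mount = mp;
	np->refcount = 1;	/* reference owned by the allnodes list */
	np->nlinks = 1;
	mp->refcount++;		/* each listed node references the mount */
	return (np);
}

/* Transient reference taken by a thread working on the node unlocked. */
void
sketch_node_ref(struct sketch_node *np)
{
	np->refcount++;
}

void
sketch_node_rele(struct sketch_node *np)
{
	struct sketch_mount *mp = np->mount;

	assert(np->refcount > 0);
	if (--np->refcount > 0)
		return;
	free(np);
	mp->nodes_freed++;
	sketch_mount_rele(mp);
}

/* File removal: clear the link count and drop the list's reference. */
void
sketch_node_remove(struct sketch_node *np)
{
	np->nlinks = 0;
	sketch_node_rele(np);
}

/* Stand-in for the tn_nlinks == 0 check: fail on a removed node. */
int
sketch_alloc_vp(const struct sketch_node *np)
{
	return (np->nlinks == 0 ? -1 : 0);
}
```

A thread that took a transient reference before the file was removed still sees valid memory afterwards; the vnode allocation fails cleanly, and the node is freed only when that last reference is dropped.
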
kib
dbff4844a1 Make tmpfs directory cursor available outside tmpfs_subr.c.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 18:38:58 +00:00
kib
022209a5df Rename tmpfs_mount member allnode_lock to include namespace prefix.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 16:01:36 +00:00
kib
ba16314dcf Protect macro argument.
Requested by:	hselasky
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 15:06:18 +00:00
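
Commit ba16314dcf does not name the macro in this listing, so the following is only a generic illustration of what protecting a macro argument usually means in C. The `SKETCH_*` macros are hypothetical and are not taken from the tmpfs sources.

```c
#include <assert.h>

/* Unprotected: the argument is pasted in as-is, so precedence can bite. */
#define	SKETCH_DOUBLE_UNSAFE(x)	(x * 2)

/* Protected: parentheses keep the argument together as one operand. */
#define	SKETCH_DOUBLE_SAFE(x)	((x) * 2)

void
sketch_macro_check(void)
{
	/* Expands to (1 + 2 * 2): only the 2 is doubled. */
	assert(SKETCH_DOUBLE_UNSAFE(1 + 2) == 5);
	/* Expands to ((1 + 2) * 2): the whole argument is doubled. */
	assert(SKETCH_DOUBLE_SAFE(1 + 2) == 6);
}
```
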
kib
0df67f81eb Rework some tmpfs lock assertions.
Remove TMPFS_ASSERT_ELOCKED().  Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 14:49:55 +00:00
kib
e1ce156f26 Style fixes and comment updates.
Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 14:27:37 +00:00
kib
d899aacb44 Remove unused union member, fifos on tmpfs are implemented in common code.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 13:35:14 +00:00
mjg
dc0e2e9361 tmpfs: manage tm_pages_used with atomics
Reviewed by:	kib (previous version)
2017-01-14 06:20:36 +00:00
mjg
f4d0160750 cd9660: fix up compilation on sparc after r311665
Reported by:	linimon
2017-01-10 04:17:53 +00:00
cem
49db28277b cd9660: typedef cd_ino_t in preference to #define
Suggested by:	kib@
2017-01-09 23:56:45 +00:00
cem
a1649edd53 cd9660: Add a prototype for cd9660_vfs_hash_cmp
GCC warns (and errors, with -Werror) about it otherwise.  Clang doesn't care.

Introduced in r311665.

Reported by:	np@
2017-01-09 23:51:31 +00:00
kib
40da33f9f3 Forcibly remove the cached items from pseudofs vncache on module unload.
If some process' nodes were accessed using procfs and the process
cannot exit properly at the time modunload event is reported to the
pseudofs-backed filesystem, the assertion in pfs_vncache_unload() is
triggered.  The assertion is correct; the cache should be cleaned.

Approved by:	des (pseudofs maintainer)
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-09 20:14:18 +00:00
cem
a2b76bd9d8 iso_rrip.h: Hide kernel definitions from makefs(8)
Reported by:	O. Hartmann <ohartmann at walstatt.org>
2017-01-08 09:16:07 +00:00
cem
54ed41ccd8 Do not truncate inode calculation from ISO9660 block offset
PR:		190655
Reported by:	Thomas Schmitt <scdbackup at gmx.net>
Obtained from:	NetBSD sys/fs/cd9660/cd9660_node.c,r1.31
2017-01-08 06:22:35 +00:00
cem
8bed11dd0b cd9660: Expand internal inum size to 64 bits
Inums in cd9660 refer to byte offsets on the media.  DVD and BD media
can have entries above 4GB, especially with multi-session images.

PR:		190655
Reported by:	Thomas Schmitt <scdbackup at gmx.net>
2017-01-08 06:21:49 +00:00
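
The truncation that 8bed11dd0b and 54ed41ccd8 address can be illustrated with a small sketch. It assumes 2048-byte logical blocks (the usual ISO 9660 size) and an inode number formed as block offset plus an offset within the block; the function names are hypothetical and the real cd9660 code differs in detail.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_BLOCK_SHIFT	11	/* assumed 2048-byte logical blocks */

/* Truncating variant: the shift and the result are only 32 bits wide. */
uint32_t
sketch_inum32(uint32_t block, uint32_t offset)
{
	assert(offset < (1u << SKETCH_BLOCK_SHIFT));
	return ((block << SKETCH_BLOCK_SHIFT) + offset);
}

/* Widened variant: convert to 64 bits before shifting. */
uint64_t
sketch_inum64(uint32_t block, uint32_t offset)
{
	assert(offset < (1u << SKETCH_BLOCK_SHIFT));
	return (((uint64_t)block << SKETCH_BLOCK_SHIFT) + offset);
}
```

With these definitions, block 2097152 (2^21) lies exactly 4GB into the media. In the 32-bit variant its inode number wraps around and collides with block 0, while the 64-bit variant keeps the two entries distinct.
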
mjg
d9e9feb135 tmpfs: perform a lockless check in tmpfs_itimes
Most of the time the status is 0 as the function is repeatedly
called from tmpfs_getattr.
2017-01-06 19:58:20 +00:00
mjg
12ff365883 tmpfs: enable MNTK_EXTENDED_SHARED
Discussed with:	kib
2017-01-06 18:01:46 +00:00
kib
20c81e42e5 Lock tmpfs node tn_status updates done under the shared vnode lock.
If tmpfs vnode is only shared locked, tn_status field still needs
updates to note the access time modification.  Use the same locking
scheme as for UFS: protect tn_status with the node interlock + shared
vnode lock.

Fix nearby style.

Noted and reviewed by:	mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 17:43:36 +00:00
kib
e38c18e1fe Use vnode lock assertion expression, and upgrade it to assert the
required exclusive state of the vnode lock in tmpfs chflags, chmod,
chown, chsize, chtimes operations.

Fix nearby style.

Reviewed by:	mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 17:32:44 +00:00
kib
f8b9008a47 Remove dead code.
Fifos overwrite file ops vector, and fifo VOP_KQFILTER is VOP_PANIC().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 17:03:08 +00:00
kib
7c67dd5f60 Use type-independent formats for printing nlink_t and ino_t.
Extracted from:	ino64 work by gleb, mckusick
Discussed with:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 16:59:33 +00:00
kib
5def9fa2c2 Do not allocate struct statfs on kernel stack.
Right now the size of the structure is 472 bytes on amd64, which is
already large, and stack allocations are undesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
jpaetzel
cbc978682d Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems
Differential Revision:	D9009
2017-01-02 19:18:56 +00:00
pfg
1f1abed933 Undo small wrong style change.
Reported by:	kib
2016-12-28 16:16:36 +00:00
pfg
8fb4a19fe0 style(9) cleanups.
Just to reduce some of the issues found with indent(1).

MFC after:	1 week
2016-12-28 15:43:17 +00:00
rmacklem
cbf29bafe0 Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
  would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
  it would fail the RPC/syscall instead of initiating recovery and then
  looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
  until after a new session was created for a NFS4ERR_BAD_SESSION error,
  it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
  structure (which is always the current metadata session) was slightly
  racy. With changes for the above problems it became more racy, so all
  uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
  and, as such, this wouldn't have caused problems, the allocation of a
  session structure was 1 byte smaller than it should have been.
  (Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extensive testing he did to help isolate/fix
these problems.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D8745
2016-12-23 23:14:53 +00:00
alc
924c556274 When tmpfs and POSIX shm page in a page for the sole purpose of performing
truncation, immediately queue the page for asynchronous laundering rather
than making the page pass through the inactive queue first.

Reviewed by:	kib, markj
2016-12-11 19:24:41 +00:00
rmacklem
05c246d986 Fix the NFSv4.1 server for Open reclaim after a reboot.
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.

MFC after:	2 weeks
2016-12-05 22:36:25 +00:00
pfg
a34b4baca3 ext2fs: renumber the license clauses to avoid skipping #3.
This is to keep consistency with other files, and help license-checking
utilities determine the number of clauses that apply.

No functional change.
2016-12-02 19:47:23 +00:00
kib
82f9c275c4 The NFSv4 client tracks opens, and the tracking records are only dropped
when the vnode is inactivated.  This conflicts with nullfs caching,
which keeps the upper vnode around and, as a consequence, keeps the use
reference to the lower vnode.

Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.

Reported by:	asomers
Reviewed by:	rmacklem
Tested by:	asomers, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-27 09:20:58 +00:00
pfg
e5c648e9d3 ext2: avoid possible overflow when calculating malloc size.
This is inspired by r308064 for the case of reloading UFS.

MFC after:	1 week
2016-11-26 02:06:33 +00:00
rmacklem
4a6ea51885 Stop "nfsstat -z" from clearing counts of NFSv4 state structures.
The "-z" option on nfsstat was erroneously zeroing out the counts
of NFSv4 state structures. These counts will normally go back down
to zero as state is released. When zeroed out by "-z", these counts
can go negative. This patch fixes this problem.

MFC after:	2 weeks
2016-11-25 23:28:09 +00:00
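
The problem fixed in 4a6ea51885 comes from treating two kinds of statistics the same way: event counters, which can be reset at any time, and counts of currently allocated state structures, which are decremented as state is released. A minimal sketch of the distinction, with a hypothetical structure and field names that are not taken from the NFS code:

```c
#include <assert.h>
#include <string.h>

struct sketch_stats {
	/* Event counters: safe to reset at any time. */
	unsigned long rpc_calls;
	unsigned long rpc_retries;
	/* Gauges: number of state structures currently allocated. */
	long opens;
	long locks;
};

/* Resets everything, including the gauges. */
void
sketch_zero_buggy(struct sketch_stats *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
}

/* Resets the event counters but preserves the gauges. */
void
sketch_zero_fixed(struct sketch_stats *s)
{
	long opens, locks;

	assert(s != NULL);
	opens = s->opens;
	locks = s->locks;
	memset(s, 0, sizeof(*s));
	s->opens = opens;
	s->locks = locks;
}

/* Called when an open is released. */
void
sketch_open_release(struct sketch_stats *s)
{
	s->opens--;
}
```

After `sketch_zero_buggy()`, releasing an open that was counted before the reset drives `opens` below zero, which matches the symptom described in the commit message.
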
markj
4159d33f6b Release laundered vnode pages to the head of the inactive queue.
The swap pager enqueues laundered pages near the head of the inactive queue
to avoid another trip through LRU before reclamation. This change adds
support for this behaviour to the vnode pager and makes use of it in UFS and
ext2fs. Some ioflag handling is consolidated into a common subroutine so
that this support can be easily extended to other filesystems which make use
of the buffer cache. No changes are needed for ZFS since its putpages
routine always undirties the pages before returning, and the laundry
thread requeues the pages appropriately in this case.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D8589
2016-11-23 17:53:07 +00:00
alc
4be9876033 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
kib
46c724e4a0 On error, bread(9) zeroes buffer pointer, do not dereference it.
See r294954 for the bread(9) change and r297401 for similar cd9660 fix.

Reported and tested by:	Joshua Kinard <kumba@gentoo.org>
PR:	214705
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 13:24:57 +00:00
kib
ed311f1e82 Use buffer pager for NFS.
The pager, due to its construction, implements clustering for the
page-ins.  In particular, buildworld load demonstrates reduction of
the READ RPCs from 39k down to 24k.  No change in real or CPU time was
observed.

Discussed with, and measured by:	bde
No objections from:	rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:58:24 +00:00
kib
882d53922b Minor cleanup, remove unneeded XXX comments and unused re-define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:24:59 +00:00
cperciva
b7810553a1 Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam.
Rather than printing a warning for every time we receive a fileid > 2^32
from the NFS server, count warnings and print at most one of each warning
type per minute, e.g.,

Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences)
Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences)
Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences)
Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences)

A buildworld with an NFS mounted /usr/obj can otherwise result in
hundreds of thousands of lines being printed, which seems unnecessarily
verbose.

When ino_t becomes a 64-bit type, these printfs will no longer be needed
(and the problems associated with truncating 64-bit fileids to generate
32-bit inode numbers will also go away).

Reviewed by:	rmacklem
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8523
2016-11-16 01:11:49 +00:00
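
The counting scheme described in b7810553a1 can be sketched as below. This is an assumption about the general shape, not the committed code: the `sketch_warn` structure, its fields, and the convention of passing the current time in seconds are all hypothetical. One such structure would be kept per warning type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_INTERVAL_SECS	60

struct sketch_warn {
	uint64_t pending;	/* occurrences since the last printed line */
	int64_t last_print;	/* time of the last printed line, in seconds */
	bool printed_once;	/* false until the first line is printed */
};

/*
 * Record one occurrence at time `now`.  Returns the number of
 * occurrences to report if a line should be printed now, or 0 if the
 * caller should stay silent.
 */
uint64_t
sketch_warn_note(struct sketch_warn *w, int64_t now)
{
	uint64_t n;

	assert(w != NULL);
	w->pending++;
	if (w->printed_once && now - w->last_print < SKETCH_INTERVAL_SECS)
		return (0);
	n = w->pending;
	w->pending = 0;
	w->last_print = now;
	w->printed_once = true;
	return (n);
}
```

A caller would print something like "NFSv4 fileid > 32bits (N occurrences)" only when the return value N is nonzero, so a burst of warnings collapses into one line per interval.
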
alc
2fa3607305 Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
trasz
2b55107720 Remove spurious space.
MFC after:	1 month
2016-11-13 12:06:25 +00:00
bdrewery
30f99dbeef Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
trasz
e61af21d3a Value returned by taskqueue_enqueue_timeout(9) is not an error; don't treat
it as such.

MFC after:	1 month
2016-11-05 12:30:10 +00:00
kib
a41f4cc9a5 Allow some dotdot lookups in capability mode.
If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal.  Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.

Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.

Disallow non-local filesystems for dotdot, since a remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, so the knob
vfs.lookup_cap_dotdot_nonlocal is provided as an override.

Idea by:	rwatson
Discussed with:	emaste, jonathan, rwatson
Reviewed by:	mjg (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8110
2016-11-02 12:43:15 +00:00
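
The tracking idea in a41f4cc9a5 can be sketched as follows. Directories stand in for vnodes, and all names (`sketch_dir`, `sketch_tracker`, the fixed-size list) are hypothetical simplifications; the kernel implementation works on vnodes during namei() and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	SKETCH_MAX_TRACKED	32

struct sketch_dir {
	const struct sketch_dir *parent;	/* NULL at the filesystem root */
};

struct sketch_tracker {
	const struct sketch_dir *seen[SKETCH_MAX_TRACKED];
	size_t nseen;
};

static bool
sketch_seen(const struct sketch_tracker *t, const struct sketch_dir *d)
{
	size_t i;

	for (i = 0; i < t->nseen; i++)
		if (t->seen[i] == d)
			return (true);
	return (false);
}

/* Record a directory entered during the walk; -1 if the list is full. */
int
sketch_track(struct sketch_tracker *t, const struct sketch_dir *d)
{
	assert(d != NULL);
	if (sketch_seen(t, d))
		return (0);
	if (t->nseen == SKETCH_MAX_TRACKED)
		return (-1);
	t->seen[t->nseen++] = d;
	return (0);
}

/*
 * Resolve ".." from cur.  The step is allowed only if the parent was
 * already recorded, which means it lies at or below the lookup root.
 */
int
sketch_dotdot(const struct sketch_tracker *t, const struct sketch_dir *cur,
    const struct sketch_dir **out)
{
	const struct sketch_dir *up = cur->parent;

	if (up == NULL || !sketch_seen(t, up))
		return (-1);
	*out = up;
	return (0);
}
```

If the walk starts at directory `a` and descends into `a/b`, a dotdot from `b` returns to `a` and is allowed, while a dotdot from `a` would leave the recorded set and is refused.
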
kib
bdd259c16e Use buffer pager for cd9660.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:39 +00:00
kib
2d6cf591a0 Use buffer pager for msdosfs.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:15 +00:00
kib
84700300cf Enable vn_io_fault() deadlock avoidance for msdosfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:35:06 +00:00