Commit Graph

3489 Commits

Author SHA1 Message Date
Pedro F. Giffuni
a86fdf887e Undo small wrong style change.
Reported by:	kib
2016-12-28 16:16:36 +00:00
Pedro F. Giffuni
bf9a211dff style(9) cleanups.
Just to reduce some of the issues found with indent(1).

MFC after:	1 week
2016-12-28 15:43:17 +00:00
Rick Macklem
b2fc0141d9 Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
  would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
  it would fail the RPC/syscall instead of initiating recovery and then
  looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
  until after a new session was created for a NFS4ERR_BAD_SESSION error,
  it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
  structure (which is always the current metadata session) was slightly
  racey. With changes for the above problems it became more racey, so all
  uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
  and, as such, this wouldn't have caused problems, the allocation of a
  session structure was 1 byte smaller than it should have been.
  (Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D8745
2016-12-23 23:14:53 +00:00
Alan Cox
2d612d2dd2 When tmpfs and POSIX shm pagein a page for the sole purpose of performing
truncation, immediately queue the page for asynchronous laundering rather
than making the page pass through inactive queue first.

Reviewed by:	kib, markj
2016-12-11 19:24:41 +00:00
Rick Macklem
a5d19b81b4 Fix the NFSv4.1 server for Open reclaim after a reboot.
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.

MFC after:	2 weeks
2016-12-05 22:36:25 +00:00
Pedro F. Giffuni
fca15474a0 ext2fs: renumber the license clauses to avoid skipping #3.
This is to keep consistency with other files, and help license-checking
utilities determine the number of clauses that apply.

No functional change.
2016-12-02 19:47:23 +00:00
Konstantin Belousov
abc1515601 NFSv4 client tracks opens, and the track records are only dropped when
the vnode is inactivated.  This contradicts with the nullfs caching
which keeps upper vnode around, as consequence keeping the use
reference to lower vnode.

Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.

Reported by:	asomers
Reviewed by:	rmacklem
Tested by:	asomers, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-27 09:20:58 +00:00
Pedro F. Giffuni
bb9535bbc7 ext2: avoid possible overflow when calculating malloc size.
This is inspired on r308064 for case of reloading UFS.

MFC after:	1 week
2016-11-26 02:06:33 +00:00
Rick Macklem
1a2079d936 Stop "nfsstat -z" from clearing counts of NFSv4 state structures.
The "-z" option on nfsstats was erroneously zeroing out the counts
of NFSv4 state structures. These counts will normally go back down
to zero as state is released. When zeroed out by "-z", these counts
can go negative. This patch fixes this problem.

MFC after:	2 weeks
2016-11-25 23:28:09 +00:00
Mark Johnston
99e6e1930c Release laundered vnode pages to the head of the inactive queue.
The swap pager enqueues laundered pages near the head of the inactive queue
to avoid another trip through LRU before reclamation. This change adds
support for this behaviour to the vnode pager and makes use of it in UFS and
ext2fs. Some ioflag handling is consolidated into a common subroutine so
that this support can be easily extended to other filesystems which make use
of the buffer cache. No changes are needed for ZFS since its putpages
routine always undirties the pages before returning, and the laundry
thread requeues the pages appropriately in this case.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D8589
2016-11-23 17:53:07 +00:00
Alan Cox
bba39b9ae3 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
Konstantin Belousov
1fa81dab7d On error, bread(9) zeroes buffer pointer, do not dereference it.
See r294954 for the bread(9) change and r297401 for similar cd9660 fix.

Reported and tested by:	Joshua Kinard <kumba@gentoo.org>
PR:	214705
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 13:24:57 +00:00
Konstantin Belousov
753a007f0d Use buffer pager for NFS.
The pager, due to its construction, implements clustering for the
page-ins.  In particular, buildworld load demonstrates reduction of
the READ RPCs from 39k down to 24k.  No change in real or CPU time was
observed.

Discussed with, and measured by:	bde
No objections from:	rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:58:24 +00:00
Konstantin Belousov
fc2c3afee0 Minor cleanup, remove unneeded XXX comments and unused re-define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:24:59 +00:00
Colin Percival
63659ba6df Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam.
Rather than printing a warning for every time we receive a fileid > 2^32
from the NFS server, count warnings and print at most one of each warning
type per minute, e.g.,

Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences)
Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences)
Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences)
Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences)

A buildworld with an NFS mounted /usr/obj can otherwise result in
hundreds of thousands of lines being printed, which seems unnecessarily
verbose.

When ino_t becomes a 64-bit type, these printfs will no longer be needed
(and the problems associated with truncating 64-bit fileids to generate
32-bit inode numbers will also go away).

Reviewed by:	rmacklem
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8523
2016-11-16 01:11:49 +00:00
Alan Cox
7667839a7e Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
Edward Tomasz Napierala
53232c0d1d Remove spurious space.
MFC after:	1 month
2016-11-13 12:06:25 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Edward Tomasz Napierala
a79e9d0fec Value returned by taskqueue_enqueue_timeout(9) is not an error; don't treat
it as such.

MFC after:	1 month
2016-11-05 12:30:10 +00:00
Konstantin Belousov
7359fdcf5f Allow some dotdot lookups in capability mode.
If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal.  Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.

Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.

Disallow non-local filesystems for dotdot, since remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, provide the knob
vfs.lookup_cap_dotdot_nonlocal to override as well.

Idea by:	rwatson
Discussed with:	emaste, jonathan, rwatson
Reviewed by:	mjg (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
Differential revision:	https://reviews.freebsd.org/D8110
2016-11-02 12:43:15 +00:00
Konstantin Belousov
c329ee711b Use buffer pager for cd9660.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:39 +00:00
Konstantin Belousov
06965e96b3 Use buffer pager for msdosfs.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:15 +00:00
Konstantin Belousov
2aa3944510 Enable vn_io_fault() deadlock avoidance for msdosfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:35:06 +00:00
Konstantin Belousov
b05088aeeb Ensure that cluster allocations never allocate clusters outside the
volume limits.  In particular:
- Assert that usemap_alloc() and usemap_free() cluster number argument
  is valid.
- In chainlength(), return 0 if cluster start is after the max cluster.
- In chainlength(), cut the calculated cluster chain length at the max
  cluster.
- For true paranoia, after the pm_inusemap is calculated in
  fillinusemap(), reset all bits in the array for clusters after the
  max cluster, as in-use.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:34:32 +00:00
Konstantin Belousov
03b8a419e4 If the fatchain() call in chainalloc() returned an error, revert
marking the cluster run as in-use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:26:44 +00:00
Konstantin Belousov
f33d62b2d2 Use symbolic name for the value of fully free word in pm_inusemap.
Explicitely mention every bit in the value.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:23:36 +00:00
Konstantin Belousov
1c4ec415e2 Use symbolic name for the free cluster number.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:01:49 +00:00
Konstantin Belousov
f220587d03 Fix comment formatting.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 10:59:34 +00:00
Konstantin Belousov
8b3c8abc45 Remove useless NULL check.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 10:57:41 +00:00
Rick Macklem
dcb19c3886 A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with
delegations enabled and the Linux NFSv4.1 client was reported in
reviews.freebsd.org/D7891.
I believe that the FreeBSD server behaviour conforms to the RFC and that
the Linux client has a bug. Therefore, I do not think the proposed patch
is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD
server will issue a write delegation for a read open if possible.
The Linux client then erroneously assumes that the credentials used for
the read open can write the file.
This patch reverses the default value for nfsrv_writedelegifpos to 0 so
that the default behaviour is Linux compatible and adds a sysctl that can
be used to set nfsrv_writedelegifpos.

This change should only affect users that are mounting a FreeBSD server
with delegations enabled (they are not enabled by default) with a Linux
NFSv4.1 client mount.

Reported by:	fatih.acar@gandi.net
Tested by:	fatih.acar@gandi.net
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7891
2016-10-20 23:53:16 +00:00
Ganael LAPLANCHE
2ee1ec5d82 Fix panic() message reporting ufs instead of nandfs
PR:		213438
Approved by:	kib
2016-10-13 19:33:07 +00:00
Mateusz Guzik
8660b707ff vfs: remove the __bo_vnode field from struct vnode
The pointer can be obtained using __containerof instead.

Reviewed by:	kib
2016-09-30 17:11:03 +00:00
Alan Somers
0696afbe09 Mount msdosfs with longnames support by default.
The old behavior depended on the FAT version and on what files were in the
root directory. "mount_msdosfs -o shortnames" is still supported.

Reviewed by:	wblock, cem
Discussed with:	trasz, adrian, imp
MFC after:	4 weeks
X-MFC-Notes:	Don't MFC the removal of findwin95
Differential Revision:	https://reviews.freebsd.org/D8018
2016-09-23 19:05:07 +00:00
Hans Petter Selasky
19f1b6fb7b Prevent cuse4bsd.ko and cuse.ko from loading at the same time by
declaring support for the cuse4bsd interface in cuse.ko.

Found by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>
MFC after:	1 week
2016-09-23 07:41:23 +00:00
Edward Tomasz Napierala
e583d99909 Change the getnewvnode(9) tag for nullfs from "null" to "nullfs".
It's more consistent, and besides, the "null" alone looks weird.

MFC after:	1 month
2016-09-15 13:57:37 +00:00
Mateusz Guzik
6a3e46059a nullfs: plug vnode ref leak in null_vptocnp
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.

Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
2016-09-09 10:40:55 +00:00
Mateusz Guzik
2740551545 nullfs: stop special-casing directories in null_vptocnp
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.

No objections:	kib
MFC after:	2 weeks
2016-09-06 21:22:03 +00:00
Konstantin Belousov
47e61f6cc6 Implement VOP_FDATASYNC() for msdosfs.
Standard VOP_FSYNC() implementation just syncs data buffers, and due
to this, is the correct and efficient implementation for msdosfs or
any other filesystem which uses bufer cache trivially.  Provide
globally visible wrapper vop_stdfdatasync_buf() for future consumption
by other filesystems.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7471
2016-08-15 19:17:00 +00:00
Rick Macklem
1b819cf265 Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.

Submitted by:	will (earlier version)
Reviewed by:	ken
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D1626
2016-08-12 22:44:59 +00:00
Edward Tomasz Napierala
c12582546e Implement autofs_print(), for improved debugging experience.
MFC after:	1 month
2016-08-11 14:27:23 +00:00
Edward Tomasz Napierala
411455a8fb Replace all remaining calls to vprint(9) with vn_printf(9), and remove
the old macro.

MFC after:	1 month
2016-08-10 16:12:31 +00:00
Konstantin Belousov
15ad3e51c5 Convert another tmpfs assert into runtime check.
The offset of the directory file, passed to getdirentries(2) syscall,
is user-controllable.  The value of the offset must not be asserted,
instead the invalid value should be checked and rejected if invalid.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-10 13:50:21 +00:00
Pedro F. Giffuni
25efc7c822 ext2fs: Add defines for some missing ext4 feature flags.
These are currently unused in our implementation and some even appear to
have not been implemented yet on linux but it is good to keep them for
reference.

Obtained from:	NetBSD (CVS Rev. 1.41)
MFC after:	1 month
2016-08-06 17:24:35 +00:00
Pedro F. Giffuni
ef4736fea8 ext2fs: Add some more inode flags.
These are currently unused in out implementation but it is good to keep
them for reference.

Obtained from:	NetBSD (CVS Rev. 1.35)
MFC after:	1 month
2016-08-06 16:48:40 +00:00
Konstantin Belousov
ad600ac8e3 Remove ncl_printf(), use printf(9) directly. After r303710 the
function duplicates printf().

Correct function names in the messages [*].

Noted by:	bde [*]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 15:58:20 +00:00
Konstantin Belousov
83d7cf21ea Remove unneeded (recursing) Giant acquisition around vprintf(9).
Reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 11:49:17 +00:00
Konstantin Belousov
29ffb32ccd Remove Giant asserts. Update comment.
Owning Giant in the init/uninit is accidental due to the moment where
VFS modules initialization is performed, and is not enforced by the
VFS interface.  The Giant lock does not prevent a parallel execution
of the code, it is VFS which implements the proper protocol.

Approved by:	des (pseudofs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 08:57:15 +00:00
Konstantin Belousov
6828ba639a Some style changes. Fix a typo in comment.
Approved by:	des (pseudofs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 08:53:29 +00:00
Edward Tomasz Napierala
905807264d Remove write-only variable.
MFC after:	1 month
2016-07-29 12:15:55 +00:00
Konstantin Belousov
584b675ed6 Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00