Commit Graph

3698 Commits

Author SHA1 Message Date
kib
f8b9008a47 Remove dead code.
Fifos overwrite file ops vector, and fifo VOP_KQFILTER is VOP_PANIC().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 17:03:08 +00:00
kib
7c67dd5f60 Use type-independent formats for printing nlink_t and ino_t.
Extracted from:	ino64 work by gleb, mckusick
Discussed with:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 16:59:33 +00:00
kib
5def9fa2c2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
jpaetzel
cbc978682d Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem.
Reviewed by	kib
MFC after:	2 weeks
Sponsored by:	iXsystems
Differential Revision:	D9009
2017-01-02 19:18:56 +00:00
pfg
1f1abed933 Undo small wrong style change.
Reported by:	kib
2016-12-28 16:16:36 +00:00
pfg
8fb4a19fe0 style(9) cleanups.
Just to reduce some of the issues found with indent(1).

MFC after:	1 week
2016-12-28 15:43:17 +00:00
rmacklem
cbf29bafe0 Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
  would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
  it would fail the RPC/syscall instead of initiating recovery and then
  looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
  until after a new session was created for a NFS4ERR_BAD_SESSION error,
  it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
  structure (which is always the current metadata session) was slightly
  racey. With changes for the above problems it became more racey, so all
  uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
  and, as such, this wouldn't have caused problems, the allocation of a
  session structure was 1 byte smaller than it should have been.
  (Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D8745
2016-12-23 23:14:53 +00:00
alc
924c556274 When tmpfs and POSIX shm pagein a page for the sole purpose of performing
truncation, immediately queue the page for asynchronous laundering rather
than making the page pass through inactive queue first.

Reviewed by:	kib, markj
2016-12-11 19:24:41 +00:00
rmacklem
05c246d986 Fix the NFSv4.1 server for Open reclaim after a reboot.
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.

MFC after:	2 weeks
2016-12-05 22:36:25 +00:00
pfg
a34b4baca3 ext2fs: renumber the license clauses to avoid skipping #3.
This is to keep consistency with other files, and help license-checking
utilities determine the number of clauses that apply.

No functional change.
2016-12-02 19:47:23 +00:00
kib
82f9c275c4 NFSv4 client tracks opens, and the track records are only dropped when
the vnode is inactivated.  This contradicts with the nullfs caching
which keeps upper vnode around, as consequence keeping the use
reference to lower vnode.

Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.

Reported by:	asomers
Reviewed by:	rmacklem
Tested by:	asomers, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-27 09:20:58 +00:00
pfg
e5c648e9d3 ext2: avoid possible overflow when calculating malloc size.
This is inspired on r308064 for case of reloading UFS.

MFC after:	1 week
2016-11-26 02:06:33 +00:00
rmacklem
4a6ea51885 Stop "nfsstat -z" from clearing counts of NFSv4 state structures.
The "-z" option on nfsstats was erroneously zeroing out the counts
of NFSv4 state structures. These counts will normally go back down
to zero as state is released. When zeroed out by "-z", these counts
can go negative. This patch fixes this problem.

MFC after:	2 weeks
2016-11-25 23:28:09 +00:00
markj
4159d33f6b Release laundered vnode pages to the head of the inactive queue.
The swap pager enqueues laundered pages near the head of the inactive queue
to avoid another trip through LRU before reclamation. This change adds
support for this behaviour to the vnode pager and makes use of it in UFS and
ext2fs. Some ioflag handling is consolidated into a common subroutine so
that this support can be easily extended to other filesystems which make use
of the buffer cache. No changes are needed for ZFS since its putpages
routine always undirties the pages before returning, and the laundry
thread requeues the pages appropriately in this case.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D8589
2016-11-23 17:53:07 +00:00
alc
4be9876033 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
kib
46c724e4a0 On error, bread(9) zeroes buffer pointer, do not dereference it.
See r294954 for the bread(9) change and r297401 for similar cd9660 fix.

Reported and tested by:	Joshua Kinard <kumba@gentoo.org>
PR:	214705
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 13:24:57 +00:00
kib
ed311f1e82 Use buffer pager for NFS.
The pager, due to its construction, implements clustering for the
page-ins.  In particular, buildworld load demonstrates reduction of
the READ RPCs from 39k down to 24k.  No change in real or CPU time was
observed.

Discussed with, and measured by:	bde
No objections from:	rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:58:24 +00:00
kib
882d53922b Minor cleanup, remove unneeded XXX comments and unused re-define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-22 10:24:59 +00:00
cperciva
b7810553a1 Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam.
Rather than printing a warning for every time we receive a fileid > 2^32
from the NFS server, count warnings and print at most one of each warning
type per minute, e.g.,

Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences)
Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences)
Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences)
Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences)

A buildworld with an NFS mounted /usr/obj can otherwise result in
hundreds of thousands of lines being printed, which seems unnecessarily
verbose.

When ino_t becomes a 64-bit type, these printfs will no longer be needed
(and the problems associated with truncating 64-bit fileids to generate
32-bit inode numbers will also go away).

Reviewed by:	rmacklem
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8523
2016-11-16 01:11:49 +00:00
alc
2fa3607305 Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
trasz
2b55107720 Remove spurious space.
MFC after:	1 month
2016-11-13 12:06:25 +00:00
bdrewery
30f99dbeef Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
trasz
e61af21d3a Value returned by taskqueue_enqueue_timeout(9) is not an error; don't treat
it as such.

MFC after:	1 month
2016-11-05 12:30:10 +00:00
kib
a41f4cc9a5 Allow some dotdot lookups in capability mode.
If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal.  Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.

Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.

Disallow non-local filesystems for dotdot, since remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, provide the knob
vfs.lookup_cap_dotdot_nonlocal to override as well.

Idea by:	rwatson
Discussed with:	emaste, jonathan, rwatson
Reviewed by:	mjg (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
Differential revision:	https://reviews.freebsd.org/D8110
2016-11-02 12:43:15 +00:00
kib
bdd259c16e Use buffer pager for cd9660.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:39 +00:00
kib
2d6cf591a0 Use buffer pager for msdosfs.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:46:15 +00:00
kib
84700300cf Enable vn_io_fault() deadlock avoidance for msdosfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:35:06 +00:00
kib
097a1d5fbb Ensure that cluster allocations never allocate clusters outside the
volume limits.  In particular:
- Assert that usemap_alloc() and usemap_free() cluster number argument
  is valid.
- In chainlength(), return 0 if cluster start is after the max cluster.
- In chainlength(), cut the calculated cluster chain length at the max
  cluster.
- For true paranoia, after the pm_inusemap is calculated in
  fillinusemap(), reset all bits in the array for clusters after the
  max cluster, as in-use.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:34:32 +00:00
kib
65a0ccdfc8 If the fatchain() call in chainalloc() returned an error, revert
marking the cluster run as in-use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:26:44 +00:00
kib
1e5991e494 Use symbolic name for the value of fully free word in pm_inusemap.
Explicitely mention every bit in the value.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:23:36 +00:00
kib
01e0e13b85 Use symbolic name for the free cluster number.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 11:01:49 +00:00
kib
1ba1829b64 Fix comment formatting.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 10:59:34 +00:00
kib
80c583ea78 Remove useless NULL check.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-28 10:57:41 +00:00
rmacklem
9c3c069006 A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with
delegations enabled and the Linux NFSv4.1 client was reported in
reviews.freebsd.org/D7891.
I believe that the FreeBSD server behaviour conforms to the RFC and that
the Linux client has a bug. Therefore, I do not think the proposed patch
is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD
server will issue a write delegation for a read open if possible.
The Linux client then erroneously assumes that the credentials used for
the read open can write the file.
This patch reverses the default value for nfsrv_writedelegifpos to 0 so
that the default behaviour is Linux compatible and adds a sysctl that can
be used to set nfsrv_writedelegifpos.

This change should only affect users that are mounting a FreeBSD server
with delegations enabled (they are not enabled by default) with a Linux
NFSv4.1 client mount.

Reported by:	fatih.acar@gandi.net
Tested by:	fatih.acar@gandi.net
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7891
2016-10-20 23:53:16 +00:00
martymac
786a65db12 Fix panic() message reporting ufs instead of nandfs
PR:		213438
Approved by:	kib
2016-10-13 19:33:07 +00:00
mjg
6a50fe29a5 vfs: remove the __bo_vnode field from struct vnode
The pointer can be obtained using __containerof instead.

Reviewed by:	kib
2016-09-30 17:11:03 +00:00
asomers
ca02d20de1 Mount msdosfs with longnames support by default.
The old behavior depended on the FAT version and on what files were in the
root directory. "mount_msdosfs -o shortnames" is still supported.

Reviewed by:	wblock, cem
Discussed with:	trasz, adrian, imp
MFC after:	4 weeks
X-MFC-Notes:	Don't MFC the removal of findwin95
Differential Revision:	https://reviews.freebsd.org/D8018
2016-09-23 19:05:07 +00:00
hselasky
7169d20b74 Prevent cuse4bsd.ko and cuse.ko from loading at the same time by
declaring support for the cuse4bsd interface in cuse.ko.

Found by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>
MFC after:	1 week
2016-09-23 07:41:23 +00:00
trasz
a6a8ef1821 Change the getnewvnode(9) tag for nullfs from "null" to "nullfs".
It's more consistent, and besides, the "null" alone looks weird.

MFC after:	1 month
2016-09-15 13:57:37 +00:00
mjg
0f1a94c426 nullfs: plug vnode ref leak in null_vptocnp
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.

Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
2016-09-09 10:40:55 +00:00
mjg
c9cf1102e5 nullfs: stop special-casing directories in null_vptocnp
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.

No objections:	kib
MFC after:	2 weeks
2016-09-06 21:22:03 +00:00
kib
30646b2071 Implement VOP_FDATASYNC() for msdosfs.
Standard VOP_FSYNC() implementation just syncs data buffers, and due
to this, is the correct and efficient implementation for msdosfs or
any other filesystem which uses bufer cache trivially.  Provide
globally visible wrapper vop_stdfdatasync_buf() for future consumption
by other filesystems.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7471
2016-08-15 19:17:00 +00:00
rmacklem
29eb253c98 Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.

Submitted by:	will (earlier version)
Reviewed by:	ken
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D1626
2016-08-12 22:44:59 +00:00
trasz
d8ce902a47 Implement autofs_print(), for improved debugging experience.
MFC after:	1 month
2016-08-11 14:27:23 +00:00
trasz
255ed885fa Replace all remaining calls to vprint(9) with vn_printf(9), and remove
the old macro.

MFC after:	1 month
2016-08-10 16:12:31 +00:00
kib
f477e34e28 Convert another tmpfs assert into runtime check.
The offset of the directory file, passed to getdirentries(2) syscall,
is user-controllable.  The value of the offset must not be asserted,
instead the invalid value should be checked and rejected if invalid.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-10 13:50:21 +00:00
pfg
5767604b3a ext2fs: Add defines for some missing ext4 feature flags.
These are currently unused in our implementation and some even appear to
have not been implemented yet on linux but it is good to keep them for
reference.

Obtained from:	NetBSD (CVS Rev. 1.41)
MFC after:	1 month
2016-08-06 17:24:35 +00:00
pfg
a6734e9812 ext2fs: Add some more inode flags.
These are currently unused in out implementation but it is good to keep
them for reference.

Obtained from:	NetBSD (CVS Rev. 1.35)
MFC after:	1 month
2016-08-06 16:48:40 +00:00
kib
2406e8e022 Remove ncl_printf(), use printf(9) directly. After r303710 the
function duplicates printf().

Correct function names in the messages [*].

Noted by:	bde [*]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 15:58:20 +00:00
kib
0e5bb85f9d Remove unneeded (recursing) Giant acquisition around vprintf(9).
Reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 11:49:17 +00:00
kib
6092948278 Remove Giant asserts. Update comment.
Owning Giant in the init/uninit is accidental due to the moment where
VFS modules initialization is performed, and is not enforced by the
VFS interface.  The Giant lock does not prevent a parallel execution
of the code, it is VFS which implements the proper protocol.

Approved by:	des (pseudofs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 08:57:15 +00:00
kib
5567fc3cb5 Some style changes. Fix a typo in comment.
Approved by:	des (pseudofs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 08:53:29 +00:00
trasz
81a2f26569 Remove write-only variable.
MFC after:	1 month
2016-07-29 12:15:55 +00:00
kib
42da5a6952 Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00
cem
4c8503deb3 devfs: Move most ioctl logic down to vnode layer
Devfs' file layer ioctl is now just a thin shim around the vnode layer.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7286
2016-07-25 16:28:02 +00:00
hselasky
fe840b6ea6 Handle IOC_VOID special case of passing an integer IOCTL argument through CUSE.
Submitted by:	Vladimir Kondratyev <wulf@cicgroup.ru>
Approved by:	re (gjb)
2016-07-06 22:21:22 +00:00
kib
2b281bf08f Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.

Reviewed by:	jilles
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-06-26 20:07:24 +00:00
kib
feef92098a Clean other flags in ncl_inactive, only. Add comment explaining why other
flags should be unset.

Suggested and reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Approved by:	re (gjb)
2016-06-26 14:18:28 +00:00
kib
dd2c794a7d Since VOP_INACTIVE() is not guaranteed to be called, all cleanups
executed by inactive methods, must be repeated on reclaim.  In
particular, unlink and free sillyrenamed vnode both on inactivation
and reclaim.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-06-25 11:34:06 +00:00
kib
082d766398 Do not access NFS data for reclaimed vnode.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (delphij)
2016-06-19 18:29:43 +00:00
kib
f3b36332af Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodes
in devfs_reclaim().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
2016-06-15 15:55:14 +00:00
kevlo
391c05bae4 Fix a style bug. 2016-06-08 02:39:10 +00:00
pfg
9751db1596 ext2fs: Stop dropping and reacquiring Giant around geom calls.
As in UFS r300366.
2016-06-07 21:40:42 +00:00
cem
d579d254f0 nfs_clvfsops: Fix leading whitespace introduced in r299848
Replace spaces with tabs.  No functional change.

Sponsored by:	EMC / Isilon Storage Division
2016-06-07 20:16:01 +00:00
cem
19014d17cc nfs_clvfsops: Prevent strdup of stack garbage with bogus mount specs
If strlen(hostp) was zero, the stack array 'nam' would never be initialized
before being strdup()ed.  Fix this by initializing it to the empty string.

It's possible some external condition makes this case impossible, in which
case, an assertion instead of this workaround is appropriate.

Introduced in r299848.

Reported by:	Coverity
CID:		1355336
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 20:00:20 +00:00
pfg
0177f05fbf ext2fs: rearrange ext4_bmapext().
While here assign error a bit later.

Reviewed by:	Damjan Jovanovich
Obtained from:	NetBSD
2016-06-07 18:23:22 +00:00
pfg
28a1ddcb8f ext2fs(5): Cosmetic cleanups, mostly to the ext4 code.
Obtained from:	NetBSD
2016-06-07 17:08:34 +00:00
pfg
900e707c8a ext2fs: cleanup generation number management.
Ext2/3/4 manages generation numbers differently than UFS so adopt
some rules that should work well. When allocating a new inode,
make sure we generate a "good" random value specifically avoiding
zero.

Don't interfere with the numbers that are already generated in
the filesystem: ext2fs doesn't have the backwards compatibility
issues  where there were no generation numbers.

Reviewed by:	kevlo
MFC after:	1 week
2016-06-07 14:37:43 +00:00
kib
b86b034cff Remove drop/reacquire of Giant around geom calls for cd9660 and udf.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-05-22 18:16:25 +00:00
kevlo
1006f009c6 arc4random() returns 0 to (2**32)−1, use an alternative to initialize
i_gen if it's zero rather than a divide by 2.

With inputs from  delphij, mckusick, rmacklem

Reviewed by:	mckusick
2016-05-22 14:31:20 +00:00
kib
a9b92aa58d Same as for UFS, remove drop/reacquire of Giant, and use si_mountpt as
the mount semaphore.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-05-21 11:40:41 +00:00
kib
d0b1101f75 Remove zero assignments in the cdev allocator. cdp memory is
requested with M_ZERO.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-05-21 09:55:32 +00:00
rmacklem
c6b3045143 If a local (AF_LOCAL, AF_UNIX) socket creation (bind) is attempted
on a fuse mounted file system, it will crash. Although it may be
possible to make this work correctly, this patch avoids the crash
in the meantime.
I removed the MPASS(), since panicing for the FIFO case didn't make
a lot of sense when it returns an error for the others.

PR:		195000
Submitted by:	henry.hu.sh@gmail.com (earlier version)
MFC after:	2 weeks
2016-05-18 22:23:20 +00:00
glebius
e81041a0fd Comment fix: the getsockaddr() is actually meant here.
Reviewed by:	rmacklem
2016-05-18 17:40:53 +00:00
trasz
970c8ffe26 Silence down the "insmntque() failed" autofs error; it happens
on shutdown and is perfectly normal.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 12:04:39 +00:00
rmacklem
e8e4b22eec Fix fuse for "cp" of a mode 0444 file to the file system.
When "cp" of a file with read-only (mode 0444) to a fuse mounted
file system was attempted it would fail with EACCES. This was because
fuse would attempt to open the file WRONLY and the open would fail.
This patch changes the fuse_vnop_open() to test for an extant read-write
open and use that, if it is available.
This makes the "cp" of a read-only file to the fuse mounted file system
work ok.
There are simpler ways to fix this than adding the fuse_filehandle_validrw()
function, but this function is useful for future patches related to
exporting a fuse filesystem via NFS.

MFC after:	2 weeks
2016-05-15 23:15:10 +00:00
trasz
d285612c31 Make it possible to reroot into NFS. This means one can have
eg an NFSv4 root over WiFi: boot from md_root (small rootfs image
preloaded by loader(8)), setup WiFi, and then reroot into the actual
root, over NFS.

Note that it's currently limited to NFSv4, and due to problems with
nfsuserd(8) it requres a workaround on the server side: one needs
to set the vfs.nfsd.enable_stringtouid=1 sysctl and not run nfsuserd(8)
on either the server or the client side.

Reviewed by:	rmacklem@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6347
2016-05-15 08:34:59 +00:00
rmacklem
8e995c5bbe Fix fuse so that stale buffer cache data isn't read.
When I/O on a file under fuse is switched from buffered to DIRECT_IO,
it was possible to read stale (before a recent modification) data from
the buffer cache. This patch invalidates the buffer cache for the
file to fix this.

PR:		194293
MFC after:	2 weeks
2016-05-15 00:45:17 +00:00
rmacklem
8d3f87b2b7 Fix fuse to use DIRECT_IO when required.
When a file is opened write-only and a partial block was written,
buffered I/O would try and read the whole block in. This would
result in a hung thread, since there was no open (fuse filehandle)
that allowed reading. This patch avoids the problem by forcing
DIRECT_IO for this case.
It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO
flag in its reply to the open.

Tested by:	nishida@asusa.net, freebsd@moosefs.com
PR:		194293, 206238
MFC after:	2 weeks
2016-05-14 20:03:22 +00:00
cem
5f28b4bf85 nfsd: Fix use-after-free in NFS4 lock test service
Trivial use-after-free where stp was freed too soon in the non-error path.
To fix, simply move its release to the end of the routine.

Reported by:	Coverity
CID:		1006105
Sponsored by:	EMC / Isilon Storage Division
2016-05-12 05:03:12 +00:00
kib
f6eb7ae037 Use vfs_hash_ref(9) to eliminate LK_EXCLOTHER kludge. As a
consequence, the nfs client override of VOP_LOCK1() is no longer
needed.

Reviewed and tested by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-05-11 06:35:46 +00:00
rmacklem
2f51e8d0c7 Don't increment srvrpccnt[] for the NFSv4.1 operations.
When support for NFSv4.1 was added to the NFS server, it broke
the server rpc count stats, since newnfsstats.srvrpccnt[] doesn't
have entries for the new NFSv4.1 operations.
Without this patch, the code was incrementing bogus entries in
newnfsstats for the new NFSv4.1 operations.
This patch is an interim fix. The nfsstats structure needs to be
updated and that will come in a future commit.

Reported by:	cem
MFC after:	2 weeks
2016-05-07 22:45:08 +00:00
pfg
4f457bceb7 nfsserver: minor spelling fix in comment.
No functional change.
2016-05-06 23:40:37 +00:00
rmacklem
b53514c2e2 Give mountd -S priority over outstanding RPC requests when suspending the nfsd.
It was reported via email that under certain heavy RPC loads
long delays before the exports would be updated was observed
when using "mountd -S". This patch reverses the priority between
the exclusive lock request to suspend the nfsd threads and the
shared lock request for performing RPCs.
As such, when mountd attempts to suspend the nfsd threads, it
gets priority over outstanding RPC requests to do this.
I suspect that the case reported was an artificial test load,
but this patch did fix the problem for the reporter.

Reported and Tested by:	josephlai@qnap.com
MFC after:	2 weeks
2016-05-06 23:26:17 +00:00
emaste
f73a7179da Add nid_namelen bounds check to nfssvc system call
This is only allowed by root and only used by the nfs daemon, which
should not provide an incorrect value. However, it's still good
practice to validate data provided by userland.

PR:		206626
Reported by:	CTurt <cturt@hardenedbsd.org>
Reviewed by:	rmacklem
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D6201
2016-05-06 21:19:28 +00:00
emaste
cee9c1ee19 Rationalize license numbering in fdescfs(5) 2016-04-30 16:01:37 +00:00
pfg
e72339bbf0 sys: Make use of our rounddown() macro when sys/param.h is available.
No functional change.
2016-04-30 14:41:18 +00:00
emaste
c153cda74f ANSIfy fdescfs(5) 2016-04-30 12:44:03 +00:00
pfg
9ed8e933a3 sys/fs: spelling fixes in comments.
No functional change.
2016-04-29 20:51:24 +00:00
pfg
9c151ad321 fs/ext2fs: spelling fixes on comment.
No functional change.
2016-04-29 20:45:50 +00:00
pfg
21e15c627b NFS: spelling fixes on comments.
No funcional change.
2016-04-29 16:07:25 +00:00
pfg
0f281bd3eb sys/devfs: unsign an index to prevent signed integer overflow.
cdp_maxdirent in struct:cdev_priv is of type u_int.  Use the same
type for the corresponding index in devfs_revoke().

MFC after:	1 week
2016-04-28 02:39:43 +00:00
kp
b91af2a23d msdosfs: Prevent buffer overflow when expanding win95 names
In win2unixfn() we expand Windows 95 style long names. In some cases that
requires moving the data in the nbp->nb_buf buffer backwards to make room. That
code failed to check for overflows, leading to a stack overflow in win2unixfn().

We now check for this event, and mark the entire conversion as failed in that
case. This means we present the 8 character, dos style, name instead.

PR: 204643
Differential Revision:	https://reviews.freebsd.org/D6015
2016-04-26 20:36:32 +00:00
pfg
fc01419148 sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
pfg
9a0417ac07 ext2fs: make use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.

MFC after:	2 weeks
2016-04-26 01:41:15 +00:00
rmacklem
2a28af72f4 Allow the NFSv4 server to reply NFSERR_WRONGSEC for the SetClientID operation.
It was reported via email that a Linux client couldn't do a Kerberized
NFS mount when only "sec=krb5" was specified for the exports. The Linux
client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT.
Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I
think it is the correct reply, so this patch enables that.
I do not know if this fixes the mount attempt, but adding "krb5i" to the
list of allowed security flavours does allow the mount to work.

Reported by:	joef@spectralogic.com
MFC after:	2 weeks
2016-04-23 21:18:45 +00:00
pfg
70b5d15970 ext2_htree_release(): prevent signed integer overflow in a loop.
h_levels_num, as most data structs in ext2fs, is unsigned so
the index that addresses it has to be unsigned as well.

To get to overflow here we would probably be considering a
degenerate case though.

MFC after:	5 days
2016-04-23 18:28:59 +00:00
rmacklem
dc6a2918e1 Fix a LOR in the NFSv4.1 server.
The ordering of acquisition of the state and session mutexes was
reversed in two cases executed when an NFSv4.1 client created/freed
a session. Since clients will typically do this only when mounting
and dismounting, the likelyhood of causing a deadlock was low but possible.
This can only occur for NFSv4.1 mounts, since the others do not
use sessions.
This was detected while testing the pNFS server/client where the
client crashed during dismounting.
The patch also reorders the unlocks, although that isn't necessary
for correct operation.

MFC after:	2 weeks
2016-04-23 01:22:04 +00:00
pfg
729533413f sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
pfg
a7d40a88c9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
pfg
e0bee002cf fs misc: for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:28:24 +00:00
rmacklem
c78bfcfb8f If the VOP_SETATTR() call that saves the exclusive create verifier failed,
the NFS server would leave the newly created vnode locked. This could
result in a file system that would not unmount and processes wedged,
waiting for the file to be unlocked.
Since this VOP_SETATTR() never fails for most file systems, this bug
doesn't normally manifest itself. I found it during testing of an
exported GlusterFS file system, which can fail.
This patch adds the vput() and changes the error to the correct NFS one.

MFC after:	2 weeks
2016-04-12 20:23:09 +00:00
rmacklem
772bcbe7a3 Bruce Evans reported that there was a performance regression between
the old and new NFS clients. He did a good job of isolating the problem
which was caused by the new NFS client not setting the post write mtime
correctly. The new NFS client code was cloned from the old client, but
was incorrect, because the mtime in the nfs vnode's cache wasn't yet
updated. This patch fixes this problem. The patch also adds missing mutex
locking.

Reported and tested by:	bde
MFC after:	2 weeks
2016-04-11 21:55:21 +00:00
pfg
eb1815a4cd ext2fs: replace 0 with NULL for pointers.
While here do late initialization of ebap, similar as was
done in UFS.

Found with devel/coccinelle.

MFC after:	2 weeks
2016-04-11 00:12:24 +00:00
pfg
b63211eed5 Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
kevlo
87acea459d Fix comment. 2016-04-08 04:29:05 +00:00
trasz
825d80e01c Add four new RCTL resources - readbps, readiops, writebps and writeiops,
for limiting disk (actually filesystem) IO.

Note that in some cases these limits are not quite precise. It's ok,
as long as it's within some reasonable bounds.

Testing - and review of the code, in particular the VFS and VM parts - is
very welcome.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5080
2016-04-07 04:23:25 +00:00
kevlo
b381dd918e Update comment: Linux does set a randomized generation number of an inode
on ext2/3/4.

While here use arc4random() instead of random().

Reviewed by:	pfg
MFC after:	3 days
2016-04-01 03:21:01 +00:00
kib
6fb59d3d9b Do not access buffer if bread(9) or cluster_read(9) failed. On error,
the functions free the buffer and set the pointer to NULL.  Also
remove useless call to brelse(9) on the error path.

PR:	208275
Submitted by:	Fabian Keil <fk@fabiankeil.de>
MFC after:	2 weeks
2016-03-29 19:59:44 +00:00
kevlo
fa2fefe1a3 Update superblock and inode structs for ext4.
Reviewed by:	pfg
2016-03-28 07:44:55 +00:00
trasz
9037a1c529 Speed up lookups in autofs(5) by using red-black trees instead of linear
searches.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5627
2016-03-24 13:34:39 +00:00
trasz
d66d04f246 Pacify Coverity in a better way, to avoid write-only variable when building
without INVARIANTS.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-16 14:00:45 +00:00
trasz
533cbdc4b8 Pacify Coverity.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-15 20:42:36 +00:00
trasz
f291714326 Remove name length limitation from autofs(5). The linear search with
strlens is somewhat suboptimal, but it's a temporary measure that will
be replaced with red-black trees later on.

PR:		204417
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5266
2016-03-13 14:17:23 +00:00
trasz
c15527c9e3 Use S_BLKSIZE instead of magic constant.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-12 09:33:26 +00:00
trasz
beb648d9cc Remove cn_consume from 'struct componentname'. It was never set to anything
other than 0.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5611
2016-03-12 08:50:38 +00:00
trasz
faec271eeb Fix autofs triggering problem. Assume you have an NFS server,
192.168.1.1, with share "share". This commit fixes a problem
where "mkdir /net/192.168.1.1/share/meh" would return spurious
error instead of creating the directory if the target filesystem
wasn't mounted yet; subsequent attempts would work correctly.

The failure scenario is kind of complicated to explain, but it all
boils down to calling VOP_MKDIR() for the target filesystem (NFS)
with wrong dvp - the autofs vnode instead of the filesystem root
mounted over it.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5442
2016-03-12 07:54:42 +00:00
kib
ef2ed17f02 Do not perform unneccessary shared recursion on the allproc_lock in
pfs_visible().  The recursion does not cause deadlock because the sx
implementation does not prefer exclusive waiters over the shared, but
this is an implementation detail.

Reported by:	pho, Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	jhb
Tested by:	pho
Approved by:	des (pseudofs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-03-11 11:51:38 +00:00
kib
4ec31f9122 Pass MNTK_NO_IOPF and MNTK_UNMAPPED_BUFS flags from the lower
filesystem to the nullfs mount.

MNTK_NO_IOPF must be present on the nullfs struct mount so that struct
file fo_read and fo_write fops operate in the mode requested by the
lower mount.

MNTK_UNMAPPED_BUFS allows VOP_GETPAGES() to use unmapped buffers.  It
does not matter for VOP_GETPAGES() calls from vm_fault() since handle
of the vm_object always points to the lower vnode.  But it may be
useful for other situations where VOP_GETPAGES() is used.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-03-04 17:24:28 +00:00
pfg
e6b8864942 Ext2: cleanup setting of ctime/mtime/birthtime.
This adopts the same change as r291936 for UFS.
Directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and
copy the value into the inode.

This keeps the behaviour cleaner and is consistent with UFS.

Reviewed by:	bde
MFC after:	1 month (only 10)
2016-02-19 15:53:08 +00:00
kib
9d6a6ca561 After nullfs rmdir operation, reclaim the directory vnode which was
unlinked.  Otherwise the vnode stays cached, causing leak.  This is
similar to r292961 for regular files.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:43:03 +00:00
pfg
1960ec586d ext2fs: Remove panics for rename() race conditions.
Sync with r84642 from UFS:

The panics are inappropriate because the IN_RENAME flag only fixes a
few of the huge number of race conditions that can result in the
source path becoming invalid even prior to the VOP_RENAME() call.

Found accidentally while checking an issue from PVS Static Analysis.

MFC after:	3 days
2016-02-14 19:52:50 +00:00
pfg
e9b92972cc cd9660: More "check for NULL" cleaunps.
Cleanup some checks for NULL. Most of these were always unnecessary and
starting with r294954 brelse() doesn't need any NULL checks at all.

For now keep the checks somewhat consistent with NetBSD in case we want to
merge the cleanups to older versions.
2016-02-12 22:46:14 +00:00
markj
c39d0036ae Clear the cookie pointer on error in tmpfs_readdir().
It is otherwise left dangling, and callers that request cookies always free
the cookie buffer, even when VOP_READDIR(9) returns an error. This results
in a double free if tmpfs_readdir() returns an error to the NFS server or
the Linux getdents(2) emulation code.

Reported by:	pho
MFC after:	1 week
Security:	double free of malloc(9)-backed memory
Sponsored by:	EMC / Isilon Storage Division
2016-02-12 20:43:53 +00:00
pfg
4d87c06386 Ext4: Use boolean type instead of '0' and '1'
There are precedents of uses of bool in the kernel and
it is incorrect style to use integers as replacement for
a boolean type.
2016-02-11 15:27:14 +00:00
pfg
cdb1cc5394 Ext4: fix handling of files with sparse blocks before extent's index.
This is ongoing work from Damjan Jovanovic to improve ext4 read support
with sparse files:

Keep track of the first and last block in each extent as it descends down
the extent tree, thus being able to work out that some blocks are sparse
earlier. This solves an issue on r293680.

In ext4_bmapext() start supporting the runb parameter, which appears to be
the number of adjacent blocks prior to the block being converted in the
same way that runp is the number of blocks after, speding up random access
to mmaped files.

PR:	206652
2016-02-11 00:34:11 +00:00
pfg
d7b2b433b4 Revert r295359:
CID 1018688 is a false positive.

The initialization is done by calling vn_start_write(... &mp, flags).
mp is only an output parameter unless (flags & V_MNTREF), and fdesc
doesn't put V_MNTREF in flags.

Pointed out by:	bde
2016-02-07 15:40:01 +00:00
pfg
b42dfac655 msdosfs_rename: yet another unused value.
As with r295355, it seems to be left over from a cleanup
in r33548. The code is not in NetBSD either.

Thanks to bde for checking out the history.
2016-02-07 15:36:16 +00:00
pfg
29ef016884 cd9660: Drop an unnecessary check for NULL.
This was unnecessary and also confused Coverity.

Confirmed on:	NetBSD
CID:		978558
2016-02-07 03:48:40 +00:00
pfg
0bbadbe82b fdesc_setattr: unitialized pointer read
CID:	1018688
2016-02-07 01:09:38 +00:00
pfg
5b12d896ba msdosfs_rename: Unused value
Assigned value to pmp, is immediatedly overwritten before it can be used.

CID:	1304892
2016-02-06 21:54:02 +00:00
pfg
fcb93180f5 Revert r294695:
ext2fs: passthrough any extra timestamps to the dinode struct.

While it passed the classic testing, the change appears to have
caused some regression and still requires some more precautions.

PR:		206820
MFC after:	3 days
2016-02-03 14:31:23 +00:00
pfg
fe5a17c2a7 ext2fs: passthrough any extra timestamps to the dinode struct.
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.

This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.

MFC after:	4 days
2016-01-24 23:24:47 +00:00
pfg
84cfebb132 ext2: rename some directory index constants.
Missed from r294653.

Pointyhat:	me
2016-01-24 04:30:30 +00:00
pfg
01bfa389ba Fix comment. 2016-01-24 02:44:00 +00:00
pfg
3fde4bfd1c Rename some directory index constants.
Directory index was introduced in ext3. We don't always use the
prefix to denote the ext2 variant they belong to but when we
do we should try to be accurate.
2016-01-24 02:41:49 +00:00
pfg
d2a41899f8 ext2: Initialize i_flag after allocation.
We use i_flag to carry some flags like IN_E4INDEX which newer
ext2fs variants uses internally.

fsck.ext3 rightfully complains after our implementation tags
non-directory inodes with INDEX_FL.

Initializing i_flag during allocation removes the noise factor
and quiets down fsck.

Patch from:	Damjan Jovanovic
PR:		206530
2016-01-24 02:25:41 +00:00
kib
8c18805577 When devfs dirent is freed, a vnode might still keep a pointer to it,
apparently.  Interlock and clear the pointer to avoid free memory
dereference.

Submitted by:	bde (previous version)
MFC after:	3 weeks
2016-01-22 20:30:51 +00:00
pfg
d16eeed462 ext2fs: Bring back the htree dir_index implementation.
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.

Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.

Differential Revision:	https://reviews.freebsd.org/D5007

Submitted by:	Damjan Jovanovic
Reviewed by:	pfg
Tested by:	pho
Relnotes:	Yes
MFC after:	2 months (only 10.x)
2016-01-21 14:50:28 +00:00
kib
32d7f35235 Assert that the linkage between struct cdev_privdata and and struct
file is consistent.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-17 08:34:35 +00:00
rpokala
157f6a3fb8 [PR 206224] bv_cnt is sometimes examined without holding the bufobj lock
Add locking around access to bv_cnt which is currently being done unlocked

PR:		206224
Reviewed by:	imp
Approved by:	jhb
MFC after:	1 week
Sponsored by:	Panasas, Inc.
Differential Revision:	https://reviews.freebsd.org/D4931
2016-01-17 01:04:20 +00:00
bz
4d126fab0e Unbreak NOIP builds after r294084. 2016-01-15 16:45:36 +00:00
melifaro
3243205726 Make nfscl_getmyip() use new routing KPI.
* Use standard IPv6 SAS instead of rt->rt_ifa address.
* Make address lookup work for IPv6 LLA.
* Save address into buffer provided by caller instead of using static vars.

Discussed with:	rmacklem
2016-01-15 09:05:14 +00:00
kib
21f21e7647 Make devfs_fpdrop() static. It was not a public KPI, and it has no
reason to remain exported for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:03:06 +00:00
pfg
4ba3f35490 ext4: mount panic from freeing invalid pointers
Initialize the struct with those fields to zeroes on allocation,
preventing the panic.

Patch by:	Damjan Jovanovic.

PR:		206056
MFC after:	3 days
2016-01-11 19:25:43 +00:00
pfg
52388dd9b7 ext4: add support for reading sparse files
Add support for sparse files in ext4. Also implement read-ahead, which
greatly increases the performance when transferring files from ext4.

Both features implemented by Damjan Jovanovic.

PR:		205816
MFC after:	1 week
2016-01-11 19:14:55 +00:00
ae
8c83f31276 Change the type of newsize argument in the smbfs_smb_setfsize() function
from int to int64.
MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses signed 64-bit integer
to specify offset, but since smbfs_smb_setfsize() has used plain int,
a value was truncated in case when offset was larger than 2G.
	https://msdn.microsoft.com/en-us/library/ff469975.aspx

In particular, now `truncate -s 10G` will work correctly on the mounted
SMB share.

Reported and tested by:	Eugene Grosbein <eugen at grosbein dot net>
MFC after:	1 week
2016-01-11 18:11:06 +00:00
pfg
a32f535abc ext2fs: reading mmaped file in Ext4 causes panic
Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap().

Patch by Damjan Jovanovic.

PR:		205938
MFC after:	1 week
2016-01-07 21:43:43 +00:00
kib
ae62b8f932 Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.

When restarted, failing syscalls eventually either stop finding the
node and returning ENOENT, or the vnode op vectors finally transition
to the deadfs vop.  The later return EIO or other error, more
appropriate for the operation.

Submitted by:	bde
Tested by:	pho
MFC after:	3 weeks
2016-01-02 20:29:28 +00:00
kib
348ef00d1d Minor style cleanup.
Submitted by:	bde
MFC after:	1 week
2016-01-01 15:48:48 +00:00
kib
f0089fdb6f Force nullfs vnode reclaim after unlinking, to potentially unlink
lower vnode.  Otherwise, reference to the lower vnode from the upper
one prevents final unlink.

PR:	178238
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-30 19:49:22 +00:00
pfg
ea76029f56 ext2: recognize ext4 INCOMPAT_RECOVER flag
This is a flag specific for journalling in ext4.
Add it to the list of ext4 features we ignore for
read-only purposes.

PR:		205668
MFC after:	1 week
2015-12-29 15:51:52 +00:00
kib
76abdf80ab Make it possible for the cdevsw d_close() driver method to detect last
close and close due to revoke(2)-like operation.

A new FLASTCLOSE flag indicates that this is last close.  FREVOKE is
set for revokes, and FNONBLOCK is also set, same as is already done
for VOP_CLOSE() call from vgonel().

The flags reuse user open(2) flags which are never stored in f_flag,
to not consume bit space in the ABI visible way.  Assert this with the
static check.

Requested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-22 20:37:34 +00:00
kib
2a63539543 Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-22 20:22:17 +00:00
hselasky
c3f11e9f0e Make CUSE usable with platforms where the size of "unsigned long" is
different from the size of a pointer.
2015-12-22 09:55:44 +00:00
hselasky
76efdc2ae9 Make CUSE usable with platforms where the size of "unsigned long" is
different from the size of a pointer.
2015-12-22 09:41:33 +00:00
hselasky
66012f316c Guard against the same process being both CUSE server and client at
the same time. This can easily lead to a deadlock when destroying the
character devices nodes.
2015-12-22 09:26:24 +00:00
glebius
910a73cc44 Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS.
With the new VOP_GETPAGES() KPI the "count" argument counts pages already,
and doesn't need to be translated from bytes to pages.

While here make it consistent that *rbehind and *rahead are updated only
if we doesn't return error.

Pointy hat to:	glebius
2015-12-16 23:48:50 +00:00
glebius
63cd1c131a A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
jhb
f4698eb999 The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function.  Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype.  The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.

The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D4340
2015-12-02 18:27:30 +00:00
rmacklem
3b49f0eca8 Fix the memory leak that occurs when the nfscommon.ko module is unloaded.
This leak was introduced by r291527.
Since the nfscommon.ko module is rarely unloaded, this leak would not
have been much of an issue.

MFC after:	2 weeks
2015-12-02 02:47:13 +00:00
rmacklem
8d3825522e Delete the TUNABLE_INT() line. It was in r291527 so that it could be
MFC'd to stable/10 and still work.
2015-11-30 23:37:09 +00:00
rmacklem
493738a552 Add kernel support to the NFS server for the "-manage-gids"
option that will be added to the nfsuserd daemon in a future
commit. It modifies the cache used by NFSv4 for name<-->id
translation (both username/uid and group/gid) to support this.
When "-manage-gids" is set, the server looks up each uid
for the RPC and uses the list of groups cached in the server
instead of the list of groups provided in the RPC request.
The cached group list is acquired for the cache by the nfsuserd
daemon via getgrouplist(3).
This avoids the 16 groups limit for the list in the RPC request.
Since the cache is now used for every RPC when "-manage-gids"
is enabled, the code also modifies the cache to use a separate
mutex for each hash list instead of a single global mutex.

Suggested by:	jpaetzel
Tested by:	jpaetzel
MFC after:	2 weeks
2015-11-30 21:54:27 +00:00
mckusick
cb4ab786a1 For performance reasons, it is useful to have a single string used as
the name of a filesystem when setting it as the first parameter to the
getnewvnode() function. Most filesystems call getnewvnode from just one
place so can use a literal string as the first parameter. However, NFS
calls getnewvnode from two places, so we create a global constant string
that can be used by the two instances. This change also collapses two
instances of getnewvnode() in the UFS filesystem to a single call.

Reviewed by: kib
Tested by:   Peter Holm
2015-11-29 21:01:02 +00:00
rmacklem
8f26d7b382 When the nfsd threads are terminated, the NFSv4 server state
(opens, locks, etc) is retained, which I believe is correct behaviour.
However, for NFSv4.1, the server also retained a reference to the xprt
(RPC transport socket structure) for the backchannel. This caused
svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed
a socket upcall to occur after the mutexes in the svcpool were destroyed,
causing a crash.
This patch fixes the code so that the backchannel xprt structure is
dereferenced just before svcpool_destroy() is called, so the code
does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.

Tested by:	g_amanakis@yahoo.com
PR:		204340
MFC after:	2 weeks
2015-11-21 23:55:46 +00:00
rmacklem
7b391bfea3 Revert r283330 since it broke directory caching in the client.
At this time I cannot see a way to fix directory caching when it
has partial blocks in the buffer cache, due to the fact that the
syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ
offset.

Reported by:	bde
MFC after:	2 weeks
2015-11-21 00:15:41 +00:00
rmacklem
c1f0354622 mnt_stat.f_iosize (which is used to set bo_bsize) must be set to
the largest size of buffer cache block or the mapping of the buffer
is bogus. When a mount with rsize=4096,wsize=4096 was done, f_iosize
would be set to 4096. This resulted in corrupted directory data, since
the buffer cache block size for directories is NFS_DIRBLKSIZ (8192).
This patch fixes the code so that it always sets f_iosize to at least
NFS_DIRBLKSIZ.

Tested by:	krichy@cflinux.hu
PR:		177971
MFC after:	2 weeks
2015-11-17 01:44:26 +00:00
markj
a7bb6eb720 - Consistently use PROC_ASSERT_HELD() to verify that a process' hold count
is non-zero.
- Include the process address in the PROC_ASSERT_HELD() and
  PROC_ASSERT_NOT_HELD() assertion messages so that the corresponding
  process can be found easily when debugging.

MFC after:	1 week
2015-11-08 01:38:56 +00:00
kib
6fac89c875 Ensure that when a blockable open of fifo returns success, a valid
file descriptor opened for complimentary access exists as well.

The implementation of the guarantee is done by counting the
generations of readers and writers opens.  We return success and not
EINTR or ERESTART error, when the sleep for complimentary opening is
interrupted, but the generation was changed during the sleep.

Longer explanation: assume there are two threads, A doing open("fifo",
O_RDONLY) and B doing open("fifo", O_WRONLY), and no other threads
either trying to open the fifo, nor there are any file descriptors
referencing the fifo.  Before the change, it was possible e.g. for for
thread A to return a valid file descriptor, while thread B returned
EINTR if a signal to B was delivered simultaneously with the wakeup
from A.  After the change, in this situation both A::open() and
B::open() succeed and the signal is made "as if" it was noticed
slightly later.  Note that the signal actual delivery is not changed,
it is done by ast on syscall return path, so signal handler is still
executed before first instruction after syscall.

See PR for the code demonstrating the issue.

PR:	203162
Reported by:	Victor Stinner victor.stinner@gmail.com
Reviewed by:	jilles
Tested by:	bapt, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-09-20 21:18:33 +00:00
trasz
1604109813 Fix an NFS server bug that manifested in "ls -al" displaying a plus
sign on every directory exported via NFSv4 with NFSv4 ACLs enabled.

Reviewed by:	rmacklem@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3502
2015-08-28 14:26:11 +00:00
trasz
db61d1271a Make it possible to forcibly unmount devfs.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-24 14:04:44 +00:00
trasz
9e31188bdc After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
rmacklem
6ee70894f0 For the case where an NFSv4.1 ExchangeID operation has the client identifier
that already has a confirmed ClientID, the nfsrv_setclient() function would
not fill in the clientidp being returned. As such, the value of ClientID
returned would be whatever garbage was on the stack.
An NFSv4.1 client would not normally do this, but it appears that it can
happen for certain Linux clients. When it happens, the client persistently
retries the ExchangeID and Create_session after Create_session fails when
it uses the bogus clientid. With this patch, the correct clientid is replied.
This problem was identified in a packet trace supplied by
Ahmed Kamal via email.

Reported by:	email.ahmedkamal@googlemail.com
MFC after:	2 weeks
2015-08-14 22:02:14 +00:00
jhb
3fab33edd0 The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices.  However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices.  vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by:	kib
Reported by:	sbruno (broke qemu cross-builds), peter
Differential Revision:	https://reviews.freebsd.org/D3316
2015-08-06 16:50:37 +00:00
cem
3d8d5f23ac nfsclient: Protest loudly when GETATTR responses are invalid
BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).

Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.

Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)

Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:

    sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'

Reviewed by:	rmacklem
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3304
2015-08-05 22:27:30 +00:00
rmacklem
e15fd657a2 This patch fixes a problem where, if the NFSv4 server has a previous
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.

MFC after:	3 days
2015-07-29 23:06:30 +00:00
jeff
b7b72de7da - Remove some dead code copied from ffs. 2015-07-29 03:06:08 +00:00
brueffer
95419ac921 In tmpfs_chtimes(), remove checks on the nanosecond level when
determining whether a node changed.

Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.

This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).

PR:			201284
Submitted by:		David Binderman
Patch suggested by:	kib
Reviewed by:		kib
MFC after:		2 weeks
Committed from:		Essen FreeBSD Hackathon
2015-07-26 08:33:46 +00:00
kib
48ccbdea81 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains complete status
and copied out into si_status.

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
markj
d19ba3f89d Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D2937
2015-07-05 22:37:33 +00:00
mjg
feeee4c707 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
rmacklem
5ebe352487 If a "principal" argument isn't provided for a Kerberized NFS mount,
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to a
return a cannonical name for the server.
This patch fixes this unusual case.

PR:		201073
Submitted by:	masato@itc.naist.jp
MFC after:	2 weeks
2015-07-03 22:11:07 +00:00
rmacklem
97c35a724e Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.

Reported by:	alex.burlyga.ietf@gmail.com
Tested by:	alex.burlyga.ietf@gmail.com
MFC after:	2 weeks
2015-07-03 01:15:21 +00:00
markm
d586165577 Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
kib
f69fd0ed6d Restore the td_cookie value for the tmpfs directory entry which was a
dup entry, upon detach from the parent directory.  If the node is
renamed, the entry is re-attached at the different directory, and
invalud cookie value triggers assert (or corrupts directory rb tree,
it seems).

Reported by:	clusteradm (gjb, antoine)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-19 07:25:15 +00:00
glebius
5b81a20433 o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async().
o Provide an extensive set of assertions for input array of pages.
o Remove now duplicate assertions from different pagers.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-06-17 22:44:27 +00:00
mjg
1a3e7a935e Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
glebius
519f1ccd36 Make KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-06-12 11:32:20 +00:00
mjg
d7bc9285a6 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
markj
8563211eff unionfs: fix suspendability check bugs
- MNTK_SUSPENDABLE is set in mnt_kern_flag, not mnt_flag.
- The lower layer of a unionfs mount is read-only, so the mount should
  be suspendable iff the upper layer is suspendable.
- Remove a couple of superfluous comments.

Differential Revision:	https://reviews.freebsd.org/D2714
Reviewed by:	kib, mjg
2015-06-06 16:36:13 +00:00
jhb
bba1e1e047 Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c.  This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions.  A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings.  For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead.  The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset.  The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional.  If it is not set, then fo_mmap() will
fail with ENODEV.  A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead.  While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision:	https://reviews.freebsd.org/D2658
Reviewed by:	alc (glanced over), kib
MFC after:	1 month
Sponsored by:	Chelsio
2015-06-04 19:41:15 +00:00
vangyzen
597cee37df Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files.  Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.

This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).

Submitted by:   Eric Badger <eric@badgerio.us> (initial revision)
Obtained from:  Dell Inc.
PR:             198431
MFC after:      2 weeks
Reviewed by:    jhb
Approved by:    kib (mentor)
2015-06-02 18:37:04 +00:00
delphij
77722a1db5 Clear p_stops upon PROCFS_CTL_DETACH, similar to r283889.
Noticed by:	jhb
Reviewed by:	sef
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-06-01 18:49:31 +00:00
rmacklem
d7f3fa6b20 Make the NFS server use shared vnode locks for a few cases
that are allowed by the VFS/VOP interface instead of using
exclusive locks.

MFC after:	2 weeks
2015-05-29 20:22:53 +00:00
pfg
dce46f4095 Provide VOP_GETPAGES_ASYNC() for extfs.
Merge the filesystem specific part from r274914 to ext2fs.

I only did regular testing with the change but UFS and our ext2fs
are similar enough that the code should just work with the new
sendfile.

Discussed with:	glebius
2015-05-28 21:06:59 +00:00
rmacklem
7c550ca17d Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.

Differential Revision:	https://reviews.freebsd.org/D2596
MFC after:	2 weeks
2015-05-27 22:00:05 +00:00
kib
ff588ae9b0 Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks.  Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.

Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode.  A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.

There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads.  NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag.  This is needed because nfsd may be the sole cause of the SU
workqueue overflow.  But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.

Reviewed by:	jhb, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:20:42 +00:00
dchagin
4e888d6b57 Hide vfs.pfs.trace variable if it is not used. 2015-05-24 18:11:22 +00:00
rmacklem
b51d622ba8 The NFS client generated directory block(s) with d_fileno == 0
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.

Suggested by:	trasz
Tested by:	trasz
MFC after:	2 weeks
2015-05-23 21:58:41 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
jhb
9b4a921aab Always set p_oppid when attaching to an existing process via procfs
tracing.  This matches the behavior of ptrace(PT_ATTACH).  Also,
the procfs detach request assumes p_oppid is always set.

Reviewed by:	kib
MFC after:	2 weeks
2015-05-22 11:03:51 +00:00
rmacklem
9a564b79b5 The NFS client wasn't handling getdirentries(2) requests for sizes
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.

Tested by:	trasz
Reported by:	trasz
Reviewed by:	trasz
MFC after:	2 weeks
2015-05-21 23:14:18 +00:00
mav
983d16c8e2 Do not promote large async writes to sync.
Present implementation of large sync writes is too strict and so can be
quite slow.  Instead of doing that, execute large async write in chunks,
syncing each chunk separately.

It would be good to fix large sync writes too, but I leave it to somebody
with more skills in this area.

Reviewed by:	rmacklem
MFC after:	1 week
2015-05-14 10:04:42 +00:00
rmacklem
5a5431c415 Fix the NFS server's handling of a bogus NFSv2 ROOT RPC.
The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094
and should never be used by a client.

Tested by:	thmu@freenet.de
MFC after:	1 week
2015-04-25 00:58:24 +00:00
rmacklem
e0a9cb76d2 MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.

Differential Revision:	https://reviews.freebsd.org/D2330
Reviewed by:	mav, kib
MFC after:	2 weeks
2015-04-25 00:52:01 +00:00
pfg
2c313e6688 Prevent a double free.
This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.

Obtained from:	HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by:	Oliver Pinter
Revewed by:	rmacklem
2015-04-20 16:40:13 +00:00
pfg
32880107f3 nfsrpc_createv4: fix double free.
Reported by:	Oliver Pinter, clang static checker
Obtained from:	HardenedBSD (commit 63cac77c42c0c3fc67da62f97d5ab651d52ae707)
Reviewed by:	rmacklem
MFC after:	5 days
2015-04-19 23:55:59 +00:00
mav
d9ba2b8e84 Change wcommitsize default from one empirical value to another.
The new value is more predictable with growing RAM size:

        hibufspace maxvnodes      old      new
i386:
  256MB   32980992     15800  2198732  2097152
    2GB   94027776    107677   878764  4194304
amd64:
  256MB   32980992     15800  2198732  2097152
    1GB  114114560     68062  1678155  4194304
    4GB  217055232    111807  1955452  4194304
   16GB 1717846016    337308  5097465 16777216
   64GB 1734918144   1164427  1490479 16777216
  256GB 1734918144   4426453   391983 16777216

Reviewed by:	rmacklem
MFC after:	2 weeks
2015-04-19 11:34:41 +00:00
trasz
4252f860ce Replace "new NFS" with just "NFS" in some sysctl description strings.
Sponsored by:	The FreeBSD Foundation
2015-04-19 06:18:41 +00:00
pfg
2bead96db0 Drop experimental dir_index support.
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:

- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.

Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.

Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.

PR:	191895
PR:	198731
PR:	199309

MFC after:	5 days
2015-04-17 22:26:01 +00:00
rmacklem
b4d8a8d1f7 mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.

Tested by:	mav
Reviewed by:	mav
MFC after:	2 weeks
2015-04-16 22:35:15 +00:00
rmacklem
ad77d0b1c1 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
will
e2c616f11c tmpfs_getattr(): Return more correct allocated byte counts.
For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.

For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.

This change allows using stat(2) to identify a sparse file on tmpfs.

Reviewed by:	kib
MFC after:	1 week
2015-04-10 19:04:39 +00:00
kib
1440f3812a Do not call msdosfs_sync() on the read-only msdosfs mounts. In fact,
it should be a nop for ro.

PR:	199152
Reviewed by:	bde (PR version of the patch)
Submitted by:	longwitz@incore.de
MFC after:	1 week
2015-04-05 21:10:38 +00:00
kib
95e5199578 Assert that an msdosfs mount is not read-only when FAT modifications
are requested.

PR:	199152
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-05 21:08:04 +00:00
kib
b7c417708f Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes, the time marks are used to display idle time by
w(1) [1].  Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second.  The later gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.

Noted by:	truckman [1]
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 08:25:40 +00:00
kib
89382b6533 msdosfs: mark unused compat-mount fields
The magic number MSDOSFS_ARGSMAGIC, which used to distinguish
"old" vs "new" msdosfs mount arguments, has not been used since
2005; it should just go away now.

Likewise, the local-to-Unicode table that changed at the same
time is unused.

Leave the space reserved in the old style mount arguments, though,
since we still support the old mount call (via the cmount entry
point).

Submitted by:	Chris Torek <chris.torek@gmail.com>
MFC after:	2 weeks
2015-03-22 09:09:26 +00:00
delphij
041657da93 Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or
write operations.  This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because of the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision:	https://reviews.freebsd.org/D2104
Reported by:	Mike Tancsa <mike sentex net>
Reviewed by:	kib
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-03-21 01:14:11 +00:00
glebius
398be53682 o Enhance vm_pager_free_nonreq() function:
- Allow to call the function with vm object lock held.
  - Allow to specify reqpage that doesn't match any page in the region,
    meaning freeing all pages.
o Utilize the new function in couple more places in vnode pager.

Reviewed by:	alc, kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-17 19:19:19 +00:00
jkim
d07e2757d9 Fix white spaces. 2015-03-02 19:14:58 +00:00
trasz
ab90d82e08 Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct
operation of GlusterFS.

PR:		192701
Submitted by:	harsha at harshavardhana.net
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-03-02 19:04:27 +00:00
imp
58c9460670 nandfs_meta_bread() calls bread() which can set bp to NULL in some
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.

Reported by: Maxime Villard (reported as uninitialized variable)
2015-03-01 21:41:37 +00:00
kan
a95ac78b9c Do not leak 'copy' buffer if bmap_truncate_indirect fails.
Reported by: Brainy Code Scanner, by Maxime Villard.
MFC after: 2 weeks
2015-02-28 22:24:45 +00:00
kib
661b19b40e Some fixes for fdescfs lookup code.
Do not ever return doomed vnode from lookup.  This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.

In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called.  Prevent the situation by doing
vfs_busy() before unlocking dvp.  Reuse the vn_vget_ino_gen() helper.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-28 19:57:22 +00:00
kib
3bc9cbc06a The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
pfg
e22521379a ext2fs: Plug small memory leak
free() e2fs_contigdirs upon error.
Undo zeroing of e2fs_gd as this was actually a false positive.

X-MFC with:	278790
2015-02-15 14:25:00 +00:00
pfg
03988df8a2 Reuse value of cursize instead of recalculating.
Reported by:	Clang static checker
MFC after:	1 week
2015-02-15 01:34:00 +00:00
pfg
d2ad05642a Initialize the allocation of variables related to the ext2 allocator.
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.

While here clean up small style issues.

Reported by:	Clang static analyser
MFC after:	1 week
2015-02-15 01:12:15 +00:00
trasz
e13ac6cd7e Restore ABI compatibility, broken in r273127. Note that while this fixes
ABI with 10.1, it breaks ABI for 11-CURRENT, so rebuild of automountd(8)
is neccessary.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2015-02-10 16:17:16 +00:00
kib
09bdd8a7f8 Remove duplicated assignment.
CID:	1267988
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-03 12:09:48 +00:00
kib
1831e3d7dc Update directory times immediately after an entry is created or
removed.  Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-31 21:31:53 +00:00
kib
1ccf1fa71b Remove single-use boolean.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:58:04 +00:00
kib
8773784e5a POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended.  Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:27:18 +00:00
dim
6b8eea4924 Fix a -Wcast-qual warning in smbfs_subr.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:02:32 +00:00
dim
edbaff1357 Fix a -Wcast-qual warning in udf_vnops.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:01:45 +00:00
dim
edba0c462e Fix a bunch of -Wcast-qual warnings in cd9660_util.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:40:25 +00:00
dim
07f28f1df7 Fix a bunch of -Wcast-qual warnings in msdosfs_conv.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:30:13 +00:00
jamie
c7d0935d11 Add allow.mount.fdescfs jail flag.
PR:		192951
Submitted by:	ruben@verweg.com
MFC after:	3 days
2015-01-28 21:08:09 +00:00
kib
19abfd4698 Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync.  Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied.  Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by:	Ronald Klop <ronald-lists@klop.ws>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-28 10:37:23 +00:00
kib
53810519b4 tmpfs does not use UVM on FreeBSD.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-01-28 10:25:35 +00:00
kib
f748dc7ade Stop enforcing additional reference on all cdevs, which was introduced
in r277199.  Acquire the neccessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.

Fix some style nits, add asserts.

Discussed with:	hselasky
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:36:52 +00:00
kib
b3741c8701 Ignore devfs directory entries for devices either being destroyed or
delisted.  The check is racy.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:24:52 +00:00
ngie
f94357dba9 Fix the build when INVARIANTS is defined by restoring bo's definition in
ext2_truncate(..) and by putting it under INVARIANTS ifdefs

X-MFC with: r277354
MFC after: 2 weeks
2015-01-19 07:10:08 +00:00
pfg
8fa2e2513f ext2: Garbage-collect some unused variables
Reported by:	clang static analysis
MFC after:	2 weeks
2015-01-19 03:30:45 +00:00
pfg
142fb530ca ext2: fix for uninitialized pointer read.
path.ep_bp was being used uninitialized in ext4_ext_find_extent().

CID:		1062344
MFC after:	1 week
2015-01-18 21:18:28 +00:00
pfg
f71f36cb87 Remove dead code.
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.

Drop the dead code and unsign dirsize given that it can't be
negative anyways.

CID:		1008669
MFC after:	1 week
2015-01-18 20:26:27 +00:00
kib
53832db395 Make SIGSTOP working for sleeps done while waiting for fifo readers or
writers in open(2), when the fifo is located on an NFS mount.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:03:26 +00:00
pfg
51b45a8019 ext2: cosmetical issues
Minor sorting and note when the cases are expected to fall through.

MFC after:	1 week
2015-01-17 15:19:18 +00:00
hselasky
b04cbf0c36 Avoid race with "dev_rel()" when using the recently added
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".

MFC after:	1 week
Reported by:	dchagin@
2015-01-14 22:07:13 +00:00
hselasky
4d7a9f7cc1 Don't use POLLNVAL as a return value from the client side poll
function. Many existing clients don't understand POLLNVAL and instead
relies on an error code from the read(), write() or ioctl() system
call. Also make sure we wakeup any client pollers before the cuse
server is closing, so they don't wait forever for an event.
2015-01-13 13:32:18 +00:00