Commit Graph

3363 Commits

Author SHA1 Message Date
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Rick Macklem
2a3508eb48 If a "principal" argument isn't provided for a Kerberized NFS mount,
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to a
return a cannonical name for the server.
This patch fixes this unusual case.

PR:		201073
Submitted by:	masato@itc.naist.jp
MFC after:	2 weeks
2015-07-03 22:11:07 +00:00
Rick Macklem
d189dcb6e2 Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.

Reported by:	alex.burlyga.ietf@gmail.com
Tested by:	alex.burlyga.ietf@gmail.com
MFC after:	2 weeks
2015-07-03 01:15:21 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Konstantin Belousov
8551285097 Restore the td_cookie value for the tmpfs directory entry which was a
dup entry, upon detach from the parent directory.  If the node is
renamed, the entry is re-attached at the different directory, and
invalud cookie value triggers assert (or corrupts directory rb tree,
it seems).

Reported by:	clusteradm (gjb, antoine)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-19 07:25:15 +00:00
Gleb Smirnoff
093ebe1d28 o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async().
o Provide an extensive set of assertions for input array of pages.
o Remove now duplicate assertions from different pagers.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-06-17 22:44:27 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Gleb Smirnoff
093c7f396d Make KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-06-12 11:32:20 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Mark Johnston
068a3d319a unionfs: fix suspendability check bugs
- MNTK_SUSPENDABLE is set in mnt_kern_flag, not mnt_flag.
- The lower layer of a unionfs mount is read-only, so the mount should
  be suspendable iff the upper layer is suspendable.
- Remove a couple of superfluous comments.

Differential Revision:	https://reviews.freebsd.org/D2714
Reviewed by:	kib, mjg
2015-06-06 16:36:13 +00:00
John Baldwin
7077c42623 Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c.  This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions.  A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings.  For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead.  The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset.  The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional.  If it is not set, then fo_mmap() will
fail with ENODEV.  A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead.  While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision:	https://reviews.freebsd.org/D2658
Reviewed by:	alc (glanced over), kib
MFC after:	1 month
Sponsored by:	Chelsio
2015-06-04 19:41:15 +00:00
Eric van Gyzen
63e4c6cdf9 Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files.  Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.

This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).

Submitted by:   Eric Badger <eric@badgerio.us> (initial revision)
Obtained from:  Dell Inc.
PR:             198431
MFC after:      2 weeks
Reviewed by:    jhb
Approved by:    kib (mentor)
2015-06-02 18:37:04 +00:00
Xin LI
6e55e724a6 Clear p_stops upon PROCFS_CTL_DETACH, similar to r283889.
Noticed by:	jhb
Reviewed by:	sef
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-06-01 18:49:31 +00:00
Rick Macklem
0c419e226c Make the NFS server use shared vnode locks for a few cases
that are allowed by the VFS/VOP interface instead of using
exclusive locks.

MFC after:	2 weeks
2015-05-29 20:22:53 +00:00
Pedro F. Giffuni
e54a659a26 Provide VOP_GETPAGES_ASYNC() for extfs.
Merge the filesystem specific part from r274914 to ext2fs.

I only did regular testing with the change but UFS and our ext2fs
are similar enough that the code should just work with the new
sendfile.

Discussed with:	glebius
2015-05-28 21:06:59 +00:00
Rick Macklem
1f54e596ad Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.

Differential Revision:	https://reviews.freebsd.org/D2596
MFC after:	2 weeks
2015-05-27 22:00:05 +00:00
Konstantin Belousov
1bc93bb7b9 Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks.  Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.

Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode.  A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.

There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads.  NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag.  This is needed because nfsd may be the sole cause of the SU
workqueue overflow.  But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.

Reviewed by:	jhb, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:20:42 +00:00
Dmitry Chagin
63cc3320d9 Hide vfs.pfs.trace variable if it is not used. 2015-05-24 18:11:22 +00:00
Rick Macklem
262a84286d The NFS client generated directory block(s) with d_fileno == 0
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.

Suggested by:	trasz
Tested by:	trasz
MFC after:	2 weeks
2015-05-23 21:58:41 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
John Baldwin
312827253b Always set p_oppid when attaching to an existing process via procfs
tracing.  This matches the behavior of ptrace(PT_ATTACH).  Also,
the procfs detach request assumes p_oppid is always set.

Reviewed by:	kib
MFC after:	2 weeks
2015-05-22 11:03:51 +00:00
Rick Macklem
86b9457f5b The NFS client wasn't handling getdirentries(2) requests for sizes
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.

Tested by:	trasz
Reported by:	trasz
Reviewed by:	trasz
MFC after:	2 weeks
2015-05-21 23:14:18 +00:00
Alexander Motin
a87627b26b Do not promote large async writes to sync.
Present implementation of large sync writes is too strict and so can be
quite slow.  Instead of doing that, execute large async write in chunks,
syncing each chunk separately.

It would be good to fix large sync writes too, but I leave it to somebody
with more skills in this area.

Reviewed by:	rmacklem
MFC after:	1 week
2015-05-14 10:04:42 +00:00
Rick Macklem
2fbb9d563b Fix the NFS server's handling of a bogus NFSv2 ROOT RPC.
The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094
and should never be used by a client.

Tested by:	thmu@freenet.de
MFC after:	1 week
2015-04-25 00:58:24 +00:00
Rick Macklem
7cfdc2a7bc MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.

Differential Revision:	https://reviews.freebsd.org/D2330
Reviewed by:	mav, kib
MFC after:	2 weeks
2015-04-25 00:52:01 +00:00
Pedro F. Giffuni
2f39c91019 Prevent a double free.
This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.

Obtained from:	HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by:	Oliver Pinter
Revewed by:	rmacklem
2015-04-20 16:40:13 +00:00
Pedro F. Giffuni
a3a4b110da nfsrpc_createv4: fix double free.
Reported by:	Oliver Pinter, clang static checker
Obtained from:	HardenedBSD (commit 63cac77c42c0c3fc67da62f97d5ab651d52ae707)
Reviewed by:	rmacklem
MFC after:	5 days
2015-04-19 23:55:59 +00:00
Alexander Motin
afdfc9a40d Change wcommitsize default from one empirical value to another.
The new value is more predictable with growing RAM size:

        hibufspace maxvnodes      old      new
i386:
  256MB   32980992     15800  2198732  2097152
    2GB   94027776    107677   878764  4194304
amd64:
  256MB   32980992     15800  2198732  2097152
    1GB  114114560     68062  1678155  4194304
    4GB  217055232    111807  1955452  4194304
   16GB 1717846016    337308  5097465 16777216
   64GB 1734918144   1164427  1490479 16777216
  256GB 1734918144   4426453   391983 16777216

Reviewed by:	rmacklem
MFC after:	2 weeks
2015-04-19 11:34:41 +00:00
Edward Tomasz Napierala
50a220c699 Replace "new NFS" with just "NFS" in some sysctl description strings.
Sponsored by:	The FreeBSD Foundation
2015-04-19 06:18:41 +00:00
Pedro F. Giffuni
f738ee4825 Drop experimental dir_index support.
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:

- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.

Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.

Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.

PR:	191895
PR:	198731
PR:	199309

MFC after:	5 days
2015-04-17 22:26:01 +00:00
Rick Macklem
66e80f77d2 mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.

Tested by:	mav
Reviewed by:	mav
MFC after:	2 weeks
2015-04-16 22:35:15 +00:00
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
Will Andrews
677c3c0c66 tmpfs_getattr(): Return more correct allocated byte counts.
For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.

For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.

This change allows using stat(2) to identify a sparse file on tmpfs.

Reviewed by:	kib
MFC after:	1 week
2015-04-10 19:04:39 +00:00
Konstantin Belousov
2359e2dcc3 Do not call msdosfs_sync() on the read-only msdosfs mounts. In fact,
it should be a nop for ro.

PR:	199152
Reviewed by:	bde (PR version of the patch)
Submitted by:	longwitz@incore.de
MFC after:	1 week
2015-04-05 21:10:38 +00:00
Konstantin Belousov
420d65d9e4 Assert that an msdosfs mount is not read-only when FAT modifications
are requested.

PR:	199152
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-05 21:08:04 +00:00
Konstantin Belousov
bda2eb9ae8 Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes, the time marks are used to display idle time by
w(1) [1].  Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second.  The later gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.

Noted by:	truckman [1]
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 08:25:40 +00:00
Konstantin Belousov
c66d17c7bc msdosfs: mark unused compat-mount fields
The magic number MSDOSFS_ARGSMAGIC, which used to distinguish
"old" vs "new" msdosfs mount arguments, has not been used since
2005; it should just go away now.

Likewise, the local-to-Unicode table that changed at the same
time is unused.

Leave the space reserved in the old style mount arguments, though,
since we still support the old mount call (via the cmount entry
point).

Submitted by:	Chris Torek <chris.torek@gmail.com>
MFC after:	2 weeks
2015-03-22 09:09:26 +00:00
Xin LI
4f9343fc7c Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or
write operations.  This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because of the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision:	https://reviews.freebsd.org/D2104
Reported by:	Mike Tancsa <mike sentex net>
Reviewed by:	kib
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-03-21 01:14:11 +00:00
Gleb Smirnoff
4d6481a4c9 o Enhance vm_pager_free_nonreq() function:
- Allow to call the function with vm object lock held.
  - Allow to specify reqpage that doesn't match any page in the region,
    meaning freeing all pages.
o Utilize the new function in couple more places in vnode pager.

Reviewed by:	alc, kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-17 19:19:19 +00:00
Jung-uk Kim
2d427c524d Fix white spaces. 2015-03-02 19:14:58 +00:00
Edward Tomasz Napierala
ead063e0a2 Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct
operation of GlusterFS.

PR:		192701
Submitted by:	harsha at harshavardhana.net
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-03-02 19:04:27 +00:00
Warner Losh
03afe9ba70 nandfs_meta_bread() calls bread() which can set bp to NULL in some
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.

Reported by: Maxime Villard (reported as uninitialized variable)
2015-03-01 21:41:37 +00:00
Alexander Kabaev
0fd841369a Do not leak 'copy' buffer if bmap_truncate_indirect fails.
Reported by: Brainy Code Scanner, by Maxime Villard.
MFC after: 2 weeks
2015-02-28 22:24:45 +00:00
Konstantin Belousov
4fc4286fbc Some fixes for fdescfs lookup code.
Do not ever return doomed vnode from lookup.  This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.

In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called.  Prevent the situation by doing
vfs_busy() before unlocking dvp.  Reuse the vn_vget_ino_gen() helper.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-28 19:57:22 +00:00
Konstantin Belousov
08189ed667 The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
Pedro F. Giffuni
e5c356b2a2 ext2fs: Plug small memory leak
free() e2fs_contigdirs upon error.
Undo zeroing of e2fs_gd as this was actually a false positive.

X-MFC with:	278790
2015-02-15 14:25:00 +00:00
Pedro F. Giffuni
6be4cf2244 Reuse value of cursize instead of recalculating.
Reported by:	Clang static checker
MFC after:	1 week
2015-02-15 01:34:00 +00:00
Pedro F. Giffuni
f3ee91ed2b Initialize the allocation of variables related to the ext2 allocator.
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.

While here clean up small style issues.

Reported by:	Clang static analyser
MFC after:	1 week
2015-02-15 01:12:15 +00:00
Edward Tomasz Napierala
29836e077a Restore ABI compatibility, broken in r273127. Note that while this fixes
ABI with 10.1, it breaks ABI for 11-CURRENT, so rebuild of automountd(8)
is neccessary.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2015-02-10 16:17:16 +00:00
Konstantin Belousov
bf5fce2bee Remove duplicated assignment.
CID:	1267988
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-03 12:09:48 +00:00
Konstantin Belousov
e0a60ae16a Update directory times immediately after an entry is created or
removed.  Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-31 21:31:53 +00:00
Konstantin Belousov
f1a90a7bac Remove single-use boolean.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:58:04 +00:00
Konstantin Belousov
311d39f2ee POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended.  Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:27:18 +00:00
Dimitry Andric
7f4daa88f1 Fix a -Wcast-qual warning in smbfs_subr.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:02:32 +00:00
Dimitry Andric
38fc0aa484 Fix a -Wcast-qual warning in udf_vnops.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:01:45 +00:00
Dimitry Andric
03ce3d7219 Fix a bunch of -Wcast-qual warnings in cd9660_util.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:40:25 +00:00
Dimitry Andric
09bb4a314a Fix a bunch of -Wcast-qual warnings in msdosfs_conv.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:30:13 +00:00
Jamie Gritton
464aad1407 Add allow.mount.fdescfs jail flag.
PR:		192951
Submitted by:	ruben@verweg.com
MFC after:	3 days
2015-01-28 21:08:09 +00:00
Konstantin Belousov
f40cb1c645 Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync.  Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied.  Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by:	Ronald Klop <ronald-lists@klop.ws>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-28 10:37:23 +00:00
Konstantin Belousov
3544b0f68f tmpfs does not use UVM on FreeBSD.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-01-28 10:25:35 +00:00
Konstantin Belousov
3b50dff506 Stop enforcing additional reference on all cdevs, which was introduced
in r277199.  Acquire the neccessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.

Fix some style nits, add asserts.

Discussed with:	hselasky
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:36:52 +00:00
Konstantin Belousov
a57a934a38 Ignore devfs directory entries for devices either being destroyed or
delisted.  The check is racy.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:24:52 +00:00
Enji Cooper
ae266f893f Fix the build when INVARIANTS is defined by restoring bo's definition in
ext2_truncate(..) and by putting it under INVARIANTS ifdefs

X-MFC with: r277354
MFC after: 2 weeks
2015-01-19 07:10:08 +00:00
Pedro F. Giffuni
9a53618ab2 ext2: Garbage-collect some unused variables
Reported by:	clang static analysis
MFC after:	2 weeks
2015-01-19 03:30:45 +00:00
Pedro F. Giffuni
7075482d4a ext2: fix for uninitialized pointer read.
path.ep_bp was being used uninitialized in ext4_ext_find_extent().

CID:		1062344
MFC after:	1 week
2015-01-18 21:18:28 +00:00
Pedro F. Giffuni
955ba37baa Remove dead code.
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.

Drop the dead code and unsign dirsize given that it can't be
negative anyways.

CID:		1008669
MFC after:	1 week
2015-01-18 20:26:27 +00:00
Konstantin Belousov
e3612a4c1f Make SIGSTOP working for sleeps done while waiting for fifo readers or
writers in open(2), when the fifo is located on an NFS mount.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:03:26 +00:00
Pedro F. Giffuni
84b170d298 ext2: cosmetical issues
Minor sorting and note when the cases are expected to fall through.

MFC after:	1 week
2015-01-17 15:19:18 +00:00
Hans Petter Selasky
d2955419cd Avoid race with "dev_rel()" when using the recently added
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".

MFC after:	1 week
Reported by:	dchagin@
2015-01-14 22:07:13 +00:00
Hans Petter Selasky
9570004318 Don't use POLLNVAL as a return value from the client side poll
function. Many existing clients don't understand POLLNVAL and instead
relies on an error code from the read(), write() or ioctl() system
call. Also make sure we wakeup any client pollers before the cuse
server is closing, so they don't wait forever for an event.
2015-01-13 13:32:18 +00:00
Ed Maste
4882501b5c ANSIfy msdosfs
Add a few cases and style(9) fixes missed in r276887

Sponsored by:	The FreeBSD Foundation
2015-01-12 21:55:48 +00:00
Ed Maste
10c9700f3e ANSIfy sys/fs/msdosfs
There are a number of msdosfs improvements in NetBSD that may be worth
bringing over, and this reduces noise in the comparison.

Differential Revision:	https://reviews.freebsd.org/D1466
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2015-01-09 14:50:08 +00:00
Robert Watson
eae6da3db4 Use M_SIZE() instead of hand-crafted (and mostly correct) NFSMSIZ() macro
in the NFS server; garbage collect now-unused NFSMSIZ() and M_HASCL()
macros.  Also garbage collect now-unused versions in headers for the
removed previous NFS client and server.

Reviewed by:	rmacklem
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 17:22:56 +00:00
Mateusz Guzik
cd29b292b8 Convert nullfs hash lock from a mutex to an rwlock. 2014-12-30 21:41:35 +00:00
Rick Macklem
07d491dede r245508 modified the NFS client's Setattr RPC to
use VA_UTIMES_NULL to indicate whether it should
set the time to the current tod on the server.
This had the side effect of making the NFS client
use the client's timestamp for exclusive create,
starting with FreeBSD9.2.
Unfortunately a bug in some Solaris NFS servers
causes these servers to return NFS_OK to the
Setattr RPC done during exclusive create, but not
actually set the file's mode, leaving the file's
mode == 0.
This patch restores the NFS client's behaviour to
use the server's tod for the exclusive open's
Setattr RPC, to avoid the Solaris server bug and
to restore the pre-FreeBSD9.2 NFS behaviour.

Discussed on:	freebsd-fs
PR:	186293
MFC after:	3 months
2014-12-28 21:13:52 +00:00
Rick Macklem
2f88b3d20a Delete some duplicate code that was harmless because
exactly the same code is at the end of the nfscl_checksattr()
function that is called just before it. As such, this code
had already been executed and didn't do anything.

MFC after:	1 week
2014-12-25 22:29:37 +00:00
Rick Macklem
52f1bb38c2 A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1
was reported via email. This was caused by a LOR between the
sleep lock used to serialize the local locking (nfsrv_locklf())
and locking the vnode. I believe this patch fixes the problem
by delaying relocking of the vnode until the sleep lock is
unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side
effect of unlocking the vnode, unlocking the vnode was moved to before
the functions that call nfsvno_advlock().
It shouldn't affect the execution of the default case where
vfs.nfsd.enable_locallocks=0.

Reported by:	loic.blot@unix-experience.fr
Discussed with:	kib
MFC after:	1 week
2014-12-25 01:55:17 +00:00
Rick Macklem
62c23db947 Fix kernel builds with "options NFS_DEBUG" that
were broken by r276096. Also delete the two
kernel options NFS_GATHERDELAY, NFS_WDELAYHASHSIZ
which are no longer used.

Reported by:	bz
2014-12-23 14:24:36 +00:00
Rick Macklem
c15882f091 Remove the old NFS client and server from head,
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.

Discussed on: freebsd-fs
2014-12-23 00:47:46 +00:00
Konstantin Belousov
789bdfdbc6 Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some
fs, e.g. smbfs, already did it.

Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:29:33 +00:00
Benno Rice
6d659a5d9b Adjust the test of a KASSERT to better match the intent.
This assertion was added in r246213 as a guard against corrupted mbufs
arriving from drivers, the key distinguishing factor of said mbufs being
that they had a negative length. Given we're in a while loop specifically
designed to skip over zero-length mbufs, panicking on a zero-length mbuf
seems incorrect.

No objection from:	kib
2014-12-19 19:09:22 +00:00
Konstantin Belousov
6c21f6edb8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
Gleb Kurtsou
dde58752db Adjust printf format specifiers for dev_t and ino_t in kernel.
ino_t and dev_t are about to become uint64_t.

Reviewed by:	kib, mckusick
2014-12-17 07:27:19 +00:00
Pedro F. Giffuni
8f87059b41 ext2fs: Fix old out-of-bounds access.
Overrunning buffer pointed to by (caddr_t)&oip->i_db[0] of 48 bytes by
passing it to a function which accesses it at byte offset 59 using
argument 60UL.

The issue was inherited from an older FFS implementation and
fixed there with by merging UFS2 in r98542. We follow the
FFS fix.

Discussed with:	bde
CID:		1007665
MFC after:	3 days
2014-12-09 14:56:00 +00:00
Konstantin Belousov
b2344ab5ff Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.
Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all.  E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish.  On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour.  Note that it is still racy, since writes are not
stopped.

Discussed with:	avg, bjk, mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-09 10:00:47 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Edward Tomasz Napierala
e3d5f1fe3b Implement "automount -c".
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-11-22 16:48:29 +00:00
Edward Tomasz Napierala
836856e3e6 Fix smbfs to not zero out statfs f_flags field. Previously, this
made getmntinfo() return empty flags for smbfs filesystems when
called with MNT_WAIT. It's not visible with mount(8), since it uses
MNT_NOWAIT, but broke autounmount(8) operation.

PR:		195161
Differential Revision:	https://reviews.freebsd.org/D1194
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-11-21 06:21:39 +00:00
Pedro F. Giffuni
33587684a6 ifdef ext2_print_inode which is not really used.
ext2_print_inode is not really used but it was nice to
have for initial development work. #ifdef it under a
new EXT2FS_DEBUG knob so that we don't spend time
compiling it.

MFC after:	3 days
2014-11-12 16:23:56 +00:00
Mateusz Guzik
4dba07b216 Fix up some session-related races in devfs.
One was introduced with r272596, the rest was there to begin with.

Noted by: jhb
2014-11-03 03:12:15 +00:00
Edward Tomasz Napierala
2fbe0cff73 Fix handling of "conn" mount_nfs(8) option.
Reviewed by:	rmacklem@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-10-30 09:25:03 +00:00
Edward Tomasz Napierala
5a06ac3540 Add support for "timeo", "actimeo", "noac", and "proto" options
to mount_nfs(8).  They are implemented on Linux, OS X, and Solaris,
and thus can be expected to appear in automounter maps.

Reviewed by:	rmacklem@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-10-30 08:50:01 +00:00
Konstantin Belousov
42ecb595f2 Allow the vfs.nfsd knobs to be set from loader.conf (or using
kenv(8)).  This is useful when nfsd is loaded as module.

Reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-27 07:47:13 +00:00
Rick Macklem
6a30c96cdc Clip the settings for the NFS rsize, wsize mount options
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.

Reported by:	tcberner@gmail.com
MFC after:	2 weeks
2014-10-22 22:27:51 +00:00
Rick Macklem
fcf121d481 Revert r273481 so it can be recoded using fls(), which
some feel will make it more readable.
2014-10-22 21:57:35 +00:00
Rick Macklem
88cc4e92da Clip the settings for the NFS rsize, wsize mount options
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.

Reported by:	tcberner@gmail.com
MFC after:	2 weeks
2014-10-22 20:47:11 +00:00
Mateusz Guzik
12e2a30ef9 tmpfs: allow shared file lookups
Tested by: pho
2014-10-21 21:27:13 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Mateusz Guzik
4fce16e4c9 Provide vfs suspension support only for filesystems which need it, take
two.

nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.

This is a fixup for 273271.

No strong objections from: kib
Pointy hat to: mjg
MFC after:	2 weeks
2014-10-20 18:00:50 +00:00
Mateusz Guzik
a8a07fd613 unionfs: hold mount interlock while manipulating mnt_flag
This is for consistency with other filesystems.
2014-10-20 17:53:49 +00:00