Commit Graph

3302 Commits

Author SHA1 Message Date
kib
ff588ae9b0 Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks.  An attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.

Instead of synchronously waiting for the worker, tickle the worker
and pause until the backlog is cleaned, at the safe point during
return from kernel to usermode.  A new ast request to call
softdep_ast_cleanup() is created; the SU code now only checks the size
of the queue and schedules the ast.

There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads.  NFS server loop
explicitly checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag.  This is needed because nfsd may be the sole cause of the SU
workqueue overflow.  But, to not cause nfsd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.

Reviewed by:	jhb, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:20:42 +00:00
dchagin
4e888d6b57 Hide vfs.pfs.trace variable if it is not used. 2015-05-24 18:11:22 +00:00
rmacklem
b51d622ba8 The NFS client generated directory block(s) with d_fileno == 0
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.

Suggested by:	trasz
Tested by:	trasz
MFC after:	2 weeks
2015-05-23 21:58:41 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
jhb
9b4a921aab Always set p_oppid when attaching to an existing process via procfs
tracing.  This matches the behavior of ptrace(PT_ATTACH).  Also,
the procfs detach request assumes p_oppid is always set.

Reviewed by:	kib
MFC after:	2 weeks
2015-05-22 11:03:51 +00:00
rmacklem
9a564b79b5 The NFS client wasn't handling getdirentries(2) requests for sizes
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.

Tested by:	trasz
Reported by:	trasz
Reviewed by:	trasz
MFC after:	2 weeks
2015-05-21 23:14:18 +00:00
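The size adjustment described in commit 9a564b79b5 amounts to rounding a request down to a whole number of directory blocks. The following is a minimal userland sketch in C, not the actual NFS client code. The helper name is hypothetical, and the block size is passed as a parameter rather than taken from the kernel's DIRBLKSIZ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the actual NFS client code: round a requested
 * byte count down to an exact multiple of the directory block size.
 */
static size_t
example_round_down_dirblk(size_t requested, size_t blksize)
{
	assert(blksize != 0);	/* a zero block size would divide by zero */
	return (requested - (requested % blksize));
}
```

With an assumed block size of 512, a request for 1000 bytes becomes 512 and a request for 1024 bytes is left unchanged. A request smaller than one block becomes 0, a case this sketch does not handle specially.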
mav
983d16c8e2 Do not promote large async writes to sync.
Present implementation of large sync writes is too strict and so can be
quite slow.  Instead of doing that, execute large async write in chunks,
syncing each chunk separately.

It would be good to fix large sync writes too, but I leave it to somebody
with more skills in this area.

Reviewed by:	rmacklem
MFC after:	1 week
2015-05-14 10:04:42 +00:00
rmacklem
5a5431c415 Fix the NFS server's handling of a bogus NFSv2 ROOT RPC.
The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094
and should never be used by a client.

Tested by:	thmu@freenet.de
MFC after:	1 week
2015-04-25 00:58:24 +00:00
rmacklem
e0a9cb76d2 MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
are fixed as well.

Differential Revision:	https://reviews.freebsd.org/D2330
Reviewed by:	mav, kib
MFC after:	2 weeks
2015-04-25 00:52:01 +00:00
pfg
2c313e6688 Prevent a double free.
This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.

Obtained from:	HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by:	Oliver Pinter
Reviewed by:	rmacklem
2015-04-20 16:40:13 +00:00
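The "set the pointer to NULL after free" safety belt mentioned in commit 2c313e6688 can be illustrated with a small userland sketch. The structure and function names here are hypothetical. The sketch also uses the C library free(3) rather than the kernel's free(9), so it shows only the pattern, not the committed change.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of a heap buffer, used only for illustration. */
struct example_ctx {
	char *buf;
};

/*
 * Release the buffer and clear the pointer.  If a later change calls this
 * a second time, the call reduces to free(NULL), which is a no-op, instead
 * of a double free.
 */
static void
example_release(struct example_ctx *ctx)
{
	assert(ctx != NULL);
	free(ctx->buf);
	ctx->buf = NULL;
}
```

Calling example_release() twice on the same structure is harmless in this sketch, which is the property the commit message describes as protection against future changes.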
pfg
32880107f3 nfsrpc_createv4: fix double free.
Reported by:	Oliver Pinter, clang static checker
Obtained from:	HardenedBSD (commit 63cac77c42c0c3fc67da62f97d5ab651d52ae707)
Reviewed by:	rmacklem
MFC after:	5 days
2015-04-19 23:55:59 +00:00
mav
d9ba2b8e84 Change wcommitsize default from one empirical value to another.
The new value is more predictable with growing RAM size:

        hibufspace maxvnodes      old      new
i386:
  256MB   32980992     15800  2198732  2097152
    2GB   94027776    107677   878764  4194304
amd64:
  256MB   32980992     15800  2198732  2097152
    1GB  114114560     68062  1678155  4194304
    4GB  217055232    111807  1955452  4194304
   16GB 1717846016    337308  5097465 16777216
   64GB 1734918144   1164427  1490479 16777216
  256GB 1734918144   4426453   391983 16777216

Reviewed by:	rmacklem
MFC after:	2 weeks
2015-04-19 11:34:41 +00:00
trasz
4252f860ce Replace "new NFS" with just "NFS" in some sysctl description strings.
Sponsored by:	The FreeBSD Foundation
2015-04-19 06:18:41 +00:00
pfg
2bead96db0 Drop experimental dir_index support.
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:

- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.

Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.

Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4, but thanks to the
wonders of version control we can always revert this change and bring it
back.

PR:	191895
PR:	198731
PR:	199309

MFC after:	5 days
2015-04-17 22:26:01 +00:00
rmacklem
b4d8a8d1f7 mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.

Tested by:	mav
Reviewed by:	mav
MFC after:	2 weeks
2015-04-16 22:35:15 +00:00
rmacklem
ad77d0b1c1 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
will
e2c616f11c tmpfs_getattr(): Return more correct allocated byte counts.
For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.

For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.

This change allows using stat(2) to identify a sparse file on tmpfs.

Reviewed by:	kib
MFC after:	1 week
2015-04-10 19:04:39 +00:00
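Commit e2c616f11c notes that stat(2) can now identify a sparse file on tmpfs. One common way to do so is to compare the allocated byte count with the logical size. The sketch below is not part of the commit. The helper name is hypothetical, and it assumes st_blocks is counted in 512-byte units, as it is on FreeBSD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical check: a file looks sparse when fewer bytes are allocated
 * than its logical size.  Assumes st_blocks is in 512-byte units.
 */
static bool
example_looks_sparse(const struct stat *sb)
{
	assert(sb != NULL);
	return ((long long)sb->st_blocks * 512 < (long long)sb->st_size);
}
```

A caller would fill the structure with stat(2) or fstat(2) and pass it in. For example, a 1 MB file reporting only 8 allocated blocks would be classified as sparse.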
kib
1440f3812a Do not call msdosfs_sync() on the read-only msdosfs mounts. In fact,
it should be a nop for ro.

PR:	199152
Reviewed by:	bde (PR version of the patch)
Submitted by:	longwitz@incore.de
MFC after:	1 week
2015-04-05 21:10:38 +00:00
kib
95e5199578 Assert that an msdosfs mount is not read-only when FAT modifications
are requested.

PR:	199152
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-05 21:08:04 +00:00
kib
b7c417708f Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes; the time marks are used to display idle time by
w(1) [1].  Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second.  The latter gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates themselves, as
well as the check in devfs_timestamp, are non-atomic.

Noted by:	truckman [1]
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 08:25:40 +00:00
kib
89382b6533 msdosfs: mark unused compat-mount fields
The magic number MSDOSFS_ARGSMAGIC, which used to distinguish
"old" vs "new" msdosfs mount arguments, has not been used since
2005; it should just go away now.

Likewise, the local-to-Unicode table that changed at the same
time is unused.

Leave the space reserved in the old style mount arguments, though,
since we still support the old mount call (via the cmount entry
point).

Submitted by:	Chris Torek <chris.torek@gmail.com>
MFC after:	2 weeks
2015-03-22 09:09:26 +00:00
delphij
041657da93 Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or
write operations.  This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision:	https://reviews.freebsd.org/D2104
Reported by:	Mike Tancsa <mike sentex net>
Reviewed by:	kib
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-03-21 01:14:11 +00:00
glebius
398be53682 o Enhance vm_pager_free_nonreq() function:
  - Allow calling the function with the vm object lock held.
  - Allow specifying a reqpage that doesn't match any page in the region,
    meaning freeing all pages.
o Utilize the new function in a couple more places in the vnode pager.

Reviewed by:	alc, kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-17 19:19:19 +00:00
jkim
d07e2757d9 Fix white spaces. 2015-03-02 19:14:58 +00:00
trasz
ab90d82e08 Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct
operation of GlusterFS.

PR:		192701
Submitted by:	harsha at harshavardhana.net
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-03-02 19:04:27 +00:00
imp
58c9460670 nandfs_meta_bread() calls bread() which can set bp to NULL in some
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.

Reported by: Maxime Villard (reported as uninitialized variable)
2015-03-01 21:41:37 +00:00
kan
a95ac78b9c Do not leak 'copy' buffer if bmap_truncate_indirect fails.
Reported by: Brainy Code Scanner, by Maxime Villard.
MFC after: 2 weeks
2015-02-28 22:24:45 +00:00
kib
661b19b40e Some fixes for fdescfs lookup code.
Do not ever return doomed vnode from lookup.  This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.

In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called.  Prevent the situation by doing
vfs_busy() before unlocking dvp.  Reuse the vn_vget_ino_gen() helper.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-28 19:57:22 +00:00
kib
3bc9cbc06a The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
pfg
e22521379a ext2fs: Plug small memory leak
free() e2fs_contigdirs upon error.
Undo zeroing of e2fs_gd as this was actually a false positive.

X-MFC with:	278790
2015-02-15 14:25:00 +00:00
pfg
03988df8a2 Reuse value of cursize instead of recalculating.
Reported by:	Clang static checker
MFC after:	1 week
2015-02-15 01:34:00 +00:00
pfg
d2ad05642a Initialize the allocation of variables related to the ext2 allocator.
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.

While here clean up small style issues.

Reported by:	Clang static analyser
MFC after:	1 week
2015-02-15 01:12:15 +00:00
trasz
e13ac6cd7e Restore ABI compatibility, broken in r273127. Note that while this fixes
ABI with 10.1, it breaks ABI for 11-CURRENT, so rebuild of automountd(8)
is necessary.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2015-02-10 16:17:16 +00:00
kib
09bdd8a7f8 Remove duplicated assignment.
CID:	1267988
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-03 12:09:48 +00:00
kib
1831e3d7dc Update directory times immediately after an entry is created or
removed.  Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-31 21:31:53 +00:00
kib
1ccf1fa71b Remove single-use boolean.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:58:04 +00:00
kib
8773784e5a POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when the file was extended.  Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-31 12:27:18 +00:00
dim
6b8eea4924 Fix a -Wcast-qual warning in smbfs_subr.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:02:32 +00:00
dim
edbaff1357 Fix a -Wcast-qual warning in udf_vnops.c, by using __DECONST. No
functional change.

MFC after:	3 days
2015-01-30 22:01:45 +00:00
dim
edba0c462e Fix a bunch of -Wcast-qual warnings in cd9660_util.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:40:25 +00:00
dim
07f28f1df7 Fix a bunch of -Wcast-qual warnings in msdosfs_conv.c, by using
__DECONST.  No functional change.

MFC after:	3 days
2015-01-29 20:30:13 +00:00
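The four -Wcast-qual fixes above rely on __DECONST, which FreeBSD provides in <sys/cdefs.h> to drop a const qualifier by routing the cast through an integer type. The sketch below uses a portable stand-in. The macro name EXAMPLE_DECONST, its exact definition, and the helper function are assumptions for illustration and are not taken from the commits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable stand-in modeled on FreeBSD's __DECONST (an assumption, not the
 * system macro): casting through uintptr_t removes the const qualifier
 * without a direct pointer cast that -Wcast-qual would flag.
 */
#define EXAMPLE_DECONST(type, var) ((type)(uintptr_t)(const void *)(var))

/*
 * Hypothetical helper in the style of strchr(3): accepts a const string
 * but returns a non-const pointer into it.
 */
static char *
example_find_char(const char *s, char c)
{
	assert(s != NULL);
	for (; *s != '\0'; s++) {
		if (*s == c)
			return (EXAMPLE_DECONST(char *, s));
	}
	return (NULL);
}
```

The macro changes only the type seen by the compiler. Writing through the returned pointer is still valid only when the underlying object was not originally const.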
jamie
c7d0935d11 Add allow.mount.fdescfs jail flag.
PR:		192951
Submitted by:	ruben@verweg.com
MFC after:	3 days
2015-01-28 21:08:09 +00:00
kib
19abfd4698 Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync.  Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object's OBJ_TMPFS_DIRTY flag for the tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object; it indicates that
the object could have been dirtied.  Adapt the fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE the same as
OBJT_VNODE.

Reported by:	Ronald Klop <ronald-lists@klop.ws>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-01-28 10:37:23 +00:00
kib
53810519b4 tmpfs does not use UVM on FreeBSD.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-01-28 10:25:35 +00:00
kib
f748dc7ade Stop enforcing additional reference on all cdevs, which was introduced
in r277199.  Acquire the necessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.

Fix some style nits, add asserts.

Discussed with:	hselasky
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:36:52 +00:00
kib
b3741c8701 Ignore devfs directory entries for devices either being destroyed or
delisted.  The check is racy.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:24:52 +00:00
ngie
f94357dba9 Fix the build when INVARIANTS is defined by restoring bo's definition in
ext2_truncate(..) and by putting it under INVARIANTS ifdefs

X-MFC with: r277354
MFC after: 2 weeks
2015-01-19 07:10:08 +00:00
pfg
8fa2e2513f ext2: Garbage-collect some unused variables
Reported by:	clang static analysis
MFC after:	2 weeks
2015-01-19 03:30:45 +00:00
pfg
142fb530ca ext2: fix for uninitialized pointer read.
path.ep_bp was being used uninitialized in ext4_ext_find_extent().

CID:		1062344
MFC after:	1 week
2015-01-18 21:18:28 +00:00
pfg
f71f36cb87 Remove dead code.
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.

Drop the dead code and unsign dirsize given that it can't be
negative anyways.

CID:		1008669
MFC after:	1 week
2015-01-18 20:26:27 +00:00