BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).
Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.
Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)
Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:
sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'
Reviewed by: rmacklem
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3304
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.
MFC after: 3 days
determining whether a node changed.
Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.
This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).
PR: 201284
Submitted by: David Binderman
Patch suggested by: kib
Reviewed by: kib
MFC after: 2 weeks
Committed from: Essen FreeBSD Hackathon
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).
Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.
Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.
Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D2937
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to a
return a cannonical name for the server.
This patch fixes this unusual case.
PR: 201073
Submitted by: masato@itc.naist.jp
MFC after: 2 weeks
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.
Reported by: alex.burlyga.ietf@gmail.com
Tested by: alex.burlyga.ietf@gmail.com
MFC after: 2 weeks
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.
* src/share/man/...
- Update the man pages.
* src/etc/...
- The startup/shutdown work is done in D2924.
* src/UPDATING
- Add UPDATING announcement.
* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.
* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.
* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.
* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.
* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.
* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.
* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.
* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.
Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
dup entry, upon detach from the parent directory. If the node is
renamed, the entry is re-attached at the different directory, and
invalud cookie value triggers assert (or corrupts directory rb tree,
it seems).
Reported by: clusteradm (gjb, antoine)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
o Provide an extensive set of assertions for input array of pages.
o Remove now duplicate assertions from different pagers.
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().
Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Use the same scheme implemented to manage credentials.
Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.
Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
- MNTK_SUSPENDABLE is set in mnt_kern_flag, not mnt_flag.
- The lower layer of a unionfs mount is read-only, so the mount should
be suspendable iff the upper layer is suspendable.
- Remove a couple of superfluous comments.
Differential Revision: https://reviews.freebsd.org/D2714
Reviewed by: kib, mjg
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.
The vm_mmap() function is now split up into two functions. A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.
The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings. For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset. The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.
The fo_mmap() hook is optional. If it is not set, then fo_mmap() will
fail with ENODEV. A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).
While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead. While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.
Differential Revision: https://reviews.freebsd.org/D2658
Reviewed by: alc (glanced over), kib
MFC after: 1 month
Sponsored by: Chelsio
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
MFC after: 2 weeks
Reviewed by: jhb
Approved by: kib (mentor)
Merge the filesystem specific part from r274914 to ext2fs.
I only did regular testing with the change but UFS and our ext2fs
are similar enough that the code should just work with the new
sendfile.
Discussed with: glebius
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.
Differential Revision: https://reviews.freebsd.org/D2596
MFC after: 2 weeks
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks. Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.
Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode. A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.
There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads. NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag. This is needed because nfsd may be the sole cause of the SU
workqueue overflow. But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.
Reviewed by: jhb, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.
Suggested by: trasz
Tested by: trasz
MFC after: 2 weeks
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
tracing. This matches the behavior of ptrace(PT_ATTACH). Also,
the procfs detach request assumes p_oppid is always set.
Reviewed by: kib
MFC after: 2 weeks
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.
Tested by: trasz
Reported by: trasz
Reviewed by: trasz
MFC after: 2 weeks
Present implementation of large sync writes is too strict and so can be
quite slow. Instead of doing that, execute large async write in chunks,
syncing each chunk separately.
It would be good to fix large sync writes too, but I leave it to somebody
with more skills in this area.
Reviewed by: rmacklem
MFC after: 1 week
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.
Differential Revision: https://reviews.freebsd.org/D2330
Reviewed by: mav, kib
MFC after: 2 weeks
This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.
Obtained from: HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by: Oliver Pinter
Revewed by: rmacklem
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:
- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.
Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.
Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.
PR: 191895
PR: 198731
PR: 199309
MFC after: 5 days
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.
Tested by: mav
Reviewed by: mav
MFC after: 2 weeks
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
Reviewed by: kib
MFC after: 2 weeks
For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.
For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.
This change allows using stat(2) to identify a sparse file on tmpfs.
Reviewed by: kib
MFC after: 1 week
on reads or writes, the time marks are used to display idle time by
w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second. The later gives seconds precision,
which is good enough for the purpose.
Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.
Noted by: truckman [1]
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The magic number MSDOSFS_ARGSMAGIC, which used to distinguish
"old" vs "new" msdosfs mount arguments, has not been used since
2005; it should just go away now.
Likewise, the local-to-Unicode table that changed at the same
time is unused.
Leave the space reserved in the old style mount arguments, though,
since we still support the old mount call (via the cmount entry
point).
Submitted by: Chris Torek <chris.torek@gmail.com>
MFC after: 2 weeks
Currently we update timestamps unconditionally when doing read or
write operations. This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because of the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.
A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.
Differential Revision: https://reviews.freebsd.org/D2104
Reported by: Mike Tancsa <mike sentex net>
Reviewed by: kib
Relnotes: yes
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
- Allow to call the function with vm object lock held.
- Allow to specify reqpage that doesn't match any page in the region,
meaning freeing all pages.
o Utilize the new function in couple more places in vnode pager.
Reviewed by: alc, kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.
Reported by: Maxime Villard (reported as uninitialized variable)
Do not ever return doomed vnode from lookup. This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.
In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called. Prevent the situation by doing
vfs_busy() before unlocking dvp. Reuse the vn_vget_ino_gen() helper.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.
Sponsored by: EMC / Isilon Storage Division
Submitted by: Conrad Meyer
MFC after: 1 week
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.
While here clean up small style issues.
Reported by: Clang static analyser
MFC after: 1 week
removed. Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.
Reported and tested by: madpilot
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended. Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.
Fix this, by updating both ctime and mtime for writes to tmpfs files.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync. Also, this means that a mtime
update may be delayed up to 30 seconds after the write.
The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied. Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.
Reported by: Ronald Klop <ronald-lists@klop.ws>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
in r277199. Acquire the neccessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.
Fix some style nits, add asserts.
Discussed with: hselasky
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.
Drop the dead code and unsign dirsize given that it can't be
negative anyways.
CID: 1008669
MFC after: 1 week
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".
MFC after: 1 week
Reported by: dchagin@
function. Many existing clients don't understand POLLNVAL and instead
relies on an error code from the read(), write() or ioctl() system
call. Also make sure we wakeup any client pollers before the cuse
server is closing, so they don't wait forever for an event.
There are a number of msdosfs improvements in NetBSD that may be worth
bringing over, and this reduces noise in the comparison.
Differential Revision: https://reviews.freebsd.org/D1466
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
in the NFS server; garbage collect now-unused NFSMSIZ() and M_HASCL()
macros. Also garbage collect now-unused versions in headers for the
removed previous NFS client and server.
Reviewed by: rmacklem
Sponsored by: EMC / Isilon Storage Division
use VA_UTIMES_NULL to indicate whether it should
set the time to the current tod on the server.
This had the side effect of making the NFS client
use the client's timestamp for exclusive create,
starting with FreeBSD9.2.
Unfortunately a bug in some Solaris NFS servers
causes these servers to return NFS_OK to the
Setattr RPC done during exclusive create, but not
actually set the file's mode, leaving the file's
mode == 0.
This patch restores the NFS client's behaviour to
use the server's tod for the exclusive open's
Setattr RPC, to avoid the Solaris server bug and
to restore the pre-FreeBSD9.2 NFS behaviour.
Discussed on: freebsd-fs
PR: 186293
MFC after: 3 months
exactly the same code is at the end of the nfscl_checksattr()
function that is called just before it. As such, this code
had already been executed and didn't do anything.
MFC after: 1 week
was reported via email. This was caused by a LOR between the
sleep lock used to serialize the local locking (nfsrv_locklf())
and locking the vnode. I believe this patch fixes the problem
by delaying relocking of the vnode until the sleep lock is
unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side
effect of unlocking the vnode, unlocking the vnode was moved to before
the functions that call nfsvno_advlock().
It shouldn't affect the execution of the default case where
vfs.nfsd.enable_locallocks=0.
Reported by: loic.blot@unix-experience.fr
Discussed with: kib
MFC after: 1 week
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.
Discussed on: freebsd-fs
This assertion was added in r246213 as a guard against corrupted mbufs
arriving from drivers, the key distinguishing factor of said mbufs being
that they had a negative length. Given we're in a while loop specifically
designed to skip over zero-length mbufs, panicking on a zero-length mbuf
seems incorrect.
No objection from: kib
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.
Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.
Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.
Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Overrunning buffer pointed to by (caddr_t)&oip->i_db[0] of 48 bytes by
passing it to a function which accesses it at byte offset 59 using
argument 60UL.
The issue was inherited from an older FFS implementation and
fixed there with by merging UFS2 in r98542. We follow the
FFS fix.
Discussed with: bde
CID: 1007665
MFC after: 3 days
Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish. On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.
Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour. Note that it is still racy, since writes are not
stopped.
Discussed with: avg, bjk, mckusick
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason of this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.
- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().
- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().
- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().
The split is done mostly for code clarity, and should not affect
scalability.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
made getmntinfo() return empty flags for smbfs filesystems when
called with MNT_WAIT. It's not visible with mount(8), since it uses
MNT_NOWAIT, but broke autounmount(8) operation.
PR: 195161
Differential Revision: https://reviews.freebsd.org/D1194
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
ext2_print_inode is not really used but it was nice to
have for initial development work. #ifdef it under a
new EXT2FS_DEBUG knob so that we don't spend time
compiling it.
MFC after: 3 days
to mount_nfs(8). They are implemented on Linux, OS X, and Solaris,
and thus can be expected to appear in automounter maps.
Reviewed by: rmacklem@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days
Sponsored by: Mellanox Technologies
two.
nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.
This is a fixup for 273271.
No strong objections from: kib
Pointy hat to: mjg
MFC after: 2 weeks
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe
user nobody and/or setting group nogroup as owner of a file or directory.
Usually at the client side, if there is an username that is not in the
client's passwd database, some clients will send 'nobody@<your.dns.domain>'
in the wire and the NFSv4 server will treat it as an ERROR.
However, if you have a valid user nobody in your passwd database,
the NFSv4 server will treat it as a NFSERR_BADOWNER as its believes the
client doesn't has the username mapped.
Submitted by: Loic Blot <loic.blot@unix-experience.fr>
Reviewed by: rmacklem
Approved by: rmacklem
MFC after: 2 weeks
read/write/poll/ioctl, call standard vnode filedescriptor fop. This
restores the special handling for terminals by calling the deadfs VOP,
instead of always returning ENXIO for destroyed devices or revoked
terminals.
Since destroyed (and not revoked) device would use devfs_specops VOP
vector, make dead_read/write/poll non-static and fill VOP table with
pointers to the functions, to instead of VOP_PANIC.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
is interested in i/o state. Return POLLNVAL for invalid bits, similar
to poll_no_poll(). Note that POLLOUT must not be returned, since
POLLHUP is set.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
have wildcards. This makes it possible for autofs(4) to avoid requesting
automountd(8) action on access to nonexistent nodes - unless wildcards
are actually used.
Note that this change breaks ABI for automountd(8).
Tested by: dhw@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
survives remount in rw, also it is set for vnodes on rootfs before
noatime can be set or clock is adjusted. All conditions result in
wrong atime for accessed vnodes.
Submitted by: bde
MFC after: 1 week
This fix addresses only issues with the pynfs reports, none of these
issues are know to create problems for extant real clients.
Submitted by: Bart Hsiao <bart.hsiao@gmail.com>
Reworked by: myself
Reviewed by: rmacklem
Approved by: rmacklem
Sponsored by: QNAP Systems Inc.
made autofs mix them up: the second one wasn't visible in ls(1) output,
and trying to access it would trigger mount for the first one.
foobar host:/foobar
foo host:/foo
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
for each file and then convert that to a kinfo_ofile structure rather than
keeping a second, different set of code that directly manipulates
type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.
Differential Revision: https://reviews.freebsd.org/D775
Reviewed by: kib, glebius (earlier version)
by ffs and ext2fs. Remove duplicated call to vm_page_zero_invalid(),
done by VOP and by vm_pager_getpages(). Use vm_pager_free_nonreq().
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 6 weeks (after r271596)
path through the NFS clients' getpages functions.
Introduce vm_pager_free_nonreq(). This function can be used to eliminate
code that is duplicated in many getpages functions. Also, in contrast to
the code that currently appears in those getpages functions,
vm_pager_free_nonreq() avoids acquiring an exclusive object lock in one
case.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
mbuf-initialisation logic that is best left to centralised mbuf utility
code rather than scattered around the kernel.
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
variable, and don't store in autofs_mount. Also rename it from 'sc'
to 'autofs_softc', since it's global and extern.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.
There are still a few outstanding problems; they will be fixed shortly.
Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric: D523
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
nullfs vnode shares vnode lock with lower vnode, this allows the
reclamation of nullfs directory vnode in null_lookup(). In this
situation, VOP must return ENOENT.
More, since after the reclamation, the locks of nullfs directory vnode
and lower vnode are no longer shared, the relock of the ldvp does not
restore the correct locking state of dvp, and leaks ldvp lock.
Correct this by unlocking ldvp and locking dvp.
Use cached value of dvp->v_mount.
Reported by: bdrewery
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In linux EXT4_LINK_MAX is now 64000. We can't really do that
since i_nlink and va_nlink are signed so setting higher values
is likely to cause trouble.
This is a system limitation so set the EXT_LINK_MAX to
what the system can handle.
MFC after: 3 days
writes to files for read-only file systems. Since there are already
checks in nandfs_setattr that return an error, this moves detection of
the error earlier.
It overflows witness.
Shorten the names of some nfs mutexes.
Reported and tested by: pho
No objections from: rmacklem, mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
forcing filesystem VOP_LINK() methods to repeat the code. In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.
Note that NFS server already performs this check before calling
VOP_LINK().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- Suspend filesystem for unmount. This prevents new tmpfs nodes from
instantiating, and also ensures that only unmount thread can destroy
nodes.
- Do not start tmpfs node deletion until all vnodes are reclaimed,
which guarantees that no thread can access tmpfs data. For this,
call vflush() in the loop, until the mnt_nvnodelistsize is non-zero.
Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
insertion of a vnode germ into the mount list of vnodes.
- Fail node allocation when the filesystem is being unmounted. This
is race-free due to the vflush() call in loop. This is mostly
cosmetic, avoiding some more work which might be done until
suspension in unmount is started.
Note that there is currently no way to prevent new vnode instantiation
from readers during the unmount. Due to this, forced unmount might
live-lock if vflush() loop cannot get to the zero vnode count due to
races with readers. The unmount would proceed after the load is
lifted.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced. When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
a vnode until it is verified that the vnode indeed belongs to tmpfs
mount. Otherwise, it might access random memory, at least in the
debug kernel.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
producer, instead of hard-coding VFS_VGET(). New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.
Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that is greater than 32bits.
Reported by: jpaetzel
Reviewed by: jpaetzel
MFC after: 1 month
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.
MFC after: 1 month
UFS rather than for all but ZFS. This code was assuming that offsets were
monotonically increasing for all file systems except ZFS and that the
cookies from a previous call may have been rewound to a block boundary.
According to mckusick@ only UFS is known to do this, so only requests against
UFS file systems should remove cookies smaller than the given offset. This
fixes serving TMPFS over NFS as it too does not have monotonically increasing
offsets. The comment around the code also indicated it was specific to UFS.
Some of the code using 'not_zfs' is specific to ZFS snapshot handling, so
add a 'is_zfs' variable for those cases.
It's possible that 'is_zfs' check for VFS_VGET() support may not be
specific to ZFS. This needs more research and testing.
After this fix TMPFS and other file systems can be served over NFS.
To test I compared the results of syncing a /usr/src tree into a tmpfs and
serving that over NFS. Before the fix 3589 files were missing on the remote
view. After the fix all files were successfully found.
Reviewed by: rmacklem
Discussed with: mckusick, rmacklem via fs@
Discussed at: http://lists.freebsd.org/pipermail/freebsd-fs/2014-April/019264.html
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
when a newly created file has another open done on it that
update the open mode. This patch moves the code that updates
the open mode up into the block where the mutex is held to
ensure this cannot happen. No bug caused by this potential
race has been observed, but this fix is a safety belt to ensure
it cannot happen.
MFC after: 2 weeks
permissions test, forgotten in r164033.
Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once. Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).
Reported by: bde
Reviewed by: bde, trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
created to a symlink. This restriction (which was
inherited from OpenBSD) is not required by the NFS RFCs.
Since this is allowed by the old NFS server, it is a
POLA violation to not allow it. This patch modifies the
new NFS server to allow this.
Reported by: jhb
Reviewed by: jhb
MFC after: 3 days
The CUSE library is a wrapper for the devfs kernel functionality which
is exposed through /dev/cuse . In order to function the CUSE kernel
code must either be enabled in the kernel configuration file or loaded
separately as a module. Currently none of the committed items are
connected to the default builds, except for installing the needed
header files. The CUSE code will be connected to the default world and
kernel builds in a follow-up commit.
The CUSE module was written by Hans Petter Selasky, somewhat inspired
by similar functionality found in FUSE. The CUSE library can be used
for many purposes. Currently CUSE is used when running Linux kernel
drivers in user-space, which need to create a character device node to
communicate with its applications. CUSE has full support for almost
all devfs functionality found in the kernel:
- kevents
- read
- write
- ioctl
- poll
- open
- close
- mmap
- private per file handle data
Requested by several people. Also see "multimedia/cuse4bsd-kmod" in
ports.
disk. That has a side effect of corrupting the "." entries names on
rename, since the call to createde() in the msdosfs_rename() sets the
de_Name to the target name. If any change to the directory attributes
is performed, the wrong name is written back to the on-disk direntry
on update.
Overwrite the de_Name for the directories on rename to correct the dot
name.
Submitted by: bde
MFC after: 1 week
should either accept owner and owner_group strings that are just
the digits of the uid/gid or return NFS4ERR_BADOWNER.
This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can
be set to enable the server w.r.t. accepting numeric string. It
also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid
strings are not enabled. This fixes the server for recent Linux
nfs4 clients that use numeric uid/gid strings by default.
Reported and tested by: craigyk@gmail.com
MFC after: 2 weeks
It can fail if pipe map is exhausted (as a result of too many pipes created),
but it is not fatal and could be provoked by unprivileged users. The only
consequence is worse performance with given pipe.
Reported by: ivoras
Suggested by: kib
MFC after: 1 week
for the VOP_READ() call. This patch fixes both the old and new
server for this case.
PR: 185232
Submitted by: PR had patch for old server
Reviewed by: kib
MFC after: 2 weeks
This simplifies the code and should avoid the clang sparc
port from generating an abort() call.
Requested by: rdivacky
Submitted by: jhb
MFC after: 2 weeks
post-create/mkdir directory attributes. This allows the RPC to
name cache the newly created directory and reduces the lookup RPC
count for applications creating a lot of directories.
MFC after: 2 weeks
post-open/create directory attributes. This allows the RPC to
name cache the newly created file and reduces the lookup RPC
count by about 10% for software builds.
MFC after: 2 weeks
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.
MFC after: 2 weeks
worse when filling up a device and then trying to erase files to make
space. Without enough space, you can't do that. Also, ensure that the
metadata writes don't generate ENOSPC. They will be retried later
since the buffers are still dirty...
Submitted by: mjg@
When server doesn't support this request, try to use SMB_INFO_ALLOCATION.
And use SMB_COM_QUERY_INFORMATION_DISK request as fallback.
MFC after: 2 weeks
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.
Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.
Exp-run revealed no ports using it directly.
No objection from: arch@
Sponsored by: EMC / Isilon Storage Division
ext2fs: minor update to the dirpref policy.
The change in UFS r254996, reverted the change as the
older code seems to work better. This was not visible
in local testing but we can trust UFS is vastly more
exercised in diferent environments.
Bring in a minor change to the dirpref policy based on r248623.
This is pretty minimal change to keep the implementation in
sync with UFS but other parts from the original change are not
directly applicable so don't expect improvements in fsck times.
MFC after: 2 weeks
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks