largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.
Differential Revision: https://reviews.freebsd.org/D2330
Reviewed by: mav, kib
MFC after: 2 weeks
This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.
Obtained from: HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by: Oliver Pinter
Revewed by: rmacklem
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
Reviewed by: kib
MFC after: 2 weeks
- Allow to call the function with vm object lock held.
- Allow to specify reqpage that doesn't match any page in the region,
meaning freeing all pages.
o Utilize the new function in couple more places in vnode pager.
Reviewed by: alc, kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
use VA_UTIMES_NULL to indicate whether it should
set the time to the current tod on the server.
This had the side effect of making the NFS client
use the client's timestamp for exclusive create,
starting with FreeBSD9.2.
Unfortunately a bug in some Solaris NFS servers
causes these servers to return NFS_OK to the
Setattr RPC done during exclusive create, but not
actually set the file's mode, leaving the file's
mode == 0.
This patch restores the NFS client's behaviour to
use the server's tod for the exclusive open's
Setattr RPC, to avoid the Solaris server bug and
to restore the pre-FreeBSD9.2 NFS behaviour.
Discussed on: freebsd-fs
PR: 186293
MFC after: 3 months
exactly the same code is at the end of the nfscl_checksattr()
function that is called just before it. As such, this code
had already been executed and didn't do anything.
MFC after: 1 week
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.
Discussed on: freebsd-fs
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.
Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.
Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.
Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
to mount_nfs(8). They are implemented on Linux, OS X, and Solaris,
and thus can be expected to appear in automounter maps.
Reviewed by: rmacklem@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe
path through the NFS clients' getpages functions.
Introduce vm_pager_free_nonreq(). This function can be used to eliminate
code that is duplicated in many getpages functions. Also, in contrast to
the code that currently appears in those getpages functions,
vm_pager_free_nonreq() avoids acquiring an exclusive object lock in one
case.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
forcing filesystem VOP_LINK() methods to repeat the code. In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.
Note that NFS server already performs this check before calling
VOP_LINK().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.
MFC after: 1 month
when a newly created file has another open done on it that
update the open mode. This patch moves the code that updates
the open mode up into the block where the mutex is held to
ensure this cannot happen. No bug caused by this potential
race has been observed, but this fix is a safety belt to ensure
it cannot happen.
MFC after: 2 weeks
post-create/mkdir directory attributes. This allows the RPC to
name cache the newly created directory and reduces the lookup RPC
count for applications creating a lot of directories.
MFC after: 2 weeks
post-open/create directory attributes. This allows the RPC to
name cache the newly created file and reduces the lookup RPC
count by about 10% for software builds.
MFC after: 2 weeks
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.
MFC after: 2 weeks
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.
MFC after: 1 week
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.
MFC after: 1 week
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.
Submitted by: jhb (earlier version)
Reviewed by: kib
MFC after: 3 weeks
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].
[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].
Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip
The hang occurred in nfsv4_setsequence() when it couldn't find an
available session slot and is fixed by checking for a forced dismount
in progress and just returning for this case.
MFC after: 1 month
in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.
The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
is being shut down which were caused by the nfscbd_pool being
destroyed before the backchannel is disabled. This patch is
believed to fix the problem, by simply avoiding ever destroying
the nfscbd_pool. Since the NFS client module cannot be unloaded,
this should not cause a memory leak.
MFC after: 2 weeks
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.
Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.
Reviewed by: jhb
would sometimes result in a corrupted file was reported via email.
This problem appears to have been caused by r251719 (reverting
r251719 fixed the problem). Although I have not been able to
reproduce this problem, I suspect it is caused by another thread
increasing np->n_size after the mtx_unlock(&np->n_mtx) but before
the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes
updates to np->n_size, doing the vnode_pager_setsize() with the
mutex locked appears to avoid the problem.
Unfortunately, vnode_pager_setsize() where the new size is smaller,
cannot be called with a mutex held.
This patch returns the semantics to be close to pre-r251719 (actually
pre-r248567, r248581, r248567 for the new client) such that the call to
vnode_pager_setsize() is only delayed until after the mutex is
unlocked when np->n_size is shrinking. Since the file is growing
when being written, I believe this will fix the corruption.
A better solution might be to replace the mutex with a sleep lock,
but that is a non-trivial conversion, so this fix is hoped to be
sufficient in the meantime.
Reported by: David G. Lawrence (dg@dglawrence.com)
Tested by: David G. Lawrence (to be done soon)
Reviewed by: kib
MFC after: 1 week
file's size attribute does not get updated. As such, it is necessary
to invalidate the attribute cache before clearing NMODIFIED for pNFS.
MFC after: 2 weeks
enable use of the (no)resvport mount option for NFSv4. I had thought
that the RFC required that non-reserved port #s be allowed, but I couldn't
find it in the RFC.
MFC after: 2 weeks
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf
truncated directory for some NFS servers. This turned out to
be because the size of a directory reported by an NFS server
can be smaller that the ufs-like directory created from the
RPC XDR in the client. This patch fixes the problem by changing
r248567 so that vnode_pager_setsize() is only done for regular files.
Reported and tested by: hartmut.brandt@dlr.de
Reviewed by: kib
MFC after: 1 week
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.
Reported and tested by: pho
Reviewed by: kib
MFC after: 2 weeks