The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.
MFC after: 2 weeks
FAT specification requires that for valid FAT, FAT cluster 0 has a
specific value derived from the BPB media descriptor. The lowest
(little-endian) byte must be equal to bpb.bpbMedia, other bits in the
cluster number must be all 1's. Implement the check to reduce the
chance of the randomly corrupted FAT to pass the mount attempt.
Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12124
Currently several paths in the NFS client upgrade the shared vnode
lock to exclusive, which might cause temporal dropping of the lock.
This action appears to be fatal for nullfs mounts over NFS. If the
operation is performed over nullfs vnode, then bypassed down to NFS
VOP, and the lock is dropped, other thread might reclaim the upper
nullfs vnode. Since on reclaim the nullfs vnode lock and NFS vnode
lock are split, the original lock state of the nullfs vnode is not
restored. As result, VFS operations receive not locked vnode after a
VOP call.
Stop upgrading the vnode lock when we check the consistency or flush
buffers as result of detected inconsistency. Instead, allocate a new
lockmgr lock for each NFS node, which is locked exclusive instead of
the vnode lock upgrade. In other words, the other parallel
modification of the vnode are excluded by either vnode lock conflict
or exclusivity of the new lock when the vnode lock is shared.
Also revert r316529 because now the vnode cannot be reclaimed during
ncl_vinvalbuf().
In collaboration with: pho
Reviewed by: rmacklem
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12083
The previous limit of 24 was somewhat restrictive, and with this change
ceil(log2(sizeof(struct pfs_node))) is the same as before in both the ILP32
and LP64 models, so the malloc zone used for allocations of struct pfs_node
is the same as before.
Approved by: des
Linux specific things to the native fdescfs file system.
Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.
Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.
For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.
For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:
mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd
Reviewed by: kib@
MFC after: 1 week
Relnotes: yes
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately. It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes. There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.
Tested by: pho
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11735
If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.
MFC after: 2 weeks
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".
Suggested by: kib
MFC after: 2 weeks
Suppose that a file on NFS has partially filled last page, and this
page is dirty. NFS VOP_PAGEOUT() method only marks the the page clean
up to the block of the last written byte, leaving other blocks dirty.
Also any page which erronously exists in the vnode vm_object past EOF
is also left marked as dirty.
With the introduction of the buf-cache coherent pager, each pass of
syncer over the object with such page results in creation of B_DELWRI
buffer due to VOP_WRITE() call. This buffer is noted on next syncer
pass, which results e.g. a visible manifestation of shutdown never
finishing vnode sync. Note that before buf-cache coherency commit, a
dirty page might left never synced to server if a partial writes
occur.
Fix this by clearing dirty bits after EOF. Only blocks of the partial
page which are completely after EOF are marked clean, to avoid
possible user data loss.
Reported by: mav
Reviewed by: alc, markj
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11697
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.
MFC after: 1 week
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.
MFC after: 1 week
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.
Reviewed by: rmacklem@
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Update filesystems not currently using vop_stdpathconf() in pathconf
VOPs to use vop_stdpathconf() for any configuration variables that do
not have filesystem-specific values. vop_stdpathconf() is used for
variables that have system-wide settings as well as providing default
values for some values based on system limits. Filesystems can still
explicitly override individual settings.
PR: 219851
Reported by: cem
Reviewed by: cem, kib, ngie
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D11541
This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon
that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL
sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are
in use.
Suggested by: dfr
PR: 205193
It is useful to know exactly what features may be lacking when trying to
mount ext4 filesystems.
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11208
If an NFSv3 server were to reply with weak cache consistency attributes,
but not post operation attributes, the client would use garbage attributes
from memory. This was spotted during work on the code for the NFSv4.1 client.
I have never seen evidence that this happens and it wouldn't make sense
for an NFSv3 server to do this, so this patch is basically "theoretical",
but does fix the problem if a server were to do the above.
PR: 219552
MFC after: 2 weeks
This finishes what r245164 started and makes open(..., O_APPEND) work again
after r299753.
- Pass ioflags, incl. IO_APPEND, down to the direct write backend (r245164
added it to only the bio backend).
- (r299753 changed the WRONLY backend from bio to direct.)
PR: 220185
Reported by: Ben RUBSON <ben.rubson at gmail.com>
Reviewed by: bapt@, rmacklem@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11348
changed from %d to a long-width type.
Use uintmax_t casting and %ju to futureproof the format string against
potential changes with either the #define or the implementation-specific
definition for offsetof(..).
The fields exist on all versions of the filesystem and using them is a mount
option on linux. For FreeBSD, the corresponding i_uid and i_gid are always
long enough so use them by default.
Reviewed by: Fedor Uporov
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D11354
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.
The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
Reviewed by: rmacklem (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11326
A NFSv4.1/pNFS server using File Layout can specify that Commit operations
are to be done against the DS instead of MDS. Since no extant pNFS
server did this, the code was untested and "#ifdef notyet".
The FreeBSD pNFS server I am developing does specify that Commits be done
through the DS, so the code has been enabled/tested.
This patch should only affect the case of a pNFS server that specfies
Commits through the DS.
PR: 219551
MFC after: 2 weeks
fixed size for the name buffer PFS_NAMELEN.
As r318736 was commited (ino64 project) the size of the permanent part
of the struct dirent was changed, so calulate PFS_DELEN properly.
When the NFSv4.1 client is doing pNFS, it needs to get an Open and
a Layout for every file it will be doing I/O on. The current code
does two separate RPCs to get these. This patch adds two new compounds
that do the both the Open and LayoutGet in the same RPC, reducing the
RPC count.
It also factors out the code that sets up and parses the LayoutGet operation
into separate functions, so that the code doesn't get duplicated for
these new RPCs.
This patch is fairly large, but should only affect the NFSv4.1 client
when the "pnfs" option is specified.
PR: 219550
MFC after: 2 weeks
ext4 on linux has always supported more than 32000 directories through
the dir_nlink feature, but FreeBSD was unable to catch up on this feature.
As part of the 64 bit inode changes nlink_t has been extended and this
feature is now possible.
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11210
initialized.
bdrewery@ has reported panics "newnfs_copycred: negative nfsc_ngroups".
The only way I can see that this occurs is that the credentials field of
the open structure gets used before being filled in.
I am not sure quite how this happens, but for the file create case, the
code is serialized via the vnode lock on the directory. If, somehow, a
link to the same file gets created just after file creation, this might
occur.
This patch ensures that the credentials field is initialized to a reasonable
set of credentials before the structure is linked into any list, so I
this should ensure it is initialized before use.
I am committing the patch now, since bdrewery@ notes that the panics
are intermittent and it may be months before he knows if the patch fixes
his problem.
Reported by: bdrewery
MFC after: 2 weeks
From the linux tune2fs(8) manpage:
"Allow the kernel to initialize bitmaps and inode tables and keep a high
watermark for the unused inodes in a filesystem, to reduce e2fsck(8) time.
This first e2fsck run after enabling this feature will take the full time,
but subsequent e2fsck runs will take only a fraction of the original time,
depending on how full the file system is."
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11211
r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.
Suggested by: kib
MFC after: 2 weeks
The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.
Reviewed by: kib
We can have support for reading ext4 "huge" files but we can't write
(anything) on ext4. and some filesystem. Formally enable the feature so
that we can mount such filesystems.
Submitted by: Fedor Uponov
Differential Revision: https://reviews.freebsd.org/D11209
arm build.
In the arm build, elf_note.S includes sys/param.h and then does an
elf macro called ELFNOTE(). Although the compile error doesn't make
sense to me, I believe it just means that an "extern ..." can't exist
in param.h for this inclusion case.
I suspect adding #if !defined(LOCORE) might fix the build, but this
commit just takes the definition out.
I will ask freebsd-current@ what is the best was to deal with this
and do a subsequent commit after that.
Reported by: melounmichal@gmail.com
By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10991
This definition is a part of the maxiotune2 patch that will be
committed soon. It is being committed separately to ease merging
with the pNFS projects subversion trees.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D10991
Some people may want to drop UFS-style ACLs for slimmer kernels.
Let's just not assume everyone needs ACLs.
Reported by: bde
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11145
Its purpose was to translate the values for msdosfs inode numbers,
which is calculated from the msdosfs structures describing the file,
into the range representable by 32bit ino_t. The translation acted
for filesystems larger than 128Gb, it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range. It consumed
memory that could be only freed by unmount, and the translation was
not stable across remounts.
With ino_t type extended to 64 bit, there is no such issue and values
can be returned without compaction to 32bit. That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity. For compat ABIs which use
32bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode numbers truncation is not safe.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
Coverity warned that the switch statement fell through. While this was
intentional, the pattern wasn't especially clear. I just changed it to a
conventional if pattern.
Reported by: Coverity
CIDs: 1375851 (false positive), 1375853
Sponsored by: Dell EMC Isilon
This somewhat simplifies use of msdosfs code in userland (for makefs),
reduces diffs with NetBSD and is standard C as of C89.
Reviewed by: imp
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11014
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
the underlying device has been verified.
Reviewed by: rwatson
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D2902
Moving the allocation forward, just before it's actually needed, seems
sensible.
Add newline character at the last line while here.
Reported by: pluknet
Differential Revision: https://reviews.freebsd.org/D10974
This is closely tied to the Extended Attribute implementation.
Submitted by: Fedor Uporov
Reviewed by: kevlo, pfg
Differential Revision: https://reviews.freebsd.org/D10807
Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all
bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use
f_fsid. In particular, NFSv3 and sometimes NFSv4, and ZFS use this
method or reporting st_dev by stat(2).
Provide a new helper vn_fsid() to avoid duplicating code to copy
f_fsid to va_fsid.
Note that the change is mostly cosmetic. Its motivation is to avoid
sign-extension of f_fsid[0] into 64bit dev_t value which happens after
dev_t becomes 64bit..
Reviewed by: avg(zfs), rmacklem (nfs) (both for previous version)
Sponsored by: The FreeBSD Foundation
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
General cleanup, for diff reduction with NetBSD and future use by FAT
support in makefs.
Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10821
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193
Extended attributes and their particular implementation in linux are
different from FreeBSD so in this case we have started diverging from
the UFS EA implementation, which would be the natural reference.
Depending on future progress implementing ACLs this approach may change
but for now bring to the tree an implementation that is consistent and
can be tested.
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D10460
The code specified the length of a layout as INT64_MAX instead of
UINT64_MAX. This could result in getting a layout for less than the
full file for extremely large files. Although having little practical
effect, this patch corrects this in the code.
Detected during recent testing of the pNFS server.
MFC after: 2 weeks
The nfsv4_seqsession() call returns NFSERR_REPLYFROMCACHE when it has a
reply in the session, due to a requestor retry. The code erroneously
assumed a return of 0 for this case. This patch fixes this and adds
a KASSERT(). This would be an extremely rare occurrence. It was found
during code inspection during the pNFS server development.
MFC after: 2 weeks
The NFSv4 RFCs give a server the option of allowing the use of an open
stateid for write access to be used for a Read operation.
This patch enables this by default and adds a sysctl to disable it,
for anyone who does not want this capability.
Allowing this is particularily useful for a pNFS Data Server (DS), since
they are not permitted to allow the use of special stateids.
Discovered during recent testing of the pNFS server under development.
MFC after: 2 weeks
An NFSv4 server has the option of allowing a Read to be done using a Write
Open. If this is not allowed, the server will return NFSERR_OPENMODE.
This patch attempts the read with a write open and then disables this
if the server replies NFSERR_OPENMODE.
This change will avoid some uses of the special stateids. This will be
useful for pNFS/DS Reads, since they cannot use special stateids.
It will also be useful for any NFSv4 server that does not support reading
via the special stateids. It has been tested against both types of NFSv4 server.
MFC after: 2 weeks
The NFSv4.1/pNFS client does not use/need a backchannel for the Data Server (DS)
sessions, so the flag should only be set for MetaData Server (MDS) sessions.
This patch should have been a part of r317275.
MFC after: 2 weeks
The "return layout on close" case in the pNFS client was badly broken.
Fortunately, extant pNFS servers that I have tested against do not
do this. This patch fixes it. It also changes the way the layout stateid.seqid
is set for LayoutReturn. I think this change is correct w.r.t. the RFC,
but I am not 100% sure.
This was found during recent testing of the pNFS server under development.
MFC after: 2 weeks
The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the
connection to the Data Server (DS) under some circumstances. The main
effect of this was a leak of malloc'd structures in the krpc. This patch
adds the newnfs_disconnect() calls to fix this.
Detected during recent testing against the pNFS server under development.
MFC after: 2 weeks
The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.
MFC after: 2 weeks
An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.
MFC after: 2 weeks
Implement FUSE open flag FOPEN_KEEP_CACHE. Without this flag, cached file
contents should be invalidated on open. Apparently, fusefs-encfs relies
upon this behavior.
PR: 218636
Submitted by: Ben RUBSON <ben.rubson at gmail.com>
The nfscl_mtofh() function didn't check for failed operations and, as such,
would have returned EBADRPC for these cases, due to parsing failure.
This patch adds checks, so that it returns with ND_NOMOREDATA set.
This is needed for future use in the pNFS server and acts as a safety
belt in the meantime.
MFC after: 2 weeks
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. The default
values of 0 meant that setting a group to "wheel" would fail even when
done by root.
It also adds a definition of GID_NOGROUP to sys/conf.h.
Discussed on: freebsd-current@
MFC after: 2 weeks
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. Also, the default
values of 0 meant that setting a group to "wheel" would fail even when
done by root and this patch fixes this issue.
MFC after: 2 weeks
The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in
the reply to an Open with exclusive_create, as required by the RFCs.
(This is required since the FreeBSD NFS server stores the create_verifier
in the va_atime attribute.)
As such, the Linux NFSv4 client did not set the TimeAccess (atime) in
the Setattr done in an RPC after the one with the Open/exclusive_create.
This patch fixes the server to set the TimeAccess bit in the reply.
I believe that storing the create_verifier in an extended attribute for
file systems that support extended attributes might be a good idea,
but I will wait for a discussion of this on the freebsd-fs@ email list
before considering committing a patch to do this.
Reported by: jim@ks.uiuc.edu
Suggested by: dfr
MFC after: 2 weeks
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
The "cred" argument of ncl_flush() is unused and it was confusing to have
the code passing in NULL for this argument in some cases. This patch deletes
this argument.
There is no semantic change because of this patch.
MFC after: 2 weeks
Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number
of open_owner4s. This patch adds a mount option called "oneopenown" that
can be used for NFSv4.1 mounts to make the client do all Opens with the
same open_owner4 string. This option can only be used with NFSv4.1 and
may not work correctly when Delegations are is use.
Reported by: cperciva
Tested by: cperciva
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8988
A function called svcpool_close() was added to the server side krpc by
r313735, so that a pool could be closed without destroying the data structures.
This little patch adds a call to it for the callback pool (svcpool_nfscbd),
so that the nfscbd daemon can be killed/restarted and continue to work
correctly.
MFC after: 2 weeks
When an mmap'd text file is written and then executed immediately
afterwards, it was possible that the modify time would change after the
text file was executing, resulting in the process executing the file
being killed. This was usually only observed when the file system's
times were set to higher resolution, but could have occurred for any
time resolution.
This was reported on a recent email list discussion.
This patch adds a VOP_SET_TEXT() to the NFS client which flushed all
dirty pages to the NFS server and then makes sure that n_mtime is up
to date to avoid this from occurring.
Thanks go to kib@ and pho@ for their help with developing this patch.
Tested by: pho
Reviewed by: kib
MFC after: 2 weeks
If the ExchangeID/CreateSession operations done by an NFSv4.1 client
after the server crashes/reboots fails, it is possible that some process/thread
is waiting for an open_owner lock. If the client state is free'd, this
can cause a crash.
This would not normally happen, but has been observed on a mount of the
AmazonEFS service.
Reported by: cperciva
Tested by: cperciva
PR: 216086
MFC after: 2 weeks
during recovery.
If the NFSv4.1 client gets a NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock
operation while recovering from the server crash/reboot, allow the opens
to be retained for a subsequent recovery attempt. Since NFSv4.1 servers
should only reply NFSERR_BADSESSION after a crash/reboot that has lost
state, this case should almost never happen.
However, for the AmazonEFS file service, this has been observed when
the client does a fresh TCP connection for RPCs.
Reported by: cperciva
Tested by: cperciva
PR: 216088
MFC after: 2 weeks
Instead, issue a diagnostic and return appropriate error if
ncl_flush() was unable to clean buffer queue after the specified
number or retries.
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL
attributes. As such, an NFSv4.1 mount to the server would return garbage
for these values. This patch initializes the fields of the nfsstatfs structure,
so that "df" and friends will at least return consistent bogus values.
This patch should have effect when mounting other NFSv4.1 servers.
Reported by: cperciva
MFC after: 2 weeks
This patch gives a requestor of the exclusive lock on the client state
in the NFSv4 client priority over shared lock requestors. This avoids
the server crash recovery thread being starved out by other threads doing
RPCs.
Tested by: cperciva
PR: 216087
MFC after: 2 weeks
When the NFSv4 client Commit operation encountered a stale write verifier,
it erroneously mapped that to EIO. This could have caused recently written
data to be lost when a server crashes/reboots between an UNSTABLE write
and the subsequent commit. This patch fixes this.
The bug was only for the NFSv4 client and did not affect NFSv3.
Tested by: cperciva
PR: 215887
MFC after: 2 weeks
For the ReclaimComplete operation, the RPC layer should not loop on
NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck
looping and will not do a recovery.
This patch fixes it so it does not loop. This bug only affects NFSv4.1 and
only when a server reboots.
Tested by: cperciva
PR: 215886
MFC after: 2 weeks
If an operation that preceeds a Setattr in an NFSv4 compound fails,
there is no bitmap of attributes to parse. Without this patch, the
parsing would fail and return EBADRPC instead of the correct failure
error. This could break recovery from a server crash/reboot.
Tested by: cperciva
PR: 215883
MFC after: 2 weeks
Based on the change in r242386, it seems clear that scred was intended to
be released in all paths at exit.
No functional change. This line's indent was just the result of a bad copy
paste from the previous free() in an early exit path.
Reported by: PVS-Studio
Sponsored by: Dell EMC Isilon
Write out the dirty pages using VOP_WRITE() instead of directly
calling ncl_writerpc(). The state of the buffers now reflects the
write, fixing some hard to diagnose consistency and write order
issues. The change also allowed to remove remapping of paged out
pages into kernel space and related allocation of the phys buffer.
Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D10241
ncl_vinvalbuf() might need to upgrade vnode lock, allowing the vnode
to be reclaimed by other thread. Handle the situation, indicated by
the returned error zero and VI_DOOMED iflag set, converting it into
EBADF. Handle all calls, even where the vnode is exclusively locked
right now.
Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D10241
This interface has no in-tree consumers and has been more or less
non-functional for several releases.
Remove manpage note that the procfs special file 'mem' is grouped to
kmem. This hasn't been true since r81107.
Remove procfs' README file. It is an out of date duplication of the manpage
(quoth the README: "since the bsd kernel is single-processor...").
Reviewed by: vangyzen, bcr (manpage)
Approved by: des (procfs maintainer), vangyzen (mentor)
Differential Revision: https://reviews.freebsd.org/D9802
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Thread might create a condition for delayed SU cleanup, which creates
a reference to the mount point in td_su, but exit without returning
through userret(), e.g. when terminating due to single-threading or
process exit. In this case, td_su reference is not dropped and mount
point cannot be freed.
Handle the situation by clearing td_su also in the thread destructor
and in exit1(). softdep_ast_cleanup() has to receive the thread as
argument, since e.g. thread destructor is executed in different
context.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Right now the noexec mount option disallows image activators to try
execve the files on the mount point. Also, after r127187, noexec
also limits max_prot map entries permissions for mappings of files
from such mounts, but not the actual mapping permissions.
As result, the API behaviour is inconsistent. The files from noexec
mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution
permission, it cannot be re-enabled later. Make this consistent
logically and aligned with behaviour of other systems, by disallowing
PROT_EXEC for mmap(2).
Note that this change only ensures aligned results from mmap(2) and
mprotect(2), it does not prevent actual code execution from files
coming from noexec mount. Such files can always be read into
anonymous executable memory and executed from there.
Reported by: shamaz.mazum@gmail.com
PR: 217062
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.
Suggested by: glebius, emaste
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625
Right now this is not critical, but will be after planned increase of
MNAMELEN from 88 to 1k.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It's not a proper fix, but should be better than what we have now.
Since it got broken some six months ago it results in an incredibly
annoying and trivially reproducible panic every time eg an USB disk
gets disconnected.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
This patch adds a new function to the server krpc called
svcpool_close(). It is similar to svcpool_destroy(), but does not free
the data structures, so that the pool can be used again.
This function is then used instead of svcpool_destroy(),
svcpool_create() when the nfsd threads are killed.
PR: 204340
Reported by: Panzura
Approved by: rmacklem
Obtained from: rmacklem
MFC after: 1 week
The option "nonc" disables using of namecache for the created mount,
by default namecache is used. The rationale for the option is that
namecache duplicates the information which is already kept in memory
by tmpfs. Since it believed that namecache scales better than tmpfs,
or will scale better, do not enable the option by default. On the
other hand, smaller machines may benefit from lesser namecache
pressure.
Discussed with: mjg
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
For directories, node->tn_spec.tn_dir.tn_parent pointer to the parent
is used. For non-directories, the implementation is naive, all
directory nodes are scanned to find a dirent linking the specified
node. This can be significantly improved by maintaining tn_parent for
all nodes, later.
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer. In this case, tmpfs_alloc_vp() accesses freed memory.
Introduce the reference count on the nodes. The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference. Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list. Both nodes and tmpfs_mounts are removed when
refcount goes to zero.
Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished. The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Remove TMPFS_ASSERT_ELOCKED(). Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If some process' nodes were accessed using procfs and the process
cannot exit properly at the time modunload event is reported to the
pseudofs-backed filesystem, the assertion in pfs_vncache_unload() is
triggered. Assertion is correct, the cache should be cleaned.
Approved by: des (pseudofs maintainer)
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Inums in cd9660 refer to byte offsets on the media. DVD and BD media
can have entries above 4GB, especially with multi-session images.
PR: 190655
Reported by: Thomas Schmitt <scdbackup at gmx.net>
If tmpfs vnode is only shared locked, tn_status field still needs
updates to note the access time modification. Use the same locking
scheme as for UFS, protect tn_status with the node interlock + shared
vnode lock.
Fix nearby style.
Noted and reviewed by: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.
Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racey. With changes for the above problems it became more racey, so all
uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)
There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.
Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.
Reported by: cperciva
Tested by: cperciva
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D8745
truncation, immediately queue the page for asynchronous laundering rather
than making the page pass through inactive queue first.
Reviewed by: kib, markj
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.
MFC after: 2 weeks
the vnode is inactivated. This contradicts with the nullfs caching
which keeps upper vnode around, as consequence keeping the use
reference to lower vnode.
Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.
Reported by: asomers
Reviewed by: rmacklem
Tested by: asomers, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The "-z" option on nfsstats was erroneously zeroing out the counts
of NFSv4 state structures. These counts will normally go back down
to zero as state is released. When zeroed out by "-z", these counts
can go negative. This patch fixes this problem.
MFC after: 2 weeks
The swap pager enqueues laundered pages near the head of the inactive queue
to avoid another trip through LRU before reclamation. This change adds
support for this behaviour to the vnode pager and makes use of it in UFS and
ext2fs. Some ioflag handling is consolidated into a common subroutine so
that this support can be easily extended to other filesystems which make use
of the buffer cache. No changes are needed for ZFS since its putpages
routine always undirties the pages before returning, and the laundry
thread requeues the pages appropriately in this case.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D8589
longer used. More precisely, they are always zero because the code that
decremented and incremented them no longer exists.
Bump __FreeBSD_version to mark this change.
Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8583
See r294954 for the bread(9) change and r297401 for similar cd9660 fix.
Reported and tested by: Joshua Kinard <kumba@gentoo.org>
PR: 214705
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The pager, due to its construction, implements clustering for the
page-ins. In particular, buildworld load demonstrates reduction of
the READ RPCs from 39k down to 24k. No change in real or CPU time was
observed.
Discussed with, and measured by: bde
No objections from: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Rather than printing a warning for every time we receive a fileid > 2^32
from the NFS server, count warnings and print at most one of each warning
type per minute, e.g.,
Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences)
Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences)
Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences)
Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences)
A buildworld with an NFS mounted /usr/obj can otherwise result in
hundreds of thousands of lines being printed, which seems unnecessarily
verbose.
When ino_t becomes a 64-bit type, these printfs will no longer be needed
(and the problems associated with truncating 64-bit fileids to generate
32-bit inode numbers will also go away).
Reviewed by: rmacklem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8523
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments. Those changes will come later.)
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8497
If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal. Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.
Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.
Disallow non-local filesystems for dotdot, since remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, provide the knob
vfs.lookup_cap_dotdot_nonlocal to override as well.
Idea by: rwatson
Discussed with: emaste, jonathan, rwatson
Reviewed by: mjg (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 week
Differential revision: https://reviews.freebsd.org/D8110
volume limits. In particular:
- Assert that usemap_alloc() and usemap_free() cluster number argument
is valid.
- In chainlength(), return 0 if cluster start is after the max cluster.
- In chainlength(), cut the calculated cluster chain length at the max
cluster.
- For true paranoia, after the pm_inusemap is calculated in
fillinusemap(), reset all bits in the array for clusters after the
max cluster, as in-use.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
delegations enabled and the Linux NFSv4.1 client was reported in
reviews.freebsd.org/D7891.
I believe that the FreeBSD server behaviour conforms to the RFC and that
the Linux client has a bug. Therefore, I do not think the proposed patch
is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD
server will issue a write delegation for a read open if possible.
The Linux client then erroneously assumes that the credentials used for
the read open can write the file.
This patch reverses the default value for nfsrv_writedelegifpos to 0 so
that the default behaviour is Linux compatible and adds a sysctl that can
be used to set nfsrv_writedelegifpos.
This change should only affect users that are mounting a FreeBSD server
with delegations enabled (they are not enabled by default) with a Linux
NFSv4.1 client mount.
Reported by: fatih.acar@gandi.net
Tested by: fatih.acar@gandi.net
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7891
The old behavior depended on the FAT version and on what files were in the
root directory. "mount_msdosfs -o shortnames" is still supported.
Reviewed by: wblock, cem
Discussed with: trasz, adrian, imp
MFC after: 4 weeks
X-MFC-Notes: Don't MFC the removal of findwin95
Differential Revision: https://reviews.freebsd.org/D8018
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.
Reported by: pho
Reviewed by: kib
MFC after: 1 week
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.
No objections: kib
MFC after: 2 weeks
Standard VOP_FSYNC() implementation just syncs data buffers, and due
to this, is the correct and efficient implementation for msdosfs or
any other filesystem which uses bufer cache trivially. Provide
globally visible wrapper vop_stdfdatasync_buf() for future consumption
by other filesystems.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.
Subsequent commits will update nfsstat(8) to use the new fields.
Submitted by: will (earlier version)
Reviewed by: ken
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D1626
The offset of the directory file, passed to getdirentries(2) syscall,
is user-controllable. The value of the offset must not be asserted,
instead the invalid value should be checked and rejected if invalid.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
These are currently unused in our implementation and some even appear to
have not been implemented yet on linux but it is good to keep them for
reference.
Obtained from: NetBSD (CVS Rev. 1.41)
MFC after: 1 month
Owning Giant in the init/uninit is accidental due to the moment where
VFS modules initialization is performed, and is not enforced by the
VFS interface. The Giant lock does not prevent a parallel execution
of the code, it is VFS which implements the proper protocol.
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI. The variables were renamed to avoid shadowing issues with local
variables of the same name.
Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure. As a
preparation, this commit only introduces the interface.
Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance. Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.
Tested by: pho (as part of the bigger patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302
Devfs' file layer ioctl is now just a thin shim around the vnode layer.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7286
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
executed by inactive methods, must be repeated on reclaim. In
particular, unlink and free sillyrenamed vnode both on inactivation
and reclaim.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
If strlen(hostp) was zero, the stack array 'nam' would never be initialized
before being strdup()ed. Fix this by initializing it to the empty string.
It's possible some external condition makes this case impossible, in which
case, an assertion instead of this workaround is appropriate.
Introduced in r299848.
Reported by: Coverity
CID: 1355336
Sponsored by: EMC / Isilon Storage Division
Ext2/3/4 manages generation numbers differently than UFS so adopt
some rules that should work well. When allocating a new inode,
make sure we generate a "good" random value specifically avoiding
zero.
Don't interfere with the numbers that are already generated in
the filesystem: ext2fs doesn't have the backwards compatibility
issues where there were no generation numbers.
Reviewed by: kevlo
MFC after: 1 week
on a fuse mounted file system, it will crash. Although it may be
possible to make this work correctly, this patch avoids the crash
in the meantime.
I removed the MPASS(), since panicing for the FIFO case didn't make
a lot of sense when it returns an error for the others.
PR: 195000
Submitted by: henry.hu.sh@gmail.com (earlier version)
MFC after: 2 weeks
When "cp" of a file with read-only (mode 0444) to a fuse mounted
file system was attempted it would fail with EACCES. This was because
fuse would attempt to open the file WRONLY and the open would fail.
This patch changes the fuse_vnop_open() to test for an extant read-write
open and use that, if it is available.
This makes the "cp" of a read-only file to the fuse mounted file system
work ok.
There are simpler ways to fix this than adding the fuse_filehandle_validrw()
function, but this function is useful for future patches related to
exporting a fuse filesystem via NFS.
MFC after: 2 weeks
eg an NFSv4 root over WiFi: boot from md_root (small rootfs image
preloaded by loader(8)), setup WiFi, and then reroot into the actual
root, over NFS.
Note that it's currently limited to NFSv4, and due to problems with
nfsuserd(8) it requres a workaround on the server side: one needs
to set the vfs.nfsd.enable_stringtouid=1 sysctl and not run nfsuserd(8)
on either the server or the client side.
Reviewed by: rmacklem@
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6347
When I/O on a file under fuse is switched from buffered to DIRECT_IO,
it was possible to read stale (before a recent modification) data from
the buffer cache. This patch invalidates the buffer cache for the
file to fix this.
PR: 194293
MFC after: 2 weeks
When a file is opened write-only and a partial block was written,
buffered I/O would try and read the whole block in. This would
result in a hung thread, since there was no open (fuse filehandle)
that allowed reading. This patch avoids the problem by forcing
DIRECT_IO for this case.
It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO
flag in its reply to the open.
Tested by: nishida@asusa.net, freebsd@moosefs.com
PR: 194293, 206238
MFC after: 2 weeks
Trivial use-after-free where stp was freed too soon in the non-error path.
To fix, simply move its release to the end of the routine.
Reported by: Coverity
CID: 1006105
Sponsored by: EMC / Isilon Storage Division
consequence, the nfs client override of VOP_LOCK1() is no longer
needed.
Reviewed and tested by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
When support for NFSv4.1 was added to the NFS server, it broke
the server rpc count stats, since newnfsstats.srvrpccnt[] doesn't
have entries for the new NFSv4.1 operations.
Without this patch, the code was incrementing bogus entries in
newnfsstats for the new NFSv4.1 operations.
This patch is an interim fix. The nfsstats structure needs to be
updated and that will come in a future commit.
Reported by: cem
MFC after: 2 weeks
It was reported via email that under certain heavy RPC loads
long delays before the exports would be updated was observed
when using "mountd -S". This patch reverses the priority between
the exclusive lock request to suspend the nfsd threads and the
shared lock request for performing RPCs.
As such, when mountd attempts to suspend the nfsd threads, it
gets priority over outstanding RPC requests to do this.
I suspect that the case reported was an artificial test load,
but this patch did fix the problem for the reporter.
Reported and Tested by: josephlai@qnap.com
MFC after: 2 weeks
This is only allowed by root and only used by the nfs daemon, which
should not provide an incorrect value. However, it's still good
practice to validate data provided by userland.
PR: 206626
Reported by: CTurt <cturt@hardenedbsd.org>
Reviewed by: rmacklem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D6201
In win2unixfn() we expand Windows 95 style long names. In some cases that
requires moving the data in the nbp->nb_buf buffer backwards to make room. That
code failed to check for overflows, leading to a stack overflow in win2unixfn().
We now check for this event, and mark the entire conversion as failed in that
case. This means we present the 8 character, dos style, name instead.
PR: 204643
Differential Revision: https://reviews.freebsd.org/D6015
It was reported via email that a Linux client couldn't do a Kerberized
NFS mount when only "sec=krb5" was specified for the exports. The Linux
client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT.
Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I
think it is the correct reply, so this patch enables that.
I do not know if this fixes the mount attempt, but adding "krb5i" to the
list of allowed security flavours does allow the mount to work.
Reported by: joef@spectralogic.com
MFC after: 2 weeks
h_levels_num, as most data structs in ext2fs, is unsigned so
the index that addresses it has to be unsigned as well.
To get to overflow here we would probably be considering a
degenerate case though.
MFC after: 5 days
The ordering of acquisition of the state and session mutexes was
reversed in two cases executed when an NFSv4.1 client created/freed
a session. Since clients will typically do this only when mounting
and dismounting, the likelyhood of causing a deadlock was low but possible.
This can only occur for NFSv4.1 mounts, since the others do not
use sessions.
This was detected while testing the pNFS server/client where the
client crashed during dismounting.
The patch also reorders the unlocks, although that isn't necessary
for correct operation.
MFC after: 2 weeks
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
the NFS server would leave the newly created vnode locked. This could
result in a file system that would not unmount and processes wedged,
waiting for the file to be unlocked.
Since this VOP_SETATTR() never fails for most file systems, this bug
doesn't normally manifest itself. I found it during testing of an
exported GlusterFS file system, which can fail.
This patch adds the vput() and changes the error to the correct NFS one.
MFC after: 2 weeks
the old and new NFS clients. He did a good job of isolating the problem
which was caused by the new NFS client not setting the post write mtime
correctly. The new NFS client code was cloned from the old client, but
was incorrect, because the mtime in the nfs vnode's cache wasn't yet
updated. This patch fixes this problem. The patch also adds missing mutex
locking.
Reported and tested by: bde
MFC after: 2 weeks