Commit Graph

3143 Commits

Author SHA1 Message Date
Rick Macklem
ca4defd583 The new NFS server would not allow a hard link to be
created to a symlink. This restriction (which was
inherited from OpenBSD) is not required by the NFS RFCs.
Since this is allowed by the old NFS server, it is a
POLA violation to not allow it. This patch modifies the
new NFS server to allow this.

Reported by:	jhb
Reviewed by:	jhb
MFC after:	3 days
2014-06-06 21:38:49 +00:00
Konstantin Belousov
60c5c866aa Allow shared locking for the tmpfs vnodes.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-04 15:30:49 +00:00
Hans Petter Selasky
fa0f6e62c6 Initial import of character device in userspace support for FreeBSD.
The CUSE library is a wrapper for the devfs kernel functionality which
is exposed through /dev/cuse . In order to function the CUSE kernel
code must either be enabled in the kernel configuration file or loaded
separately as a module. Currently none of the committed items are
connected to the default builds, except for installing the needed
header files. The CUSE code will be connected to the default world and
kernel builds in a follow-up commit.

The CUSE module was written by Hans Petter Selasky, somewhat inspired
by similar functionality found in FUSE. The CUSE library can be used
for many purposes. Currently CUSE is used when running Linux kernel
drivers in user-space, which need to create a character device node to
communicate with its applications. CUSE has full support for almost
all devfs functionality found in the kernel:
 - kevents
 - read
 - write
 - ioctl
 - poll
 - open
 - close
 - mmap
 - private per file handle data

Requested by several people. Also see "multimedia/cuse4bsd-kmod" in
ports.
2014-05-23 08:46:28 +00:00
Konstantin Belousov
08cf5ceb8e After r254627, the deupdate() started writing the directory entries to
disk.  That has a side effect of corrupting the "." entries names on
rename, since the call to createde() in the msdosfs_rename() sets the
de_Name to the target name.  If any change to the directory attributes
is performed, the wrong name is written back to the on-disk direntry
on update.

Overwrite the de_Name for the directories on rename to correct the dot
name.

Submitted by:	bde
MFC after:	1 week
2014-05-03 16:11:55 +00:00
Rick Macklem
ca20bd924f The new draft specification for NFSv4.0 specifies that a server
should either accept owner and owner_group strings that are just
the digits of the uid/gid or return NFS4ERR_BADOWNER.
This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can
be set to enable the server w.r.t. accepting numeric string. It
also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid
strings are not enabled. This fixes the server for recent Linux
nfs4 clients that use numeric uid/gid strings by default.

Reported and tested by:	craigyk@gmail.com
MFC after:	2 weeks
2014-05-03 00:13:45 +00:00
Mateusz Guzik
183870cf75 Ignore the error from pipespace_new when creating a pipe.
It can fail if pipe map is exhausted (as a result of too many pipes created),
but it is not fatal and could be provoked by unprivileged users. The only
consequence is worse performance with given pipe.

Reported by:	ivoras
Suggested by:	kib
MFC after:	1 week
2014-05-02 00:52:13 +00:00
Rick Macklem
3c53f923dc The PR reported that the old NFS server did not set uio_td == NULL
for the VOP_READ() call. This patch fixes both the old and new
server for this case.

PR:		185232
Submitted by:	PR had patch for old server
Reviewed by:	kib
MFC after:	2 weeks
2014-04-24 20:47:58 +00:00
Rick Macklem
ab7f24103e Remove an unnecessary level of indirection for an argument.
This simplifies the code and should avoid the clang sparc
port from generating an abort() call.

Requested by:	rdivacky
Submitted by:	jhb
MFC after:	2 weeks
2014-04-23 23:13:46 +00:00
Rick Macklem
c0990edac6 Modify the NFSv4 client's Pathconf RPC (actually a Getattr Op.)
so that it only does the RPC for names that are answered by the RPC.
Doing the RPC for other names is harmless, but unnecessary.

MFC after:	2 weeks
2014-04-23 22:13:10 +00:00
Rick Macklem
9eeef7464b Fixes mkdir for the NFSv2 client that was broken by r264705.
Reported by:	bdrewery
MFC after:	2 weeks
2014-04-22 04:42:46 +00:00
Rick Macklem
c7b560b9b4 For an NFSv4 mount with the "nocto" option, don't get the
up to date file attributes upon close. This reduces the
Getattr RPC count by about 65% for software builds.

MFC after:	2 weeks
2014-04-21 19:10:23 +00:00
Rick Macklem
c3e4a7261c Modify the NFSv4 client create/mkdir RPC so that it acquires
post-create/mkdir directory attributes. This allows the RPC to
name cache the newly created directory and reduces the lookup RPC
count for applications creating a lot of directories.

MFC after:	2 weeks
2014-04-20 22:19:00 +00:00
Rick Macklem
de1a42bd0c Modify the NFSv4 client open/create RPC so that it acquires
post-open/create directory attributes. This allows the RPC to
name cache the newly created file and reduces the lookup RPC
count by about 10% for software builds.

MFC after:	2 weeks
2014-04-19 19:40:20 +00:00
Rick Macklem
a6f8e64e74 Modify the Lookup RPC for NFSv4 so that it acquires directory
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.

MFC after:	2 weeks
2014-04-18 22:05:34 +00:00
Warner Losh
1bbf66051b Take out the hack to write -1's to non-NAND. Always do a BIO_DELETE on
the ranges we want to erase. This is nicer to SSDs that want TRIMs
anyway.
2014-04-18 17:03:43 +00:00
Warner Losh
875ac64f3e More properly account for free/reserved segments to avoid deadlock or
worse when filling up a device and then trying to erase files to make
space. Without enough space, you can't do that. Also, ensure that the
metadata writes don't generate ENOSPC. They will be retried later
since the buffers are still dirty...

Submitted by: mjg@
2014-04-18 17:03:35 +00:00
Andrey V. Elsukov
14b2dc3952 Use SMB_QUERY_FS_SIZE_INFO request to populate statfs structure.
When server doesn't support this request, try to use SMB_INFO_ALLOCATION.
And use SMB_COM_QUERY_INFORMATION_DISK request as fallback.

MFC after:	2 weeks
2014-04-15 09:10:01 +00:00
Xin LI
25bfde79d6 Fix NFS deadlock vulnerability. [SA-14:05]
Fix "Heartbleed" vulnerability and ECDSA Cache Side-channel
Attack in OpenSSL. [SA-14:06]
2014-04-08 18:27:32 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Pedro F. Giffuni
ca73017a2d Revert r263449;
ext2fs: minor update to the dirpref policy.

The change in UFS r254996, reverted the change as the
older code seems to work better. This was not visible
in local testing but we can trust UFS is vastly more
exercised in diferent environments.
2014-03-21 04:33:38 +00:00
Pedro F. Giffuni
e23c349230 ext2fs: minor update to the dirpref policy.
Bring in a minor change to the dirpref policy based on r248623.

This is pretty minimal change to keep the implementation in
sync with UFS but other parts from the original change are not
directly applicable so don't expect improvements in fsck times.

MFC after:	2 weeks
2014-03-20 21:19:13 +00:00
Pedro F. Giffuni
ef78ad0290 msdosfs: minor format fix - spaces vs tab
MFC after:	3 days
2014-03-20 20:14:04 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Bryan Drewery
504bde017a Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and
'..'.

Reviewed by:	Russell Cattelan
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-03-14 13:58:02 +00:00
Bryan Drewery
ac09d109ca Rename cnt to maxcookies and change its use as the condition for when to
lookup cookies to be less obscure.

No functional change.

Since r245115, cnt has not really been needed in tmpfs_dir_getdents().  Keep
it for the MPASS() for now though.

Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-03-14 13:55:48 +00:00
Bryan Drewery
62dca316da Cleanup redundant logic and add some comments to help explain how
it works in lieu of potentially less clear code.

Sponsored by:	EMC / Isilon Storage Division
Discussed with:	Russell Cattelan
2014-03-14 02:10:30 +00:00
Bryan Drewery
0742ebc98f Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used.
Discussed with:	kib
MFC after:	2 weeks
2014-03-14 01:43:55 +00:00
Pedro F. Giffuni
157b40af8e ext2fs: Fix a bug when sorting htree entries.
This a typo introduced when bringing the original code from NetBSD.

Reported by:	Mike Ma
MFC after:	3 days
2014-03-06 21:02:16 +00:00
Pedro F. Giffuni
c3b76e1345 ext2fs: small formatting fixes.
Remove some redundant spaces.
No functional change.

MFC after:	3 days
2014-03-01 21:22:20 +00:00
Pedro F. Giffuni
3a54024da4 ext2fs: use of tab vs spaces.
Consistently use a single tab after a #define as mentioned in style(9).
Use tabs instead of space for indenting.
Fix a typo: "hash_vesion".

No functional change.

MFC after:	3 days
2014-02-28 21:25:32 +00:00
Pedro F. Giffuni
67da48a15b ext2fs: fully enable ext4 read-only support.
The ext4 developers tend to tag Ext4-specific flags as
"incompatible" even when such features are not relevant for
read-only support.  This is a consequence of the process
though which this filesystem is implemented without design
and the fact that some new features are not extensible to
ext2/3.

Organize the features according to what we support and sort
them so that we can now read-only mount filesystems with
some features that may be found in newly formatted ext4 fs.

Submitted by:	Zheng Liu
Reviewed by:	pfg
MFC after:	5 days
2014-02-22 22:07:16 +00:00
Dimitry Andric
2de7ba0758 In sys/fs/nandfs/nandfs_vfsops.c, #if 0 an unused static function.
MFC after:	3 days
2014-02-15 11:42:56 +00:00
Pedro F. Giffuni
ad3d96a730 ext2fs: Use i_flag instead of i_flags for Ext4 inode flags.
The ext4 inode flags do not have equivalents for chflags (1)
and hold information that is private to the implementation.
The i_flag field in the inode is a better place to hold the Ext4
inode flags as it saves us from masking flags while setting or
getting attributes.  It should also make things cleaner if we
implement write support for Ext4.

Suggested by:	bde
Tested by:	Mike Ma
MFC after:	3 days
2014-01-28 14:39:05 +00:00
Pedro F. Giffuni
99984d229c ext2fs: Re-enable reallocblk.
The major corruption issues affecting this code have been fixed
a while ago.

MFC after:	1 week
2014-01-24 20:26:00 +00:00
Pedro F. Giffuni
1093104cf7 ext2fs: fix a bug in dirindex and re-enable.
The IN_* flags should be set in i_flag instead of corrupting
i_flags [1].

Re-enable HTree dirindex as the last series of bug fixes
seems to have fixed the issues.

Reported by:	bde [1]
Tested by:	kevlo
MFC after:	1 week
2014-01-24 13:51:38 +00:00
Pedro F. Giffuni
b7bbf8b9f3 ext2fs: fix logic error in the previous change.
Use the bitwise negation instead of bogus boolean negation and move
the flag manipulation with the assignment.
Fix some grammatical errors introduced in the same change.

Reported by:	bde
MFC after:	3 days
2014-01-22 19:09:41 +00:00
Pedro F. Giffuni
a7710d51c4 ext2fs: Translate the EXT4_EXTENTS and EXT4_INDEX to the inode flags.
r260545 cleared the inode flags to fix corruption problems but
we still need to pass some EXT4 flags for the ext4 read-only
mode.  None of these attributes has an equivalent in FreeBSD and
are uninteresting for the system utilities so they should be
innaccessible in ext2_getattrib().

Note: we also use EXT4_HUGE_FILE but we use it directly from the
dinode structure so it is not necessary to translate it,

Suggested by:	bde
MFC after:	3 days
2014-01-21 19:06:29 +00:00
Alexander Motin
6103bae6ae Fix lock leak in purely hypothetical case of TCP connection without SVC_ACK
method.  This change should be NOP now, but it is better to be future safe.

Reported by:	rmacklem
2014-01-14 20:18:38 +00:00
Pedro F. Giffuni
c2e2b77b19 ext2fs: fix inode flag conversion.
After r252890 we are naively attempting to pass through the
inode flags.  This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.

Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.

Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.

PR:		kern/185448
MFC after:	3 days
2014-01-11 15:19:04 +00:00
Alexander Motin
45e18ea7ea Fix off-by-one error in r260229.
Coverity CID:	1148955
2014-01-07 11:43:51 +00:00
Alexander Motin
d473bac729 Rework NFS Duplicate Request Cache cleanup logic.
- Introduce additional hash to group requests by hash of sockref.  This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
 - Indroduce additional callbacks to notify application layer about sockets
disconnection.  Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
 - Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache.  This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
 - Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.

Together this limits NFS DRC growth only to situations of real connectivity
problems.  If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load.  Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.

As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.

Sponsored by:	iXsystems, Inc.
2014-01-03 15:09:59 +00:00
Alexander Motin
1555cf04fc Slightly simplify expiration logic introduced in r254337.
- Do not update the histogram for items we are any way deleting from cache.
 - Do not update the histogram if nfsrc_tcphighwater is not set.
 - Remove some extra math operations.
2013-12-25 16:58:42 +00:00
Rick Macklem
43a213bb92 The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.

MFC after:	1 week
2013-12-25 01:03:14 +00:00
Rick Macklem
0c695afb96 An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.

Reported by:	jas@cse.yorku.ca
Tested by:	jas@cse.yorku.ca
Reviewed by:	asomers
MFC after:	1 week
2013-12-24 22:24:17 +00:00
Rick Macklem
b921158ae0 The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.

MFC after:	1 week
2013-12-24 00:48:39 +00:00
Pedro F. Giffuni
b41f53c43b ext2fs: make the hashing algorithm match the linux code.
There appears to be a hash function compatibility issue.
The code is currently disabled but fix it nevertheless.

PR:		kern/183230
MFC after:	3 days
2013-12-23 19:47:34 +00:00
Rick Macklem
6b8fe5d59d The NFSv4.1 client didn't return NFSv4.1 specific error codes
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.

MFC after:	1 week
2013-12-23 15:16:53 +00:00
Alexander Motin
10f8f58d4a Fix RPC server threads file handle affinity to work better with ZFS.
Instead of taking 8 specific bytes of file handle to identify file during
RPC thread affitinity handling, use trivial hash of the full file handle.
ZFS's struct zfid_short does not have padding field after the length field,
as result, originally picked 8 bytes are loosing lower 16 bits of object ID,
causing many false matches and unneeded requests affinity to same thread.
  This fix substantially improves NFS server latency and scalability in SPEC
NFS benchmark by more flexible use of multiple NFS threads.

Sponsored by:	iXsystems, Inc.
2013-12-23 08:43:16 +00:00
Konstantin Belousov
f26ca5ecde Do not allow O_EXEC opens for fifo, return EINVAL.
Besides not making sense, open(O_EXEC) for fifo creates fifoinfo with
zero readers and writers counts, which causes premature free of pipes.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-17 17:28:02 +00:00
Alexander Motin
ca187878c0 Fix long known bug with handling device aliases residing not in devfs root.
Historically creation of device aliases created symbolic links using only
name of target device as a link target, not considering current directory.
Fix that by adding number of "../" chunks to the terget device name,
required to get out of the current directory to devfs root first.

MFC after:	1 month
2013-12-12 11:05:48 +00:00