Commit Graph

3877 Commits

Author SHA1 Message Date
rmacklem
17b4d47d54 Delete some macros that are unused.
These macros were added because they were used by the pNFS server last
year. However, they are no longer used by the pNFS server code and
might as well be deleted.
This is a partial reversion of r326735.
2018-06-09 23:38:22 +00:00
rmacklem
5df6d95db8 Delete an unused macro and clean up a comment about it.
NFSDEV_MIRRORSTR was defined for the pNFS server, but has not been used,
so this patch deletes it. It also cleans up the comment and hopefully
makes it more readable.
2018-06-09 23:14:59 +00:00
rmacklem
d080b0360c Revert r334586 since I now think __unused is the better way to handle this. 2018-06-04 11:35:04 +00:00
rmacklem
c5c0d85fbc Fix a gcc8 warning about a write only variable.
gcc8 warns that "verf" was set but not used. This was because the code
that uses it is disabled via a "#if 0".
This patch adds a "#if 0" to the variable's declaration and assignment
to get rid of the warning.
This way the code could be re-enabled without difficulty.

Requested by:	mmacy
MFC after:	2 weeks
2018-06-03 19:46:44 +00:00
rmacklem
f996124d10 Fix the default number of threads for Flex File layout pNFS client I/O.
The intent was that the default would be based on number of CPUs, but the
code disabled using taskqueue() by default.
This code is only executed when mounting a NFSv4.1 server that supports the
Flexible File layout for pNFS and, since such servers are rare, this change
shouldn't result in a POLA violation.
(The FreeBSD pNFS server is still a project and the only other one that
 uses Flexible File layout is being developed by Primary Data and I don't
 know if they have even shipped any to customers yet.)
Found while testing the pNFS server.
2018-06-02 00:11:26 +00:00
rmacklem
3cc36cf922 Add the BindConnectiontoSession operation to the NFSv4.1 server.
Under some fairly unusual circumstances, the Linux NFSv4.1 client is
doing a BindConnectiontoSession operation for TCP connections.
It is also used by the ESXi6.5 NFSv4.1 client.
This patch adds this operation to the NFSv4.1 server.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com
MFC after:	2 weeks
2018-06-01 19:47:41 +00:00
rmacklem
7e7e0a78a9 Strengthen locking for the NFSv4.1 server DestroySession operation.
If a client did a DestroySession on a session while it was still in use,
the server might try to use the session structure after it is free'd.
I think the client has violated RFC5661 if it does this, but this patch
makes DestroySession block all other nfsd threads so no thread could
be using the session when it is free'd. After the DestroySession, nfsd
threads will not be able to find the session. The patch also adds a check
for nd_sessionid being set, although if that was not the case it would have
been all 0s and unlikely to have a false match.
This might fix the crashes described in PR#228497 for the FreeNAS server.

PR:		228497
MFC after:	1 week
2018-05-30 20:16:17 +00:00
rmacklem
ad3d0b18ca Fix the sleep event for layout recall.
The sleep for I/O completion during an NFSv4.1 pNFS layout recall used
the wrong event value and could result in the "[nfscl]" thread hung
for the mount.
This patch fixes the event to be the correct.
This bug will only affect NFSv4.1 pnfs mounts and only when the server
does a layout recall callback, so it won't affect many. Without the patch,
a mount without the "pnfs" option will avoid the problem.
Found during testing of the pNFS server.

MFC after:	1 week
2018-05-26 23:02:15 +00:00
mmacy
29271e5a0a nfsclient: warnings cleanups 2018-05-20 06:14:12 +00:00
emaste
f0cc1a044c Use NULL for SYSINIT's last arg, which is a pointer type
Sponsored by:	The FreeBSD Foundation
2018-05-18 17:58:09 +00:00
rmacklem
fafcbcec14 Add a missing nfsrv_freesession() call for an unlikely failure case.
Since NFSv4.1 clients normally create a single session which supports
both fore and back channels, it is unlikely that a callback will fail
due to a lack of a back channel.
However, if this failure occurred, the session wasn't being dereferenced
and would never be free'd.
Found by inspection during pNFS server development.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-17 21:17:20 +00:00
mckusick
6a5f395205 Revert change made in base r171522
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
converting clrbuf() (which clears the entire buffer) to vfs_bio_clrbuf()
(which clears only the new pages that have been added to the buffer).

Failure to properly remove pages from the buffer cache can make
pages that appear not to need clearing to actually have bad random
data in them. See for example base r304232
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
which noted the need to set B_INVAL and B_NOCACHE as well as clear
the B_CACHE flag before calling brelse() to release the buffer.

Rather than trying to find all the incomplete brelse() calls, it
is simpler, though more slightly expensive, to simply clear the
entire buffer when it is newly allocated.

PR: 213507
Submitted by: Damjan Jovanovic
Reviewed by:  kib
2018-05-16 23:30:03 +00:00
rmacklem
098223ca00 End grace for the NFSv4 server if all mounts do ReclaimComplete.
The NFSv4 protocol requires that the server only allow reclaim of state
and not issue any new open/lock state for a grace period after booting.
The NFSv4.0 protocol required this grace period to be greater than the
lease duration (over 2minutes). For NFSv4.1, the client tells the server
that it has done reclaiming state by doing a ReclaimComplete operation.
If all NFSv4 clients are NFSv4.1, the grace period can end once all the
clients have done ReclaimComplete, shortening the time period considerably.
This patch does this. If there are any NFSv4.0 mounts, the grace period
will still be over 2minutes.
This change is only an optimization and does not affect correct operation.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-15 20:28:50 +00:00
rmacklem
082a58b8e3 Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.

MFC after:	1 week
2018-05-13 23:38:01 +00:00
fsu
58c11e816a Fix directory blocks checksumming.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15396
2018-05-13 19:48:30 +00:00
fsu
faa61df95a Fix on-disk inode checksum calculation logic.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15395
2018-05-13 19:29:35 +00:00
fsu
aad5de15a2 Fix EXT2FS_DEBUG definition usage.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15394
2018-05-13 19:19:10 +00:00
rmacklem
de34a49ecf Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plus) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-13 12:42:53 +00:00
rmacklem
1e4675982e The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.

MFC after:	2 months
2018-05-13 12:29:09 +00:00
rmacklem
962b30b569 Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after:	2 months
2018-05-11 22:16:23 +00:00
mmacy
a0bd5d3d7f Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
pfg
68807f2165 msdosfs: use vfs_timestamp() to generate timestamps instead of getnanotime().
Most filesystems, with the notable exceptions of msdosfs and autofs use
only vfs_timestamp() to read the current time. This has the benefit of
configurable granularity (using the vfs.timestamp_precision sysctl).

For convenience, use it on msdosfs too.

Submitted by:	Damjan Jovanovic
Differential Revision:	https://reviews.freebsd.org/D15297
2018-05-06 21:29:29 +00:00
jamie
1c11f552d6 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
pfg
1ac70fc548 msdosfs: long names of files are created incorrectly.
This fixes a regression that happened in r120492 (2003) where libkiconv
was introduced and we went from checking unlen to checking for '\0'.

PR:		111843
Patch by:	Damjan Jovanovic
MFC after:	1 week
2018-05-04 03:44:12 +00:00
rmacklem
d3389fdd15 Revert r333183, since I am not sure that just initializing the
list is the correct thing to do and that is already done without
this commit.
2018-05-02 21:29:42 +00:00
rmacklem
713342ac10 Add two missing LIST_INIT()s.
This patch adds two missing LIST_INIT()s. Found by inspection.
In practice, these are currently no-ops, since the structure they are
in is malloc'd with M_ZERO and all LIST_INIT does is set the pointer
in the list head to NULL. (In other words, the M_ZERO has already
correctly initialized it.)

MFC after:	2 months
2018-05-02 20:36:11 +00:00
eadler
553b9804f0 [procfs] Split procfs_attr into multiple functions
Reviewed by:	des, kib
Discussed with:	mmacy
Differential Revision:	https://reviews.freebsd.org/D15150
2018-04-24 14:49:09 +00:00
rmacklem
894b3406cb Fix use of pointer after being set NULL.
Using a pointer after setting it NULL is probably not a good plan.
Spotted by inspection during changes for Flexible File Layout Ioerr handling.
This code path obviously isn't normally executed.

MFC after:	1 week
2018-04-20 11:38:29 +00:00
rmacklem
8cd51ec4a0 Fix OpenDowngrade for NFSv4.1 if a client sets the OPEN_SHARE_ACCESS_WANT* bits.
The NFSv4.1 RFC specifies that the OPEN_SHARE_ACCESS_WANT bits can be set
in the OpenDowngrade share_access argument and are basically ignored.
I do not know of a extant NFSv4.1 client that does this, but this little
patch fixes it just in case.
It also changes the error from NFSERR_BADXDR to NFSERR_INVAL since the NFSv4.1
RFC specifies this as the error to be returned if bogus bits are set.
(The NFSv4.0 RFC didn't specify any error for this, so the error reply can
 be changed for NFSv4.0 as well.)
Found by inspection while looking at a problem with OpenDowngrade reported
for the ESXi 6.5 NFSv4.1 client.

Reported by:	andreas.nagy@frequentis.com
PR:		227214
MFC after:	1 week
2018-04-19 20:30:33 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
benno
0b8d8dcffd Add isoboot(8) for booting BIOS systems from HDDs containing ISO images.
This is part of a project for adding the ability to create hybrid CD/USB boot
images. In the BIOS case when booting from something that isn't a CD we need
some extra boot code to actually find our next stage (loader) within an
ISO9660 filesystem. This code will reside in a GPT partition (similar to
gptboot(8) from which it is derived) and looks for /boot/loader in an
ISO9660 filesystem on the image.

Reviewed by:	imp
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14914
2018-04-05 19:40:46 +00:00
emaste
61bad5ab72 Revert r313780 (UFS_ prefix) 2018-03-17 12:59:55 +00:00
emaste
e23f2eb452 Prefix UFS symbols with UFS_ to reduce namespace pollution
Followup to r313780.  Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.

Reported by:	kib
Reviewed by:	kib, mckusick
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9623
2018-03-17 01:48:27 +00:00
ume
e35a88be6c Fix Bad file descriptor error.
MFC after:	1 week
2018-03-09 04:45:24 +00:00
eadler
d7c7fe26c8 sys/fuse: fix off by one error
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reported by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
2018-03-03 20:42:39 +00:00
pfg
999ae367a8 {ext2|ufs}_readdir: Avoid setting negative ncookies.
ncookies cannot be negative or the allocator will fail. This should only
happen if a caller is very broken but we can still try to survive the
event.

We should probably also verify for uio_resid > MAXPHYS but in that case
it is not clear that just clipping the ncookies value is an adequate
response.

MFC after:	2 weeks
2018-02-06 22:38:19 +00:00
jeff
e67ec0d694 Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
pfg
578257632e ext2fs: remove EXT4F_RO_INCOMPAT_SUPP
This was a hack to be able to mount ext4 filesystems read-only while not
supporting all the features. We now support all those features so it
doesn't make sense to keep the undocumented hack.

Discussed with:	fsu
2018-02-05 15:14:01 +00:00
pfg
7b59c2a93e ext2fs: Cleanup variable assignments for extents.
Delay the initialization of variables until the are needed.

In the case of ext4_ext_rm_leaf(), make sure 'error' value is not
undefined.

Reported by:		Clang's static analyzer
Differential Revision:	https://reviews.freebsd.org/D14193
2018-02-05 14:30:27 +00:00
fsu
3b5710279c Fix mistake in case of zeroed inode check.
Reported by:	pho
MFC after:	6 months
2018-01-29 22:15:46 +00:00
fsu
0a2aa6292e Add flex_bg/meta_bg features RW support.
Reviewed by:    pfg
MFC after:      6 months

Differential Revision:    https://reviews.freebsd.org/D13964
2018-01-29 21:54:13 +00:00
pfg
434dd3e7fe Revert r328479:
{ext2|ufs}_readdir: Set limit on valid ncookies values.

We aren't allowed to set resid like this.

Pointed out by:	kib, imp
2018-01-27 16:34:00 +00:00
pfg
55c3e327b4 {ext2|ufs}_readdir: Set limit on valid ncookies values.
Sanitize the values that will be assigned to ncookies so that we ensure
they are sane and we can handle them.

Let ncookies signed as it was before r328346. The valid range is such
that unsigned values are not required and we are not able to avoid at
least one cast anyways.

Hinted by:	bde
2018-01-27 15:33:52 +00:00
cem
e83cbe4f78 nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros
They were just thin wrappers over malloc(9) w/ M_ZERO and free(9).

Discussed with:	rmacklem, markj
Sponsored by:	Dell EMC Isilon
2018-01-25 22:38:39 +00:00
cem
c060d198e3 style: Remove remaining deprecated MALLOC/FREE macros
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions).  Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by:	jhb
Reviewed by:	cy, imp, markj, rmacklem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14035
2018-01-25 22:25:13 +00:00
pfg
08d8a954a8 Minor style issue introduced in r328346.
Pointed by:	bde
2018-01-25 18:01:46 +00:00
pfg
944d693f04 ext2fs|ufs:Unsign some values related to allocation.
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after:	2 weeks
2018-01-24 17:58:48 +00:00
pfg
ca690ecdf9 Revert r327781, r328093, r328056:
ufs|ext2fs: Revert uses of mallocarray(9).

These aren't really useful: drop them.
Variable unsigning will be brought again later.
2018-01-24 16:44:57 +00:00
trasz
cf8b777c32 Add SPDX tags to autofs(5).
MFC after:	2 weeks
2018-01-24 16:40:26 +00:00
pfg
341f8038fe extfs: Remove unused variables.
Found by:	scan-build
Reviewed by:	fsu
Differential Revision:	https://reviews.freebsd.org/D14017
2018-01-23 14:17:04 +00:00
pfg
f0c6025eb6 Unsign some values related to allocation.
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after:	3 weeks
2018-01-22 02:08:10 +00:00
pfg
ced875130d Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
manu
77258f226e nfs: Do not printf each time a lock structure is freed during module unload
There can be a lot of those structures and printing a line each time we free
one on module unload.

MFC after:	3 days
2018-01-18 15:28:49 +00:00
jhb
0aaf564f94 Use long for the last argument to VOP_PATHCONF rather than a register_t.
pathconf(2) and fpathconf(2) both return a long.  The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly.  Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by:	bde
Reviewed by:	kib
Sponsored by:	Chelsio Communications
2018-01-17 22:36:58 +00:00
pfg
0c49a087dc ext2fs: use mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). These
are not likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
2018-01-16 19:29:32 +00:00
pfg
7bb0460e2e nfsclient: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:14:56 +00:00
fsu
1c4853adf6 Add metadata_csum feature support.
Reviewed by:   pfg (mentor)
Approved by:   pfg (mentor)
MFC after:     6 months

Differential Revision:    https://reviews.freebsd.org/D13810
2018-01-14 20:46:39 +00:00
trasz
b5d92d1ceb Make nullfs properly report MNT_AUTOMOUNTED set on the nullfs mount itself,
instead of copying from the underlying filesystem.

PR:		224851
Reported by:	Jamie Landeg-Jones <jamie at dyslexicfish.net>
Tested by:	Jamie Landeg-Jones <jamie at dyslexicfish.net>
MFC after:	2 weeks
2018-01-10 17:51:02 +00:00
jhb
4d93d4b094 Correct comment. procfs_doprocfile implements 'file', not 'self'. 2018-01-05 18:32:46 +00:00
fsu
9384204e2e Add 64bit feature support.
Reviewed by:    kevlo, pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      6 months

Differential Revision:    https://reviews.freebsd.org/D11530
2018-01-05 10:04:01 +00:00
kib
f74fbb05c8 Reuse kern_proc_vmmap_resident() for procfs_map resident count.
The existing algorithm in procfs_map() to calculate count of resident
pages in an entry is too primitive, resulting in too long run time for
large sparse mapping entries.  Re-use the kern_proc_vmmap_resident()
from kern_proc.c which only looks at the existing pages in the
iterations.

Also, this makes procfs to honor kern.proc_vmmap_skip_resident_count,
if user does not need this information.

Reported by:	Glenn Weinberg <glenn.weinberg@intel.com>
PR:	224532
No objections from:	des (procfs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13595
2017-12-28 13:23:13 +00:00
eadler
421a929b1e kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
kan
c8da6fae2c Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
kib
de4f3129c0 Fix build for LP64 arches with gcc.
gcc complaints that the comparision is always false due to the value
range, and the cast does not prevent the analysis.  Split the LP64
vs. ILP32 clamping as a workaround.

Sponsored by:	The FreeBSD Foundation
2017-12-21 23:08:10 +00:00
jhb
69ce06448a Replace one more LINK_MAX with NFS_LINK_MAX missed in r326991.
Sponsored by:	Chelsio Communications
2017-12-19 22:43:39 +00:00
jhb
5ea2187129 Update link count handling in fuse for post-ino64.
Set FUSE_LINK_MAX to UINT32_MAX instead of LINK_MAX to match the maximum
link count possible in the 'nlink' field of 'struct fuse_attr'.

Sponsored by:	Chelsio Communications
2017-12-19 22:40:54 +00:00
jhb
e09154bf75 Rework pathconf handling for FIFOs.
On the one hand, FIFOs should respect other variables not supported by
the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.).
These values are fs-specific and must come from a fs-specific method.
On the other hand, filesystems that support FIFOs are required to
support _PC_PIPE_BUF on directory vnodes that can contain FIFOs.
Given this latter requirement, once the fs-specific VOP_PATHCONF
method supports _PC_PIPE_BUF for directories, it is also suitable for
FIFOs permitting a single VOP_PATHCONF method to be used for both
FIFOs and non-FIFOs.

To that end, retire all of the FIFO-specific pathconf methods from
filesystems and change FIFO-specific vnode operation switches to use
the existing fs-specific VOP_PATHCONF method.  For fifofs, set it's
VOP_PATHCONF to VOP_PANIC since it should no longer be used.

While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that
only filesystems supporting FIFOs will report a value.  In addition,
only report a valid _PC_PIPE_BUF for directories and FIFOs.

Discussed with:	bde
Reviewed by:	kib (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D12572
2017-12-19 22:39:05 +00:00
jhb
79dcd6ae46 Update tmpfs link count handling for ino64.
Add a new TMPFS_LINK_MAX to use in place of LINK_MAX for link overflow
checks and pathconf() reporting.  Rather than storing a full 64-bit
link count, just use a plain int and use INT_MAX as TMPFS_LINK_MAX.

Discussed with:	bde
Reviewed by:	kib (part of a larger patch)
Sponsored by:	Chelsio Communications
2017-12-19 20:19:07 +00:00
jhb
dc7a93fc9e Honor NANDFS_LINK_MAX for post-ino64.
This uses NANDFS_LINK_MAX instead of LINK_MAX for link overflow checks
and the value reported by pathconf() / fpathconf().

Sponsored by:	Chelsio Communications
2017-12-19 20:17:07 +00:00
jhb
050eb30c42 Report INT_MAX for LINK_MAX for devfs' VOP_PATHCONF().
devfs uses int's for link counts internally and already reports the the
full link count via stat() post ino64.

Sponsored by:	Chelsio Communications
2017-12-19 20:07:57 +00:00
jhb
f5bd099613 Use FUSE_LINK_MAX for LINK_MAX in fuse' VOP_PATHCONF().
Should have included this in r326993.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:57:55 +00:00
jhb
4531c97d0c Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:53:34 +00:00
jhb
3efec8ad25 Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations.  Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
  permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs.  Now fail with EINVAL to
  indicate hard links aren't supported.

Requested by:	bde (though perhaps not this exact implementation)
Reviewed by:	kib (earlier version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:51:36 +00:00
jhb
580c86e043 Update NFS to handle larger link counts post ino64.
- Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol.
- Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported
  for a PATHCONF RPC by the NFS server.
- Use NFS_LINK_MAX instead of LINK_MAX as the default value reported
  by the NFS client pathconf() if not overridden by the NFS server.
- When reading the link count out of an RPC reply, read the full 32
  bits instead of the lower 16 bits.

Reviewed by:	rmacklem (earlier version)
Sponsored by:	Chelsio Communications
2017-12-19 19:18:48 +00:00
jhb
7246635251 Handle _PC_FILESIZEBITS and _PC_NO_TRUNC for smbfs' VOP_PATHCONF().
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:14:01 +00:00
jhb
c671bf7768 Support _PC_FILESIZEBITS in msdosfs' VOP_PATHCONF().
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:10:00 +00:00
jhb
826956798a Add a custom VOP_PATHCONF method for fuse.
This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC.
For other values it defers to vop_stdpathconf().

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:09:06 +00:00
jhb
b731672881 Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly.  For all other
pathconf variables, the method passes the request down to the underlying
file descriptor.  This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf().  Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 18:20:38 +00:00
kib
a26924110c In devfs_lookupx() dotdot lookup case, avoid dereferencing
dvp->v_mount after dvp is unlocked.

The vnode might be reclaimed after unlock, so v_mount becomes NULL.
Cache the struct mount pointer before the unlock, the struct is
type-stable.

Note that devfs_allocv() reads mp->mnt_data but does not operate on it
further when dirent is doomed.  The unmount cannot proceed until all
dirents are reclaimed.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-12-14 13:41:11 +00:00
pfg
e11cf14dc2 SPDX: some uses of the RSA-MD license. 2017-12-13 16:30:39 +00:00
fsu
f4e0f45cb9 Fix extattr getters in case of neither uio nor buffer was not passed to VOP_*.
Approved by:    pfg (mentor)
MFC after:      2 weeks

Differential Revision:    https://reviews.freebsd.org/D13359
2017-12-12 20:02:48 +00:00
rmacklem
6cd8602fbf Define macros used by the pNFS server code.
This commit defines some macros used by the pNFS server code.
They will not be used until the main pNFS server code merge occurs,
which will probably be in April 2018.
2017-12-09 21:04:56 +00:00
glebius
b99848bbc4 Fix file missed in r326607. 2017-12-06 00:44:49 +00:00
glebius
df52bf13ce Reduce pollution via tmpfs.h. 2017-12-06 00:42:08 +00:00
rmacklem
7a197a7ade Avoid the overhead of acquiring a lock in nfsrv_checkgetattr() when
there are no write delegations issued.

manu@ reported on the freebsd-current@ mailing list that there was
a significant performance hit in nfsrv_checkgetattr() caused by
the acquisition/release of a state lock, even when there were no
write delegations issued.
This patch add a count of outstanding issued write delegations to the
NFSv4 server. This count allows nfsrv_checkgetattr() to return without
acquiring any lock when the count is 0, avoiding the performance hit
for the case where no write delegations are issued.

Reported by:	manu
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13327
2017-12-04 21:50:27 +00:00
pfg
08e0fdd25e SPDX: Complete license IDs for ext2fs. 2017-12-02 17:22:55 +00:00
manu
cae4d7505c r326394 is calling malloc with M_WAITOK under a lock, revert for now
Reported by:	andrew
2017-11-30 14:06:54 +00:00
manu
cbfe9423a0 devfs: Avoid a malloc/free if we just need to increment the refcount
MFC after:	1 week
Sponsored by:	Gandi.net
2017-11-30 12:38:42 +00:00
pfg
43f5681c36 sys/fs: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:15:37 +00:00
cem
811de45335 msdosfs(5): Reflect READONLY attribute in file mode
Msdosfs allows setting READONLY by clearing the owner write bit of the file
mode.  (While here, correct the misspelling of S_IWUSR as VWRITE.  No
functional change.)

In msdosfs_getattr, intuitively reflect that READONLY attribute to userspace
in the file mode.

Reported by:	Karl Denninger <karl AT denninger.net>
Sponsored by:	Dell EMC Isilon
2017-11-20 21:38:24 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
delphij
19153ec635 Remove unused header. 2017-11-19 03:52:03 +00:00
delphij
b322b6ecb9 Remove unused header. 2017-11-19 03:51:47 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
hselasky
2873c08d55 Allow CUSE(3) to free all memory mapped memory by using regular SWAP objects
instead of malloc(). The SWAP objects are automagically freed when there are no
more consumers. This greatly simplifies the mmap logic inside CUSE(3) in the
kernel. This change fixes an issue where mmapped memory can accumulate and never
get freed, if many different mmap sizes are needed over time. Further this
change fixes memory leaks when the CUSE(3) kernel module is unloaded.

While at it make sure the CUSE_ALLOC_PAGES_MAX limit is treated as an exclusive
limit. CUSE(3) memory maps must be less than CUSE_ALLOC_PAGES_MAX number of pages.

Reviewed by:		kib @
Differential Revision:	https://reviews.freebsd.org/D11392
Sponsored by:		Mellanox Technologies
MFC after:		1 week
2017-11-03 14:10:57 +00:00
fsu
edb3be92a6 Fix physical block number overflow in different places.
Approved by:    pfg (mentor)
MFC after:      6 months
2017-10-24 20:10:08 +00:00
fsu
be8349452e Set doreallocblks sysctl value to zero by default because of
possibility of filesystem corruption.

Approved by:    pfg (mentor)
MFC after:      2 weeks
2017-10-24 19:16:25 +00:00
fsu
66c5b6ebb7 Do not free bufs in case of extents metadata blocks + remove unneeded asserts.
Approved by:    pfg (mentor)
MFC after:      6 months
2017-10-24 19:14:33 +00:00
mjoras
d7ecc62316 Move clear_unrhdr to tmpfs_free_tmp.
Clearing the unr in tmpfs_unmount is not correct. In the case of
multiple references to the tmpfs mount (e.g. when there are lookup
threads using it) it will not be the one to finish tmpfs_free_tmp. In
those cases tmpfs_free_node_locked will be the final one to execute
tmpfs_free_tmp, and until then the unr must be valid.

Reported by:	pho
Approved/reviewed by:	rstone (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12749
2017-10-23 15:43:38 +00:00
markj
d72f1cfb7e Delete declarations of struct pfs_bitmap, removed in r143841.
MFC after:	1 week
2017-10-22 20:22:11 +00:00
fsu
4140eed8a1 Fix unused variable + style(9) fixes inside the ext4_ext_find_extent()
Approved by:    pfg (mentor)
Reported by:    Coverity
CID:            1381754
MFC after:      6 months
2017-10-19 16:42:03 +00:00
emaste
2531d8deb3 msdosfs: fix build with MSDOSFS_DEBUG
Inspired by a patch submission by longwitz@incore.de with many changes
for ino64 in HEAD.

PR:		199152
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2017-10-19 12:55:11 +00:00
rlibby
ff104cc9da ext2: delete redundant decl of ext2_fserr
Fix gcc build after r324706.

Reviewed by:	pfg
Differential Revision:	https://reviews.freebsd.org/D12709
2017-10-18 00:41:23 +00:00
fsu
933d923fdf Add inital extents read-write support.
Approved by:    pfg (mentor)
MFC after:      6 months
RelNotes:       Yes

Differential Revision:    https://reviews.freebsd.org/D12087
2017-10-17 20:45:44 +00:00
rmacklem
1af39d84c0 Use taskqueue(9) to do writes/commits to mirrored DSs concurrently.
When the NFSv4.1 pNFS client is using a Flexible File Layout specifying
mirrored Data Servers, it must do the writes and commits to all mirrors.
This patch modifies the client to use a taskqueue to perform these writes
and commits concurrently.
The number of threads can't be changed for taskqueue(9), so it is set
to 4 * mp_ncpus by default, but this can be overridden by setting the
sysctl vfs.nfs.pnfsiothreads.

Differential Revision:	https://reviews.freebsd.org/D12632
2017-10-16 23:28:12 +00:00
rmacklem
da67d313c5 Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1.
The client IP address was not being reported for some NFSv4 mounts by
nfsdumpstate. Upon investigation, two problems were found for mounts
using IPv4. One was that the code (originally written and tested on i386)
assumed that a "u_long" was a "uint32_t" and would exactly store an
IPv4 host address. Not correct for 64bit arches.
Also, for NFSv4.1 mounts, the field was not being filled in. This was
basically correct, because NFSv4.1 does not use a callback address.
However, it meant that nfsdumpstate could not report the client IP addr.
This patch should fix both of these issues.
For IPv6, the address will still not be reported. The original NFSv4 RFC
only specified IPv4 callback addresses. I think this has changed and, if so,
a future commit to fix reporting of IPv6 addresses will be needed.

Reported by:	manu
PR:		223036
MFC after:	2 weeks
2017-10-15 22:22:27 +00:00
fsu
9658e6ca36 Add extended attributes support to fuse kernel module.
Author:         kem
Reviewed by:    cem, pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      2 weeks

Differential Revision: https://reviews.freebsd.org/D12485
2017-10-14 19:02:52 +00:00
mjoras
851d0b97ab When unmounting a tmpfs, do not call free_unr.
tmpfs uses unr(9) to allocate inodes. Previously when unmounting it
would individually free the units when it freed each vnode. This is
unnecessary as we can use the newly-added unrhdr_clear function to clear
out the unr in onde go. This measurably reduces the time to unmount a
tmpfs with many files.

Reviewed by:	cem, lidl
Approved by:	rstone (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12591
2017-10-11 21:53:53 +00:00
rmacklem
e28755e64d Fix forced dismount when a pNFS mount is hung on a DS.
When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS,
a forced dismount wouldn't work, because the RPC socket for the DS
was not being closed. This patch fixes this.
This will only affect "pnfs" mounts where the pNFS server's DS
is unresponsive (crashed or network partitioned or...).
Found during testing of the pNFS server.

MFC after:	2 weeks
2017-10-10 21:05:40 +00:00
rmacklem
61b14fcdea Add Flex File Layout support to the NFSv4.1 pNFS client.
This patch adds support for the Flexible File Layout to the pNFS client.
Although the patch is rather large, it should only affect NFS mounts
using the "pnfs" option against pNFS servers that do not support File
Layout.
There are still a couple of things missing from the Flexible File Layout
client implementation:
- The code does not yet do a LayoutReturn with I/O error stats when
  I/O error(s) occur when attempting to do I/O on a DS.
  This will be fixed in a future commit, since it is important for the
  MDS to know that I/O on a DS is failing.
- The current code does writes and commits to mirror DSs serially.
  Making them happen concurrently will be done in a future commit,
  after discussion on freebsd-current@ on the best way to do this.
- The code does not handle NFSv4.0 DSs. Since there is no extant pNFS
  server that implements NFSv4.0 DSs and NFSv4.1 DSs makes more sense
  now, I don't intend to implement this until there is a need for it.
  There is support for NFSv4.1 and NFSv3 DSs.
2017-10-05 20:10:40 +00:00
hselasky
a1144014f8 Add support for new cuse(3) error code, CUSE_ERR_NO_DEVICE.
This error code is useful when emulating Linux input event
devices from userspace.

PR:			218626
Submitted by:		jan.kokemueller@gmail.com
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-10-05 16:42:02 +00:00
rmacklem
c6845ee7a4 Add a few definitions for the Flex File Layout.
This patch adds a few definitions for the Flex File Layout.
Until a future commit adds Flex File layout support, these new fields
are not used.
This patch should not affect the "pnfs" option for File Layout.
2017-10-04 22:55:30 +00:00
jhb
423e0b6c94 Flesh out pathconf() on UDF.
- Return 64 bits for _PC_FILESIZEBITS.
- Handle _PC_SYMLINK_MAX.
- Defer _PC_PATH_MAX to vop_stdpathconf().

Sponsored by:	Chelsio Communications
2017-10-02 23:31:11 +00:00
jhb
a0f436b3ed Return 64 for pathconf(_PC_FILESIZEBITS) on tmpfs.
Sponsored by:	Chelsio Communications
2017-10-02 23:23:12 +00:00
jhb
7b1d4528fd Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX pathconf() requests in cd9660.
cd9660 only supports symlinks with Rock Ridge extensions, so
_PC_SYMLINK_MAX is conditional on Rock Ridge.

Sponsored by:	Chelsio Communications
2017-10-02 23:12:02 +00:00
mjg
24290ab700 tmpfs: skip zero-sized page count updates
Such updates consisted of vast majority of modificiations, especially
in tmpfs_reg_resize.

For the case where page count did no change and the size grew we only
need to update tn_size. Use this fact to avoid vm object lock/relock.

MFC after:	1 week
2017-09-30 18:23:45 +00:00
rmacklem
a4636ede15 Add support for Flex File Layout to the pNFS client structures.
This patch modifies the pNFS client layout and deviceinfo structures
to add fields and unions for the Flex File Layout. Until a future
commit adds Flex File layout support, these new fields are not used.
This patch should not affect the "pnfs" option for File Layout.
2017-09-29 23:13:01 +00:00
rmacklem
fce41e8299 Add the NFS client state flag that enables Flexible File Layout.
This patch adds a NFSSTA_FLEXFILE flag that will be used to enable
Flexible File Layout for the NFSv4.1 pNFS client. It is not yet
used, but will be after a future commit adds Flex File Layout support.
2017-09-28 23:05:08 +00:00
rmacklem
888b65faac Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage.
This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use
a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to
avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa()
call to after the completion of parsing of the DeviceInfo reply from
the server. This patch is in preparation for addition of Flex File
Layout support in a future commit.
It only affects the "pnfs" NFSv4.1 client mount option and should not
have changed its semantics.
2017-09-28 22:33:01 +00:00
rmacklem
9dd547c597 Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.

MFC after:	2 weeks
2017-09-27 23:23:41 +00:00
markj
ae879c6393 Use C99 initializers for DTrace provider methods.
This makes the definitions easier to read and more cscope-friendly.

MFC after:	1 week
2017-09-27 17:46:38 +00:00
fsu
ac00ff77c6 Add check to avoid raw inode iblocks fields overflow in case of huge_file feature.
Use the Linux logic for now.

Reviewed by:    pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      2 weeks
Differential Revision: https://reviews.freebsd.org/D12131
2017-09-27 16:12:13 +00:00
rmacklem
3ccccd3209 Add major and minor version arguments to nfscl_reqstart().
This patch adds "vers" and "minorvers" arguments to nfscl_reqstart().
The patch always passes them in as "0" and that implies no change
in semantics. These arguments will be used by a future commit that
adds support for the Flexible File Layout.
2017-09-26 23:42:44 +00:00
jhb
55f7f610b5 Use tmpfs_print for tmpfs FIFOs.
Reviewed by:	kib (part of a larger patch)
2017-09-25 20:26:16 +00:00
rmacklem
a18386826f Change a panic to an error return.
There was a panic() in the NFS server's write operation that didn't
need to be a panic() and could just be an error return.
This patch makes that change.
Found by code inspection during development of the pNFS service.

MFC after:	2 weeks
2017-09-24 20:05:48 +00:00
rmacklem
7b6c655221 Remove 0 filling from nfsm_uiombuflist().
nfsm_uiombuflist() zero filled the mbuf list to a multiple of 4bytes
as required for XDR. Unfortunately that modified an mbuf list after
it was m_copym()'d and was broken. This patch removes the zero filling code.
Since nfsm_uiombuflist() is not yet used in head/current, this has no
effect on users.
The function will be used by a future commit of code that adds Flex
File Layout support.
2017-09-24 19:43:31 +00:00
jhb
c5f4b54f16 Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files.  In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.

Discussed with:	bde, kib
Sponsored by:	Chelsio Communications
2017-09-21 23:05:32 +00:00
rmacklem
e8aa017e4d Add a few definitions for Flex File Layout for pNFS.
These definitions will be used by a future commit.
2017-09-21 00:41:12 +00:00
rmacklem
0ca7175e4b Make the nfsrpc_layoutget() function a static.
Make the NFSv4 pNFS client function nfsrpc_layoutget() a static, since it
is only used in sys/fs/nfsclient/nfs_clrpcops.c.
This prepares the code for future patches that add Flex File layout
support.
2017-09-19 23:28:22 +00:00
rmacklem
bc83a96e24 Add a new function called nfsm_uiombuflist(), similar to nfsm_uiombuf().
This patch adds a new function called nfsm_uiombuflist(), which is
similar to nfsm_uiombuf(), but doesn't not use the fields in
struct nfsrv_descript. This new function will be used by the pNFS client
for writing to mirrors using Flex Files layout.
The function is not yet called anywhere.
Also, get rid of #ifndef APPLE, which is ancient cruft left over from
the Mac OSX port of the NFSv4 client.
2017-09-19 21:31:36 +00:00
rmacklem
1ffac5aafb Simplify nfsrpc_layoutreturn() args.
Simplify nfsrpc_layoutreturn() args. in preparation for the addition
of Flex File layout support, since File layout uses a 0 length field.
Flex Files does use a longer field, but that will be added in a
subsequent commit.
2017-09-19 20:45:25 +00:00
rmacklem
f29a033ec4 Simplify nfsrpc_layoutcommit() args.
Simplify nfsrpc_layoutcommit() args. in preparation for the addition
of Flex File layout support, since it also uses a 0 length field.
2017-09-19 20:18:41 +00:00
rmacklem
6cf3ab0b1d Fix bogus FREAD with NFSV4OPEN_ACCESSREAD. No functional change.
The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.

MFC after:	2 weeks
2017-09-17 22:18:01 +00:00
kib
7721974753 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-08-28 21:04:56 +00:00
kib
18480c36c8 Verify that the BPB media descriptor and FAT ID match.
FAT specification requires that for valid FAT, FAT cluster 0 has a
specific value derived from the BPB media descriptor.  The lowest
(little-endian) byte must be equal to bpb.bpbMedia, other bits in the
cluster number must be all 1's.  Implement the check to reduce the
chance of the randomly corrupted FAT to pass the mount attempt.

Submitted by:	Siva Mahadevan <smahadevan@freebsdfoundation.org>
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12124
2017-08-28 20:52:32 +00:00
kib
8b66e16967 Do not drop NFS vnode lock when performing consistency checks.
Currently several paths in the NFS client upgrade the shared vnode
lock to exclusive, which might cause temporal dropping of the lock.
This action appears to be fatal for nullfs mounts over NFS. If the
operation is performed over nullfs vnode, then bypassed down to NFS
VOP, and the lock is dropped, other thread might reclaim the upper
nullfs vnode.  Since on reclaim the nullfs vnode lock and NFS vnode
lock are split, the original lock state of the nullfs vnode is not
restored.  As result, VFS operations receive not locked vnode after a
VOP call.

Stop upgrading the vnode lock when we check the consistency or flush
buffers as result of detected inconsistency.  Instead, allocate a new
lockmgr lock for each NFS node, which is locked exclusive instead of
the vnode lock upgrade.  In other words, the other parallel
modification of the vnode are excluded by either vnode lock conflict
or exclusivity of the new lock when the vnode lock is shared.

Also revert r316529 because now the vnode cannot be reclaimed during
ncl_vinvalbuf().

In collaboration with:	pho
Reviewed by:	rmacklem
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12083
2017-08-20 10:08:45 +00:00
markj
128fe21bdc Bump the maximum file name length in pseudofs filesystems to 48.
The previous limit of 24 was somewhat restrictive, and with this change
ceil(log2(sizeof(struct pfs_node))) is the same as before in both the ILP32
and LP64 models, so the malloc zone used for allocations of struct pfs_node
is the same as before.

Approved by:	des
2017-08-03 21:35:53 +00:00
dchagin
b084ee9dfc Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.

For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.

For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes
2017-08-01 03:40:19 +00:00
mav
c2a28dc66d Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately.  It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.

The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time.  Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes.  There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2017-07-31 15:23:19 +00:00
rmacklem
396df4f3c4 Add kernel support for the NFS client forced dismount "umount -N" option.
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.

Tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11735
2017-07-29 19:52:47 +00:00
rmacklem
0d5371c8ce Fix possible crash for the NFSv4.1 pNFS client.
If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.

MFC after:	2 weeks
2017-07-29 02:25:49 +00:00
rmacklem
5bbbf9ea35 Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by:	kib
MFC after:	2 weeks
2017-07-27 20:55:31 +00:00
kib
8fd2052d87 Mark pages after EOF as clean after pageout.
Suppose that a file on NFS has partially filled last page, and this
page is dirty.  NFS VOP_PAGEOUT() method only marks the the page clean
up to the block of the last written byte, leaving other blocks dirty.
Also any page which erronously exists in the vnode vm_object past EOF
is also left marked as dirty.

With the introduction of the buf-cache coherent pager, each pass of
syncer over the object with such page results in creation of B_DELWRI
buffer due to VOP_WRITE() call.  This buffer is noted on next syncer
pass, which results e.g. a visible manifestation of shutdown never
finishing vnode sync.  Note that before buf-cache coherency commit, a
dirty page might left never synced to server if a partial writes
occur.

Fix this by clearing dirty bits after EOF.  Only blocks of the partial
page which are completely after EOF are marked clean, to avoid
possible user data loss.

Reported by:	mav
Reviewed by:	alc, markj
Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11697
2017-07-26 20:07:05 +00:00
kib
a748399a51 Move rtvals initialization out of the region protected by NFS node
lock.

Noted by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11697
2017-07-26 20:01:31 +00:00
dchagin
2acbd91e85 Replace unnecessary _KERNEL by double-include protection.
MFC after:	2 week
2017-07-25 06:59:35 +00:00
rmacklem
4f0e3fa213 r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.

MFC after:	1 week
2017-07-21 00:14:43 +00:00
rmacklem
a6315c2c10 Revert r321308. I'll commit a better fix soon. 2017-07-20 23:59:47 +00:00
rmacklem
3359d0c4fa r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.

MFC after:	1 week
2017-07-20 23:15:50 +00:00
trasz
04dc5d07ae Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.

Reviewed by:	rmacklem@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-07-19 09:59:32 +00:00