Commit Graph

3366 Commits

Author SHA1 Message Date
pfg
fe5a17c2a7 ext2fs: passthrough any extra timestamps to the dinode struct.
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.

This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.

MFC after:	4 days
2016-01-24 23:24:47 +00:00
pfg
84cfebb132 ext2: rename some directory index constants.
Missed from r294653.

Pointyhat:	me
2016-01-24 04:30:30 +00:00
pfg
01bfa389ba Fix comment. 2016-01-24 02:44:00 +00:00
pfg
3fde4bfd1c Rename some directory index constants.
Directory index was introduced in ext3. We don't always use the
prefix to denote the ext2 variant they belong to but when we
do we should try to be accurate.
2016-01-24 02:41:49 +00:00
pfg
d2a41899f8 ext2: Initialize i_flag after allocation.
We use i_flag to carry some flags like IN_E4INDEX which newer
ext2fs variants uses internally.

fsck.ext3 rightfully complains after our implementation tags
non-directory inodes with INDEX_FL.

Initializing i_flag during allocation removes the noise factor
and quiets down fsck.

Patch from:	Damjan Jovanovic
PR:		206530
2016-01-24 02:25:41 +00:00
kib
8c18805577 When devfs dirent is freed, a vnode might still keep a pointer to it,
apparently.  Interlock and clear the pointer to avoid free memory
dereference.

Submitted by:	bde (previous version)
MFC after:	3 weeks
2016-01-22 20:30:51 +00:00
pfg
d16eeed462 ext2fs: Bring back the htree dir_index implementation.
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.

Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.

Differential Revision:	https://reviews.freebsd.org/D5007

Submitted by:	Damjan Jovanovic
Reviewed by:	pfg
Tested by:	pho
Relnotes:	Yes
MFC after:	2 months (only 10.x)
2016-01-21 14:50:28 +00:00
kib
32d7f35235 Assert that the linkage between struct cdev_privdata and and struct
file is consistent.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-17 08:34:35 +00:00
rpokala
157f6a3fb8 [PR 206224] bv_cnt is sometimes examined without holding the bufobj lock
Add locking around access to bv_cnt which is currently being done unlocked

PR:		206224
Reviewed by:	imp
Approved by:	jhb
MFC after:	1 week
Sponsored by:	Panasas, Inc.
Differential Revision:	https://reviews.freebsd.org/D4931
2016-01-17 01:04:20 +00:00
bz
4d126fab0e Unbreak NOIP builds after r294084. 2016-01-15 16:45:36 +00:00
melifaro
3243205726 Make nfscl_getmyip() use new routing KPI.
* Use standard IPv6 SAS instead of rt->rt_ifa address.
* Make address lookup work for IPv6 LLA.
* Save address into buffer provided by caller instead of using static vars.

Discussed with:	rmacklem
2016-01-15 09:05:14 +00:00
kib
21f21e7647 Make devfs_fpdrop() static. It was not a public KPI, and it has no
reason to remain exported for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:03:06 +00:00
pfg
4ba3f35490 ext4: mount panic from freeing invalid pointers
Initialize the struct with those fields to zeroes on allocation,
preventing the panic.

Patch by:	Damjan Jovanovic.

PR:		206056
MFC after:	3 days
2016-01-11 19:25:43 +00:00
pfg
52388dd9b7 ext4: add support for reading sparse files
Add support for sparse files in ext4. Also implement read-ahead, which
greatly increases the performance when transferring files from ext4.

Both features implemented by Damjan Jovanovic.

PR:		205816
MFC after:	1 week
2016-01-11 19:14:55 +00:00
ae
8c83f31276 Change the type of newsize argument in the smbfs_smb_setfsize() function
from int to int64.
MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses signed 64-bit integer
to specify offset, but since smbfs_smb_setfsize() has used plain int,
a value was truncated in case when offset was larger than 2G.
	https://msdn.microsoft.com/en-us/library/ff469975.aspx

In particular, now `truncate -s 10G` will work correctly on the mounted
SMB share.

Reported and tested by:	Eugene Grosbein <eugen at grosbein dot net>
MFC after:	1 week
2016-01-11 18:11:06 +00:00
pfg
a32f535abc ext2fs: reading mmaped file in Ext4 causes panic
Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap().

Patch by Damjan Jovanovic.

PR:		205938
MFC after:	1 week
2016-01-07 21:43:43 +00:00
kib
ae62b8f932 Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.

When restarted, failing syscalls eventually either stop finding the
node and returning ENOENT, or the vnode op vectors finally transition
to the deadfs vop.  The later return EIO or other error, more
appropriate for the operation.

Submitted by:	bde
Tested by:	pho
MFC after:	3 weeks
2016-01-02 20:29:28 +00:00
kib
348ef00d1d Minor style cleanup.
Submitted by:	bde
MFC after:	1 week
2016-01-01 15:48:48 +00:00
kib
f0089fdb6f Force nullfs vnode reclaim after unlinking, to potentially unlink
lower vnode.  Otherwise, reference to the lower vnode from the upper
one prevents final unlink.

PR:	178238
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-30 19:49:22 +00:00
pfg
ea76029f56 ext2: recognize ext4 INCOMPAT_RECOVER flag
This is a flag specific for journalling in ext4.
Add it to the list of ext4 features we ignore for
read-only purposes.

PR:		205668
MFC after:	1 week
2015-12-29 15:51:52 +00:00
kib
76abdf80ab Make it possible for the cdevsw d_close() driver method to detect last
close and close due to revoke(2)-like operation.

A new FLASTCLOSE flag indicates that this is last close.  FREVOKE is
set for revokes, and FNONBLOCK is also set, same as is already done
for VOP_CLOSE() call from vgonel().

The flags reuse user open(2) flags which are never stored in f_flag,
to not consume bit space in the ABI visible way.  Assert this with the
static check.

Requested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-22 20:37:34 +00:00
kib
2a63539543 Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-22 20:22:17 +00:00
hselasky
c3f11e9f0e Make CUSE usable with platforms where the size of "unsigned long" is
different from the size of a pointer.
2015-12-22 09:55:44 +00:00
hselasky
76efdc2ae9 Make CUSE usable with platforms where the size of "unsigned long" is
different from the size of a pointer.
2015-12-22 09:41:33 +00:00
hselasky
66012f316c Guard against the same process being both CUSE server and client at
the same time. This can easily lead to a deadlock when destroying the
character devices nodes.
2015-12-22 09:26:24 +00:00
glebius
910a73cc44 Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS.
With the new VOP_GETPAGES() KPI the "count" argument counts pages already,
and doesn't need to be translated from bytes to pages.

While here make it consistent that *rbehind and *rahead are updated only
if we doesn't return error.

Pointy hat to:	glebius
2015-12-16 23:48:50 +00:00
glebius
63cd1c131a A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
jhb
f4698eb999 The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function.  Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype.  The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.

The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D4340
2015-12-02 18:27:30 +00:00
rmacklem
3b49f0eca8 Fix the memory leak that occurs when the nfscommon.ko module is unloaded.
This leak was introduced by r291527.
Since the nfscommon.ko module is rarely unloaded, this leak would not
have been much of an issue.

MFC after:	2 weeks
2015-12-02 02:47:13 +00:00
rmacklem
8d3825522e Delete the TUNABLE_INT() line. It was in r291527 so that it could be
MFC'd to stable/10 and still work.
2015-11-30 23:37:09 +00:00
rmacklem
493738a552 Add kernel support to the NFS server for the "-manage-gids"
option that will be added to the nfsuserd daemon in a future
commit. It modifies the cache used by NFSv4 for name<-->id
translation (both username/uid and group/gid) to support this.
When "-manage-gids" is set, the server looks up each uid
for the RPC and uses the list of groups cached in the server
instead of the list of groups provided in the RPC request.
The cached group list is acquired for the cache by the nfsuserd
daemon via getgrouplist(3).
This avoids the 16 groups limit for the list in the RPC request.
Since the cache is now used for every RPC when "-manage-gids"
is enabled, the code also modifies the cache to use a separate
mutex for each hash list instead of a single global mutex.

Suggested by:	jpaetzel
Tested by:	jpaetzel
MFC after:	2 weeks
2015-11-30 21:54:27 +00:00
mckusick
cb4ab786a1 For performance reasons, it is useful to have a single string used as
the name of a filesystem when setting it as the first parameter to the
getnewvnode() function. Most filesystems call getnewvnode from just one
place so can use a literal string as the first parameter. However, NFS
calls getnewvnode from two places, so we create a global constant string
that can be used by the two instances. This change also collapses two
instances of getnewvnode() in the UFS filesystem to a single call.

Reviewed by: kib
Tested by:   Peter Holm
2015-11-29 21:01:02 +00:00
rmacklem
8f26d7b382 When the nfsd threads are terminated, the NFSv4 server state
(opens, locks, etc) is retained, which I believe is correct behaviour.
However, for NFSv4.1, the server also retained a reference to the xprt
(RPC transport socket structure) for the backchannel. This caused
svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed
a socket upcall to occur after the mutexes in the svcpool were destroyed,
causing a crash.
This patch fixes the code so that the backchannel xprt structure is
dereferenced just before svcpool_destroy() is called, so the code
does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.

Tested by:	g_amanakis@yahoo.com
PR:		204340
MFC after:	2 weeks
2015-11-21 23:55:46 +00:00
rmacklem
7b391bfea3 Revert r283330 since it broke directory caching in the client.
At this time I cannot see a way to fix directory caching when it
has partial blocks in the buffer cache, due to the fact that the
syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ
offset.

Reported by:	bde
MFC after:	2 weeks
2015-11-21 00:15:41 +00:00
rmacklem
c1f0354622 mnt_stat.f_iosize (which is used to set bo_bsize) must be set to
the largest size of buffer cache block or the mapping of the buffer
is bogus. When a mount with rsize=4096,wsize=4096 was done, f_iosize
would be set to 4096. This resulted in corrupted directory data, since
the buffer cache block size for directories is NFS_DIRBLKSIZ (8192).
This patch fixes the code so that it always sets f_iosize to at least
NFS_DIRBLKSIZ.

Tested by:	krichy@cflinux.hu
PR:		177971
MFC after:	2 weeks
2015-11-17 01:44:26 +00:00
markj
a7bb6eb720 - Consistently use PROC_ASSERT_HELD() to verify that a process' hold count
is non-zero.
- Include the process address in the PROC_ASSERT_HELD() and
  PROC_ASSERT_NOT_HELD() assertion messages so that the corresponding
  process can be found easily when debugging.

MFC after:	1 week
2015-11-08 01:38:56 +00:00
kib
6fac89c875 Ensure that when a blockable open of fifo returns success, a valid
file descriptor opened for complimentary access exists as well.

The implementation of the guarantee is done by counting the
generations of readers and writers opens.  We return success and not
EINTR or ERESTART error, when the sleep for complimentary opening is
interrupted, but the generation was changed during the sleep.

Longer explanation: assume there are two threads, A doing open("fifo",
O_RDONLY) and B doing open("fifo", O_WRONLY), and no other threads
either trying to open the fifo, nor there are any file descriptors
referencing the fifo.  Before the change, it was possible e.g. for for
thread A to return a valid file descriptor, while thread B returned
EINTR if a signal to B was delivered simultaneously with the wakeup
from A.  After the change, in this situation both A::open() and
B::open() succeed and the signal is made "as if" it was noticed
slightly later.  Note that the signal actual delivery is not changed,
it is done by ast on syscall return path, so signal handler is still
executed before first instruction after syscall.

See PR for the code demonstrating the issue.

PR:	203162
Reported by:	Victor Stinner victor.stinner@gmail.com
Reviewed by:	jilles
Tested by:	bapt, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-09-20 21:18:33 +00:00
trasz
1604109813 Fix an NFS server bug that manifested in "ls -al" displaying a plus
sign on every directory exported via NFSv4 with NFSv4 ACLs enabled.

Reviewed by:	rmacklem@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3502
2015-08-28 14:26:11 +00:00
trasz
db61d1271a Make it possible to forcibly unmount devfs.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-24 14:04:44 +00:00
trasz
9e31188bdc After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
rmacklem
6ee70894f0 For the case where an NFSv4.1 ExchangeID operation has the client identifier
that already has a confirmed ClientID, the nfsrv_setclient() function would
not fill in the clientidp being returned. As such, the value of ClientID
returned would be whatever garbage was on the stack.
An NFSv4.1 client would not normally do this, but it appears that it can
happen for certain Linux clients. When it happens, the client persistently
retries the ExchangeID and Create_session after Create_session fails when
it uses the bogus clientid. With this patch, the correct clientid is replied.
This problem was identified in a packet trace supplied by
Ahmed Kamal via email.

Reported by:	email.ahmedkamal@googlemail.com
MFC after:	2 weeks
2015-08-14 22:02:14 +00:00
jhb
3fab33edd0 The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices.  However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices.  vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by:	kib
Reported by:	sbruno (broke qemu cross-builds), peter
Differential Revision:	https://reviews.freebsd.org/D3316
2015-08-06 16:50:37 +00:00
cem
3d8d5f23ac nfsclient: Protest loudly when GETATTR responses are invalid
BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).

Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.

Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)

Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:

    sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'

Reviewed by:	rmacklem
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3304
2015-08-05 22:27:30 +00:00
rmacklem
e15fd657a2 This patch fixes a problem where, if the NFSv4 server has a previous
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.

MFC after:	3 days
2015-07-29 23:06:30 +00:00
jeff
b7b72de7da - Remove some dead code copied from ffs. 2015-07-29 03:06:08 +00:00
brueffer
95419ac921 In tmpfs_chtimes(), remove checks on the nanosecond level when
determining whether a node changed.

Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.

This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).

PR:			201284
Submitted by:		David Binderman
Patch suggested by:	kib
Reviewed by:		kib
MFC after:		2 weeks
Committed from:		Essen FreeBSD Hackathon
2015-07-26 08:33:46 +00:00
kib
48ccbdea81 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains complete status
and copied out into si_status.

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
markj
d19ba3f89d Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D2937
2015-07-05 22:37:33 +00:00
mjg
feeee4c707 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
rmacklem
5ebe352487 If a "principal" argument isn't provided for a Kerberized NFS mount,
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to a
return a cannonical name for the server.
This patch fixes this unusual case.

PR:		201073
Submitted by:	masato@itc.naist.jp
MFC after:	2 weeks
2015-07-03 22:11:07 +00:00