303 Commits

Author SHA1 Message Date
pjd
b5293a7c15 MFC r209265:
r209260:

Backout r207970 for now, it can lead to deadlocks.

Reported by:	kan

r209261:

Turn off UMA allocations on all archs by default. It isn't stable even
on amd64.

Reported by:	many

Approved by:	re (kib)
2010-06-18 22:06:49 +00:00
mm
b2d90c6e79 MFC r208775:
Fix freeing space after deleting large files with holes.

OpenSolaris onnv revision:	9950:78fc41aa9bc5

Reviewed by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6792701)
Approved by:	re (kib)
2010-06-06 13:08:36 +00:00
mm
910063ec0a MFC r208689:
Fix ZIL close when doing zfs rollback or zfs receive on a mounted dataset.

The fix is a partial import and merge of OpenSolaris onnv revisions
8227:f7d7be9b1f56 and 9292:e112194b5b73.

Reviewed by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6798298)
Approved by:	re (kib)
2010-06-04 08:46:26 +00:00
mm
a3b69608b2 MFC r208472, r208474:
MFC r208472:
Fix zfs receive temporarily changing unchanged stream properties.
Fix possible panic with zfs_enable_datasets.
OpenSolaris onnv revision: 8536:33bd5de3260e [1]

MFC r208474:
Remove kstat.zfs.arcstats.l2_write_bytes_written
The arcstats.l2_write_bytes_written kstat counter introduced
in r205231 duplicated the vendor's arcstats.l2_write_bytes counter
imported in r208373 (OpenSolaris revision 8582:df9361868dbe)

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6748561, 6757075) [1]
2010-05-24 20:09:40 +00:00
pjd
27418a37e7 MFC r207920,r207934,r207936,r207937,r207970,r208142,r208147,r208148,r208166,
r208454,r208455,r208458:

r207920:

Back out r205134. It is not stable.

r207934:

Add missing new line characters to the warnings.

r207936:

Even though r203504 eliminates taste traffic provoked by vdev_geom.c,
ZFS still likes to open all vdevs, close them and open them again,
which in turn provokes taste traffic anyway.

I don't know of any clean way to fix it, so do it the hard way - if we can't
open provider for writing just retry 5 times with 0.5 second pauses. This should
eliminate accidental races caused by other classes tasting providers created on
top of our vdevs.

Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
Reported by:	Yuri Pankov <yuri.pankov@gmail.com>
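
The retry approach described in r207936 can be sketched in portable C. This is only an illustration, not the vdev_geom.c code: open_with_retry(), the open_fn and pause_fn callback types, and the busy_then_ok() demo callback are hypothetical names. The pause is passed in as a callback so the sketch does not depend on any particular sleep primitive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types: the open attempt and the pause between tries. */
typedef int (*open_fn)(void *ctx);	/* returns 0 on success, errno value otherwise */
typedef void (*pause_fn)(void);		/* e.g. a 0.5 second sleep; may be NULL */

/*
 * Try to open up to 'attempts' times, pausing between failed attempts.
 * Returns 0 on success or the error from the last failed attempt.
 */
int
open_with_retry(open_fn try_open, void *ctx, int attempts, pause_fn pause_between)
{
	int error = EINVAL;	/* returned when attempts <= 0 */

	assert(try_open != NULL);
	for (int i = 0; i < attempts; i++) {
		error = try_open(ctx);
		if (error == 0)
			break;
		/* No point in pausing after the final failed attempt. */
		if (pause_between != NULL && i + 1 < attempts)
			pause_between();
	}
	return (error);
}

/* Demo callback: fails with EBUSY until the counter in ctx reaches zero. */
int
busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return (EBUSY);
	}
	return (0);
}
```

With attempts set to 5, a provider that is briefly held open by another class tasting it gets several chances to become available before the open is reported as failed.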

r207937:

I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

r207970:

When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.

No objections from:	kib
Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>

r208142:

The whole point of having dedicated worker thread for each leaf VDEV was to
avoid calling zio_interrupt() from geom_up thread context. It turns out that
when provider is forcibly removed from the system and we kill worker thread
there can still be some ZIOs pending. To complete pending ZIOs when there is
no worker thread anymore we still have to call zio_interrupt() from geom_up
context. To avoid this race just remove use of worker threads altogether.
This should be more or less fine, because I also thought that zio_interrupt()
does more work, but it only makes small UMA allocation with M_WAITOK.
It also saves one context switch per I/O request.

PR:		kern/145339
Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>

r208147:

Add task structure to zio and use it instead of allocating one.
This eliminates the only place where we can sleep when calling zio_interrupt().
As a side-effect this can actually improve performance a little as we
allocate one less thing for every I/O.

Prodded by:	kib

r208148:

Allow to configure UMA usage for ZIO data via loader and turn it on by
default for amd64. On i386 I saw performance degradation when UMA was used,
but for amd64 it should help.

r208166:

Fix userland build by making io_task available only for the kernel and by
providing taskq_dispatch_safe() macro.

r208454:

Remove ZIO_USE_UMA from arc.c as well.

r208455:

ZIO_USE_UMA is no longer used.

r208458:

Create UMA zones unconditionally.
2010-05-24 10:09:36 +00:00
mm
243b56bf05 MFC r208373:
Update L2ARC code and fix several bugs.

- improve ARC memory consumption (Bug ID 6488341)
- ARC/L2ARC metadata accounting (Bug ID 6748019)
- L2ARC turbo warmup (Bug ID 6748023)
- kstats for ARC content (Bug ID 6748023)
- kstats for evicted bytes from ARC by L2ARC state (Bug ID 6871680)
- fix panic on i386 systems (Bug ID 6821260)

OpenSolaris onnv revisions:
8582:df9361868dbe, 8628:97dcded6e556, 9215:7c4584f76b47,
9274:a10f8bd993c1, 10357:29060492b29d

OpenSolaris Bug IDs:
6748019, 6748023, 6748030, 6488341, 6798268, 6821260, 6790261, 6871680

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (multiple bug IDs)
2010-05-24 06:11:33 +00:00
mm
f372c76833 MFC r208370, r208371, r208372, r208442, r208443:
MFC r208370:
Fix: vdev_reopen() can lead to failed allocations
OpenSolaris onnv-revision: 7980:589f37f25048, Bug ID: 6764914

MFC r208371:
Fix stack overflow in zfs send.
OpenSolaris onnv-revision: 8012:8ea30813950f, Bug ID: 6765626

MFC r208372:
Reorder some already introduced locking variables.
OpenSolaris onnv revision: 8214:d7abf7c1f1c1, Bug ID: 6747934

MFC r208442:
Fix mutex_exit misorder that can cause a kernel panic.
OpenSolaris onnv revision: 8667:5c308a17eb7c, Bug ID: 6795440

MFC r208443:
Fix kernel panic when calling spa_tryimport() on a corrupted pool.
OpenSolaris onnv revision: 8680:005fe27123ba, Bug ID: 6786321

Approved by:	pav, delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
2010-05-24 06:07:55 +00:00
mm
021167a39d MFC r208050:
Fix ZIL-related panic on zfs rollback.

OpenSolaris onnv-revision: 8746:e1d96ca6808c

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6796377)
2010-05-20 09:35:31 +00:00
mm
9f2993690b MFC r208047:
Import OpenSolaris revision 7837:001de5627df3
It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)

List of OpenSolaris Bug IDs:
6333409, 6418042, 6757112, 6725668, 6725675, 6725680,
6725698, 6729696, 6730101, 6752226, 6577985, 6755042

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
2010-05-20 06:51:48 +00:00
mm
ca2a8cb430 MFC r207670, r208130, r208131:
MFC r207670:
Introduce hardforce export option (-F) for "zpool export".
When exporting with this flag, zpool.cache remains untouched.
OpenSolaris onnv revision: 8211:32722be6ad3b

MFC r208130:
Fix performance problem with ZFS prefetch caching [1]
Add statistics for ZFS prefetch (sysctl kstat.zfs.misc.zfetchstats)
OpenSolaris onnv revision: 10474:0e96dd3b905a (partial)

MFC r208131:
Fix deadlock between zfs_dirent_lock and zfs_rmdir
OpenSolaris onnv revision: 11321:506b7043a14c

Reported by:	jhell@dataix.net (private e-mail) [1]
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID: 6775357, 6859997, 6868951, 6847615)
2010-05-19 06:49:52 +00:00
mm
3180dcff25 MFC r207626, r207627:
MFC r207626:
Speed up ZFS list operation with objset prefetching.

MFC r207627:
Enable "zfs list" to list explicitly requested snapshots.

OpenSolaris onnv revisions:
8415:8809e849f63e, 10474:0e96dd3b905a (partial)

PR:		kern/146297
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6386929, 6758338, 6755389, 6847118)
2010-05-18 07:45:27 +00:00
trasz
5a547a114e MFC r208030:
Add missing check to prevent local users from panicking the kernel by trying
to set a malformed ACL.
2010-05-17 17:56:27 +00:00
mm
018b8bae09 MFC r207481, r207956:
MFC r207481 [1]:
Add sysctl and loader tunable vfs.zfs.txg.write_limit_override.
This tunable improves fine-tuning of ZFS write throttling.

MFC r207956 [2]:
Fix possible hang when replaying large truncations.
OpenSolaris onnv revision:	7904:6a124a4ca9c5

PR:		kern/146108 [1]
Suggested by:	Nikolay Denev <ndenev at gmail.com> [1]
Obtained from:	OpenSolaris (Bug ID 6761624) [2]
Approved by:	pjd, delphij (mentor)
2010-05-15 07:07:38 +00:00
mm
da492fb89d MFC r207909, r207910, r207911:
MFC r207909:
Fix zfs rename (may occasionally fail with dataset busy).
OpenSolaris onnv revision:	8517:41a0783dde17

MFC r207910:
Fix possible panic with zfs destroy.
OpenSolaris onnv revision:	8779:f164e0e90508

MFC r207911:
Fix failed assertion on destroying datasets from an older pool version.
OpenSolaris onnv revision:	9390:887948510f80

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6784757, 6784924, 6826861)
2010-05-14 09:50:28 +00:00
mm
1439cfd513 MFC r207908:
Fix endianness bug in ZFS intent log (ZIL).

OpenSolaris onnv revision:	8109:6147a1bdd359

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6760048)
2010-05-14 09:06:49 +00:00
mm
f7eeeb7544 MFC r207427:
Fix improper pool write throughput calculation.

OpenSolaris onnv revision:	9366:17553395a745

PR:		kern/146108
Obtained from:	OpenSolaris (Bug ID 6817339)
Approved by:	pjd, delphij (mentor)
2010-05-14 09:00:29 +00:00
mm
db9f0fc6b3 MFC r207624:
Fix deadlock during zfs receive.

OpenSolaris onnv revision:	9299:8809e849f63e

PR:		kern/146296
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6783818, 6826836)
2010-05-11 07:02:29 +00:00
marius
56b4798ee0 MFC: r207683
- Fix broken symlinks on cross platform zfs send/recv. [1]
- Enable zfs_ace_byteswap() on FreeBSD as it works just fine (tested between
  amd64 and sparc64 in both directions by Michael Moll).

PR:		146272
Approved by:	mm, pjd
Obtained from:	OpenSolaris (onnv rev. 8283:1ca59f393041; Bug ID 6764193) [1]
2010-05-10 20:55:24 +00:00
mm
f7d40eca5c MFC r207480:
Change description of tunable group vfs.zfs.txg to be more
understandable.

Approved by:    pjd, delphij (mentor)
2010-05-04 08:37:28 +00:00
delphij
1516f66ade MFC r206838:
Partially MFp4 #176265 by pjd@:

 - Properly initialize and destroy system_taskq.
 - Add a dummy implementation of taskq_create_proc().

Note: We do not currently use system_taskq in ZFS so this is mostly a
no-op at this time.  Proper system_taskq initialization is required
by newer ZFS code.

Ok'ed by:	pjd
2010-05-03 09:46:47 +00:00
pjd
3bcbc02a08 MFC r207068,r207334:
r207068:

Allow to modify directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK,
sunlnk) flag is set. We only deny directory's removal or rename.

PR:		kern/143343
Reported by:	marck

r207334:

Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.

PR:		kern/144402
Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Obtained from:	OpenSolaris, Bug ID 6895088
2010-05-01 19:00:33 +00:00
pjd
7c91ad920c MFC r203504,r204067,r204073,r204101,r204804,r205079,r205080,r205132,r205133,
r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
    r206794,r206795,r206796,r206797:

r203504:

Open provider for writing when we find the right one. Opening too many
providers for writing provokes huge traffic related to taste events sent
by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.

Reported by:	Kurt Touet <ktouet@gmail.com>, mr
Tested by:	mr, Baginski Darren <kickbsd@ya.ru>

r204067:

Update comment. We also look for GPT partitions.

r204073:

Add tunable and sysctl to skip hostid check on pool import.

r204101:

Don't set f_bsize to recordsize. It might confuse some software (like squid).

Submitted by:	Alexander Zagrebin <alexz@visp.ru>

r204804:

Remove racy assertion.

Reported by:	Attila Nagy <bra@fsn.hu>
Obtained from:	OpenSolaris, Bug ID 6827260

r205079:

Remove bogus assertion.

Reported by:	Johan Ström <johan@stromnet.se>
Obtained from:	OpenSolaris, Bug ID 6920880

r205080:

Force commit to correct Bug ID:

Obtained from:	OpenSolaris, Bug ID 6920880

r205132:

Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads

r205133:

fix compilation under ZIO_USE_UMA

r205134:

make UMA the default allocator for ZFS buffers - this avoids
a great deal of contention in kmem_alloc

r205231:

- reduce contention by breaking up ARC state locks into 16 for data
  and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
  contended counter (no lock protection needed for atomics)

r205253:

use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad

pointed out by Marius Nuennerich and jhb@

r205264:

- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
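
The lock padding and alignment mentioned in r205231, r205253 and r205264 can be illustrated with a small C11 sketch. It is an assumption-laden stand-in, not the arc.c code: struct demo_lock, struct padded_lock, demo_locks and demo_lock_for() are hypothetical names, and the fallback value of 64 for CACHE_LINE_SIZE is only used when the platform headers have not already defined it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE	64	/* assumed fallback; real value is per architecture */
#endif

#define DEMO_NUM_LOCKS	16

/* Stand-in for a kernel mutex; only its size matters for this sketch. */
struct demo_lock {
	int held;
};

/* Pad each lock out to a full cache line so neighbours never share a line. */
struct padded_lock {
	struct demo_lock lock;
	unsigned char pad[CACHE_LINE_SIZE - sizeof(struct demo_lock)];
};

/* Align the array itself so element 0 starts on a cache line boundary. */
_Alignas(CACHE_LINE_SIZE) struct padded_lock demo_locks[DEMO_NUM_LOCKS];

/* Pick one of the locks from a hash value, spreading contention across them. */
struct demo_lock *
demo_lock_for(uint64_t hash)
{
	return (&demo_locks[hash % DEMO_NUM_LOCKS].lock);
}
```

Padding alone is not enough: without the alignment on the array, an element could still straddle two cache lines, which is the gap the alignment change closes.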

r205346:

The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
   or that pool is being created and vdev doesn't contain proper guid yet.
   To be able to handle pool creation we open vdev by path anyway.

Because of 3 it is possible that we open wrong vdev on import which can lead to
confusion.

The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.
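
The lookup order that r205346 ends up with can be modelled as follows. This is a simplified sketch, not the vdev_geom.c implementation: providers are reduced to a path and a guid, and enum demo_load_state, struct demo_provider, demo_find() and demo_pick_provider() are hypothetical names (DEMO_LOAD_NONE plays the role of SPA_LOAD_NONE).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DEMO_LOAD_NONE stands in for SPA_LOAD_NONE (pool is being created). */
enum demo_load_state { DEMO_LOAD_NONE, DEMO_LOAD_IMPORT };

struct demo_provider {
	const char *path;
	uint64_t guid;
};

/* Match on path and/or guid; a NULL pointer means "ignore this key". */
static int
demo_find(const struct demo_provider *p, size_t n, const char *path,
    const uint64_t *guid)
{
	for (size_t i = 0; i < n; i++) {
		if (path != NULL && strcmp(p[i].path, path) != 0)
			continue;
		if (guid != NULL && p[i].guid != *guid)
			continue;
		return ((int)i);
	}
	return (-1);
}

/* Returns the index of the provider to open, or -1 if none qualifies. */
int
demo_pick_provider(enum demo_load_state state, const struct demo_provider *p,
    size_t n, const char *path, uint64_t guid)
{
	int i;

	/* Pool creation: the vdev has no valid guid yet, so use the path only. */
	if (state == DEMO_LOAD_NONE)
		return (demo_find(p, n, path, NULL));

	/* Import: path+guid first, then guid alone; never path alone. */
	i = demo_find(p, n, path, &guid);
	if (i < 0)
		i = demo_find(p, n, NULL, &guid);
	return (i);
}
```

The important property is the last branch: on import a path match without a guid match is rejected, which is what prevents opening the wrong vdev.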

r206051:

IOCPARM_MAX defines maximum size of a structure that can be passed
directly to ioctl(2). Because of how ioctl command is built using _IO*()
macros we have only 13 bits to encode structure size. So the structure
can be up to 8kB-1.

Currently we define IOCPARM_MAX as PAGE_SIZE.

This is IMHO wrong for three main reasons:

1. It is confusing on archs with page size larger than 8kB (not really
   sure if we support such archs (sparc64?)), as even if PAGE_SIZE is
   bigger than 8kB, we won't be able to encode anything larger in ioctl
   command.

2. It is a waste. Why the structure can be only 4kB on most archs if we
   have 13 bits dedicated for that, not 12?

3. It shouldn't depend on architecture and page size. My ioctl command
   can work on one arch, but can't on the other?

Increase IOCPARM_MAX to 8kB and make it independent of PAGE_SIZE and
architecture it is compiled for. This allows to use all the bits on all the
archs for size. Note that this doesn't mean we will copy more on every ioctl(2)
call. No. We still copyin(9)/copyout(9) only exact number of bytes encoded in
ioctl command.

Practical use for this change is ZFS. zfs_cmd_t structure used for ZFS
ioctls is larger than 4kB.

Silence on:	arch@
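
The 13-bit size limit discussed in r206051 is easy to check with a little bit arithmetic. The sketch below is modelled on the way the _IO*() macros pack a command word, but the DEMO_* names, demo_ioc() and demo_ioc_len() are illustrative stand-ins rather than the system's own definitions, and the exact bit layout should be treated as an assumption.

```c
#include <stdint.h>

#define DEMO_IOCPARM_SHIFT	13
#define DEMO_IOCPARM_MASK	((1u << DEMO_IOCPARM_SHIFT) - 1)	/* 0x1fff = 8191 */

/*
 * Pack direction bits, group, command number and parameter length into one
 * 32-bit command word. Only the low 13 bits of len survive the mask.
 */
static inline uint32_t
demo_ioc(uint32_t dir, uint32_t group, uint32_t num, uint32_t len)
{
	return (dir | ((len & DEMO_IOCPARM_MASK) << 16) | (group << 8) | num);
}

/* Recover the parameter length that was encoded in the command word. */
static inline uint32_t
demo_ioc_len(uint32_t cmd)
{
	return ((cmd >> 16) & DEMO_IOCPARM_MASK);
}
```

A 5000-byte structure, larger than a 4kB page, round-trips through the length field unchanged, while a length of exactly 8192 wraps to 0 because it needs a 14th bit. That is why the largest encodable structure is 8kB-1 bytes.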

r206667:

Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.

thread0 (vfs_fhtovp)	thread1 (vop_getattr)	thread2 (zfs_recv)
--------------------	---------------------	------------------
			vn_lock
rrw_enter_read
						rrw_enter_write (hangs)
			rrw_enter_read (hangs)
vn_lock (hangs)

Reported by:	Attila Nagy <bra@fsn.hu>

r206792:

Set ARC_L2_WRITING on L2ARC header creation.

Obtained from:	OpenSolaris

r206793:

Remove racy assertion.

Obtained from:	OpenSolaris

r206794:

Extend locks scope to match OpenSolaris.

r206795:

Add missing list and lock destruction.

r206796:

Style fixes.

r206797:

Restore previous order.
2010-04-18 21:36:34 +00:00
marcel
5206b4a833 MFC rev 199727, 200888, 201031, 202904, 203054, 203106, 203572, 203884,
204183, 204184, 204185, 204425, 204904, 204905, 205172, 205234, 205357,
205428, 205429, 205431, 205432, 205433, 205434, 205435, 205454, 205665,
205713, 205723, 205726 and 205727:

Bring ia64 machine-dependent changes from 9-current to 8-stable.
2010-03-31 05:05:28 +00:00
delphij
ed497602c4 MFC r202964:
On FreeBSD, time_t is 64-bit for all platforms except i386 and powerpc,
where the type is 32-bit.  ZFS can handle 64-bit timestamp internally
but zfs_setattr() would check if the time value can fit, we change the
checking macros to match 64-bit timestamp if the platform supports it.

This change has a downside: while you can still import zfs on 32-bit
platforms, timestamps that are out of the 32-bit range would overflow there.

This fixes the Y2.038K issue on platforms using 64-bit timestamps.
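
A range check of the kind described above might look like the following sketch. It is not the actual ZFS macro: demo_sec_overflows() and demo_sec_overflows_here() are hypothetical helpers, and the width of time_t is passed as a parameter so that both the 32-bit and 64-bit cases can be exercised on any host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Report whether a seconds value cannot be represented by a time_t of the
 * given size in bytes. With a 64-bit time_t every int64_t value fits.
 */
static inline bool
demo_sec_overflows(int64_t sec, size_t time_t_size)
{
	if (time_t_size >= sizeof(int64_t))
		return (false);
	return (sec < INT32_MIN || sec > INT32_MAX);
}

/* Convenience wrapper for the platform the code is compiled on. */
static inline bool
demo_sec_overflows_here(int64_t sec)
{
	return (demo_sec_overflows(sec, sizeof(time_t)));
}
```

The first second after the 2038 rollover, (int64_t)INT32_MAX + 1, is rejected when time_t is 4 bytes wide and accepted when it is 8 bytes wide.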

Reviewed by:	pjd
2010-02-25 00:46:51 +00:00
pjd
761a03ca68 MFC r201684.
Teach the (gpt)zfsboot and zfsloader raidz code to use its buffers
more efficiently.

Before this patch, in the worst case memory use would increase
exponentially on the number of drives in the raidz vdev.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
Sponsored by:	VPOP Technologies, Inc.
Silence from:	dfr
2010-02-23 16:46:34 +00:00
delphij
6498d27aa0 MFC r203533:
Remove two files that are not needed by FreeBSD.

Approved by:  pjd
2010-02-22 22:27:26 +00:00
delphij
9ccf830a69 Reduce diff against OpenSolaris - move Giant acquire/release to
zfs_znode.c.  As a side effect this also eliminates two potential
Giant leaks.

Approved by:  pjd
2010-02-01 09:29:32 +00:00
trasz
c252d063c1 MFC r196949:
Enable NFSv4 ACL support in ZFS.

MFC r197435:

In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result
in EINVAL, not EOPNOTSUPP.
2010-01-31 02:11:14 +00:00
delphij
8f2f5c39e0 MFC r201689:
Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status.  Booting from a degraded vdev should now be
more robust.

Submitted by:	Matt Reimer <mattjreimer at gmail.com>
Sponsored by:	VPOP Technologies, Inc.
2010-01-20 01:13:52 +00:00
delphij
891ae802d0 MFC r202129:
Report ZFS filesystem version instead of the zpool version when we say it.

Reported by:	Yuri Pankov (on -fs@)
Submitted by:	delphij
2010-01-18 05:03:40 +00:00
delphij
136913ae01 MFC r201143:
Apply OpenSolaris revision 8021:b8fe9660eb2d which brings our zpool
to version 14, making it possible for zpools created on OpenSolaris
2009.06 be used on FreeBSD.

PR:		kern/141800
Submitted by:	mm
Reviewed by:	pjd, trasz
Obtained from:	OpenSolaris onnv-gate
2010-01-11 02:31:00 +00:00
delphij
8038a18975 MFC r201756:
Re-apply onnv-gate revisions 7994 and 8986 (corresponds to FreeBSD
revision 200726 and 200727).

Reviewed by:  mm@
2010-01-10 07:08:11 +00:00
netchild
5439fa05a6 MFC r197816:
---snip---
    Prevent paging pressure from draining arc too much
    - always drain arc if above arc_c_max
    - never drain arc if arc is below arc_c_max
---snip---
2010-01-08 09:59:13 +00:00
delphij
35b3b2ceeb MFC r200727:
Apply fix for Solaris bug 6764159: restore_object() makes a call
that can block while having a tx open but not yet committed
(onnv revision 7994)

Submitted by:	mm
Approved by:	pjd
Obtained from:	OpenSolaris
2010-01-03 03:10:28 +00:00
delphij
9ec4dfa13c MFC r200726:
Apply fix for Solaris bug 6801979: zfs recv can fail with E2BIG
(onnv revision 8986)

PR:		kern/141355
Requested by:	mm
Submitted by:	pjd
Obtained from:	OpenSolaris
2010-01-03 03:05:30 +00:00
delphij
87394b81d9 MFC r200724:
Apply fix for Solaris bug  6462803 zfs snapshot -r failed because
filesystem was busy.

PR:		kern/141387
Submitted by:	mm
Approved by:	pjd
Obtained from:	OpenSolaris (onnv 8989:cfce31f4eebf)
2010-01-03 02:58:05 +00:00
kib
b73e172156 MFC r200162:
Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync().
2009-12-12 14:44:04 +00:00
pjd
e83705b4c5 MFC r200124,r200126,
r200124:

Avoid using additional variable for storing an error if we are not going
to do anything with it.

r200126:

Fix deadlock when ZVOLs are present and we are replacing dead component or
calling scrub when pool is in a degraded state. It will try to taste ZVOLs,
which will lead to deadlock, as ZVOL will try to acquire the same locks as
replace/scrub is holding already.

We can't simply skip provider based on their GEOM class, because ZVOL can have
providers built on top of it and we need to skip those as well.

We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider will give
us positive answer and we have to skip those providers.

This way we remove possibility to create ZFS pools on top of ZVOLs, but it is
not very useful anyway.

I believe deadlock is still possible in some very complex situations like when
we have MD provider on top of UFS file on top of ZVOL. When we try to replace
dead component in the pool mentioned ZVOL is based on, there might be a
deadlock when ZFS will try to taste MD provider. There is no easy way to detect
that, but it isn't very common.

r200125,r200158:

Fix order of looking for providers.

Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.

Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.
2009-12-10 18:38:40 +00:00
rnoland
57dfbb5b88 MFC 198420
Correct some issues with zfs boot.

   - Teach it to read gang blocks. (essentially untested)
     If you see "ZFS: gang block detected!", please let
     me know, so we can either remove the printf if it
     works, or fix it if it doesn't.

   - If multiple partitions exist on a disk, probe them all.
     We also need to reset dsk->start to 0 to read the right
     sector here.

   - With GPT, we can have 128 partitions.

   - If the bootfs property has ever been set on a pool
     it seems that it never goes away.  zpool won't allow
     you to add to the pool with the bootfs property set.
     However, if you clear the property back to default
     we end up getting 0 for the object number and read
     a bogus block pointer and fail to boot.

   - Fix some error printfs. The printf in the loader is
     only capable of c,s and u formats.

   - Teach printf how to display %llu
2009-11-21 15:02:35 +00:00
rnoland
a05c128d84 MFC r199241
This patch addresses an overflow in the zfs boot code and allows
users to boot from zfs raidz volumes.  This has been tested by a number
of users and does not impact those which are not booting from zfs raidz
volumes.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
2009-11-14 16:14:51 +00:00
pjd
b469a0b405 MFC r198703,r199156,r199157:
r198703:

- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
  zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
  zfs_access() and vaccess() after splitting V* flags.

This fixes "dirtying snapshot!" panic.

PR:	kern/139806
Reported by:	Carl Chave <carl@chave.us>
In co-operation with:	jh

r199156:

Avoid passing invalid mountpoint to getnewvnode().

Reported by:	rwatson
Tested by:	rwatson

r199157:

Be careful which vattr fields are set during setattr replay.
Without this fix strange things can appear after unclean shutdown like
files with mode set to 07777.

Reported by:	des
2009-11-14 11:59:59 +00:00
trasz
23beb4d41e MFC r196863:
Improve wording.

MFC r196941:

Prevent the line from wrapping.

Approved by:	re (kib)
2009-10-22 16:26:38 +00:00
pjd
a5cc4c4f58 MFC r197831,r197842,r197843,r197860,r197861:
r197831:

Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().

PR:	kern/118320
Submitted by:	Mark Thompson <info-gentoo@mark.thompson.bz>

r197842:

Fix white-spaces.

r197843:

On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.

r197860:

File system owner is when uid matches and jail matches.

r197861:

Allow file system owner to modify system flags if securelevel permits.

Approved by:	re (kib)
2009-10-12 20:36:55 +00:00
delphij
42fe38509d MFC revision 197683:
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.

Submitted by:	pjd (response of my original patch but bugs are mine)
Approved by:	re (kib)
2009-10-04 09:07:29 +00:00
pjd
ec3c13d92d MFC r197287, r197289, r197351, r197426, r197458, r197459, r197497, r197498,
r197512, r197513, r197514, r197515, r197525:

r197287:

Purge namecache for the file system being rolled back, so it doesn't point at
invalid vnodes after the rollback resulting in EIO errors when trying to access
files which are in the namecache.

Reported by:	des

r197289:

Purge file system namecache when receiving incremental stream and rolling back
to it.

r197351:

Purge namecache in the same place OpenSolaris does.

r197426:

Restore BSD behaviour - when creating new directory entry use parent directory
gid to set group ownership and not process gid.

This was overlooked during v6 -> v13 switch.

PR:	kern/139076
Reported by:	Sean Winn <sean@gothic.net.au>

r197458:

Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.

r197459:

Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise it can still be reclaimed but we won't be able to see it,
because every try to enter ZFS through VFS will result in EIO.

The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.

r197497:

Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to
be a bit weak and OpenSolaris also switched to fletcher4.

r197498:	head/cddl/contrib/opensolaris

Fletcher4 is now the default checksum algorithm.

r197512:

- Don't depend on value returned by gfs_*_inactive(), it doesn't work
  well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.

PR:	kern/139062
Reported by:	trasz

r197513:

Use traverse() function to find and return mount point's vnode instead of
covered vnode when snapshot is already mounted.

r197514:

On lookup error VFS expects *vpp to be set to NULL, be sure to do that.

r197515:

Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.

PR:	kern/139062
Reported by:	trasz

r197525:

Ensure that tv_sec is between INT32_MIN and INT32_MAX, so ZFS won't object.
This completes the fix from r185586.

PR:	kern/139059
Reported by:	Daniel Braniss <danny@cs.huji.ac.il>
Submitted by:	Jaakko Heinonen <jh@saunalahti.fi>
Tested by:	Daniel Braniss <danny@cs.huji.ac.il>

Approved by:	re (kib)
2009-09-29 10:53:06 +00:00
pjd
3c242492c0 MFC r197218:
We believe ZFS is ready for production use. Remove a warning about it being
experimental. :)

Approved by:	re (kib)
2009-09-15 12:21:06 +00:00
pjd
87b424f9b4 MFC r197219:
Forced unmounts work just fine in my tests under heavy load. There might
still be a problem, but it isn't worth a warning.

Approved by:	re (kib)
2009-09-15 12:19:34 +00:00
pjd
12d546b4f4 MFC r196456,r196457,r196458,r196662,r196702,r196703,r196919,r196927,r196928,
r196943,r196944,r196947,r196950,r196953,r196954,r196965,r196978,r196979,
r196980,r196982,r196985,r196992,r197131,r197133,r197150,r197151,r197152,
r197153,r197167,r197172,r197177,r197200,r197201:

r196456:
- Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().

r196457:
Set priority of vdev_geom threads and zvol threads to PRIBIO.

r196458:
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'

r196662:
Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
regular user tries to mount dataset owned by him.

r196702:
Remove empty directory.

r196703:
Backport the 'dirtying dbuf' panic fix from newer ZFS version.

Reported by:	Thomas Backman <serenity@exscape.org>

r196919:
bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.

PR:	kern/135480
Reported by:	Emil Mikulic <emikulic@gmail.com>

r196927:
Changing provider size is not really supported by GEOM, but doing so when
provider is closed should be ok.
When administrator requests to change ZVOL size do it immediately if ZVOL
is closed or do it on last ZVOL close.

PR:	kern/136942
Requested by:	Bernard Buri <bsd@ask-us.at>

r196928:
Teach zdb(8) how to obtain GEOM provider size.

PR:	kern/133134
Reported by:	Philipp Wuensche <cryx-freebsd@h3q.com>

r196943:
- Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.

r196944:
Don't recheck ownership on update mount. This will eliminate LOR between
vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.

Noticed by:	kib
Reviewed by:	kib

r196947:
Defer thread start until we set priority.

Reviewed by:	kib

r196950:
Fix detection of file system being shared. Now zfs unshare/destroy/rename
command will properly remove exported file systems.

r196953:
When snapshot mount point is busy (for example we are still in it)
we will fail to unmount it, but it won't be removed from the tree,
so in that case there is no need to reinsert it.

Reported by:	trasz

r196954:
If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
Fix similar case in the code that is currently commented out.

r196965:
Fix reference count leak for a case where snapshot's mount point is updated.

r196978:
Call ZFS_EXIT() after locking the vnode.

r196979:
On FreeBSD we don't have to look for snapshot's mount point,
because fhtovp method is already called with proper mount point.

r196980:
When we automatically mount snapshot we want to return vnode of the mount point
from the lookup and not covered vnode. This is one of the fixes for using .zfs/
over NFS.

r196982:
We don't export individual snapshots, so mnt_export field in snapshot's
mount point is NULL. That's why when we try to access snapshots over NFS
use mnt_export field from the parent file system.

r196985:
Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. Action is not really taken, but it is
logged to pool history, which might be confusing.

Reported by:	Denis Ahrens <denis@h3q.com>

r196992:
Implement __assert() for Solaris-specific code. Until now Solaris code was
using Solaris prototype for __assert(), but FreeBSD's implementation.
Both take different arguments, so we were either core-dumping in assert()
or printing garbage.

Reported by:	avg

r197131:
Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain
NULL, but also can point to dead vnode, take that into account.

PR:	kern/132068
Reported by:	Edward Fisk <7ogcg7g02@sneakemail.com>, kris
Fix based on patch from:	Jaakko Heinonen <jh@saunalahti.fi>

r197133:
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
  z_dbuf field is NULL - this might happen in case of rollback or forced
  unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
  done asynchronously via zfs_reclaim_complete().

r197150:
There is a bug where mze_insert() can trigger an assert() of inserting
the same entry twice. This bug is not fixed yet, but leads to a situation
where the kernel will panic when we try to access a corrupted directory.
Until the bug is properly fixed, try to recover from it and log that it
happened.

Reported by:	marck
OpenSolaris bug:	6709336

r197151:
Be sure not to overflow struct fid.

r197152:
Extend scope of the z_teardown_lock lock for consistency and "just in case".

r197153:
When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.

r197167:
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories
by just returning EOPNOTSUPP. This will allow NFS server to fall back to
regular READDIR.
Note that converting inode number to snapshot's vnode is expensive operation.
Snapshots are stored in AVL tree, but based on their names, not inode numbers,
so to convert inode to snapshot vnode we have to iterate over all snapshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.

PR:	kern/125149
Reported by:	Weldon Godfrey <wgodfrey@ena.com>
Analysis by:	Jaakko Heinonen <jh@saunalahti.fi>

r197172:
Add missing \n.

Reported by:	marck

r197177:
Support both cases: when snapshot is already mounted and when it is not yet
mounted.

r197200:
Modify mount(8) to skip MNT_IGNORE file systems by default, just like df(1)
does. This is not POLA violation, because there is no single file system in the
base that uses MNT_IGNORE currently, although ZFS snapshots will be mounted with
MNT_IGNORE after next commit.

Reviewed by:	kib

r197201:
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
  df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows
  ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
  mount(8) and df(1) output by default.

Reviewed by:	kib

Approved by:	re (bz)
2009-09-15 11:13:40 +00:00
kib
d85421d7c3 MFC r196966:
Lock Giant around vn_open_cred().
Remove innocent unnecessary call to NDFREE().

Approved by:	re (kensmith)
2009-09-11 12:56:13 +00:00
pjd
79cd85d217 MFC r196395:
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allows 'zpool import' to work again.

Reported by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	kmacy
Approved by:	re (kib)
2009-08-20 00:08:58 +00:00