Commit Graph

541 Commits

Author SHA1 Message Date
Martin Matuska
711bf9bcf1 Fix freeing space after deleting large files with holes.
OpenSolaris onnv revision:	9950:78fc41aa9bc5

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6792701)
MFC after:	3 days
2010-06-03 11:08:46 +00:00
Martin Matuska
dc5d34e454 Fix ZIL close when doing zfs rollback or zfs receive on a mounted dataset.
The fix is a partial import and merge of OpenSolaris onnv revisions
8227:f7d7be9b1f56. and 9292:e112194b5b73

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6798298)
MFC after:	3 days
2010-06-01 08:43:46 +00:00
Pawel Jakub Dawidek
510ec358c5 Fix a bug where resilver is not started automatically on pool import or load.
If disk was missing on pool load or import and on next pool load or import
it was present, resilver wasn't started automatically and ZFS reported all disks
as ONLINE and healthy. Then, when another disk died, pool became unaccessible,
because if it was 2-way mirror or RAIDZ1 two vdevs were out of sync.

To fix the problem, start resilver automatically on pool load or import.

Obtained from:	OpenSolaris
MFC after:	3 days
2010-05-31 23:17:45 +00:00
Pawel Jakub Dawidek
b1c7417cd8 Fix panic when reading label from provider with non power of 2 sector size.
Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
MFC after:	3 days
2010-05-31 23:11:43 +00:00
Martin Matuska
dd85b12982 Remove kstat.zfs.arcstats.l2_write_bytes_written
The arcstats.l2_write_bytes_written kstat counter introduced
in r205231 was duplicite with vendor's arcstats.l2_write_bytes counter
imported in r208373 (OpenSolaris revision 8582:df9361868dbe)

Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-05-23 21:16:34 +00:00
Martin Matuska
5b170d55ae Fix zfs receive temporarily changing unchanged stream properties.
Fix possible panic with zfs_enable_datasets.

OpenSolaris onnv revision:	8536:33bd5de3260e

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6748561, 6757075)
MFC after:	3 days
2010-05-23 21:02:43 +00:00
Pawel Jakub Dawidek
4e8c7af455 Create UMA zones unconditionally.
MFC after:	3 days
2010-05-23 19:10:06 +00:00
Pawel Jakub Dawidek
a95add4cf8 Remove ZIO_USE_UMA from arc.c as well.
MFC after:	3 days
2010-05-23 18:42:33 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Martin Matuska
55a381515b Fix kernel panic when calling spa_tryimport() on a corrupted pool.
OpenSolaris onnv revision:	8680:005fe27123ba

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6786321)
MFC after:	1 day
2010-05-23 10:13:11 +00:00
Martin Matuska
e3fffd1a9f Fix mutex_exit misorder that can cause a kernel panic.
OpenSolaris onnv revision:	8667:5c308a17eb7c

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6795440)
MFC after:	1 day
2010-05-23 10:08:05 +00:00
Martin Matuska
7838815ebb Update L2ARC code and fix several bugs.
- improve ARC memory consumption (Bug ID 6488341)
- ARC/L2ARC metadata accounting (Bug ID 6748019)
- L2ARC turbo warmup (Bud ID 6748023)
- kstats for ARC content (Bug ID 6748023)
- kstats for evicted bytes from ARC by L2ARC state (Bud ID 6871680)
- fix panic on i386 systems (Bug ID 6821260)

OpenSolaris onnv revisions:
8582:df9361868dbe, 8628:97dcded6e556, 9215:7c4584f76b47,
9274:a10f8bd993c1, 10357:29060492b29d

OpenSolaris Bug IDs:
6748019, 6748023, 6748030, 6488341, 6798268, 6821260, 6790261, 6871680

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSlaris (multiple bug IDs)
MFC after:	3 days
2010-05-21 09:52:49 +00:00
Martin Matuska
370227d241 Reorder some already introduced locking variables.
OpenSolaris onnv revision:	8214:d7abf7c1f1c1

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6747934)
MFC after:	3 days
2010-05-21 09:35:28 +00:00
Martin Matuska
911e1f9b1d Fix stack overflow in zfs send.
OpenSolaris onnv-revision: 8012:8ea30813950f

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6765626)
MFC after:	3 days
2010-05-21 08:55:18 +00:00
Martin Matuska
8b2bc083b9 Fix: vdev_reopen() can lead to failed allocations
OpenSolaris onnv-revision: 7980:589f37f25048

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6764914)
MFC after:	3 days
2010-05-21 08:50:34 +00:00
Pawel Jakub Dawidek
2b3d97b81d Fix userland build by making io_task available only for the kernel and by
providing taskq_dispatch_safe() macro.

MFC after:	1 week
2010-05-16 19:44:08 +00:00
Pawel Jakub Dawidek
ed3c664257 Allow to configure UMA usage for ZIO data via loader and turn it on by
default for amd64. On i386 I saw performance degradation when UMA was used,
but for amd64 it should help.

MFC after:	3 days
2010-05-16 15:14:59 +00:00
Pawel Jakub Dawidek
cfb3e98d37 Add task structure to zio and use it instead of allocating one.
This eliminates the only place where we can sleep when calling zio_interrupt().
As a side-effect this can actually improve performance a little as we
allocate one less thing for every I/O.

Prodded by:	kib
MFC after:	1 week
2010-05-16 15:12:34 +00:00
Pawel Jakub Dawidek
ea478cb1da The whole point of having dedicated worker thread for each leaf VDEV was to
avoid calling zio_interrupt() from geom_up thread context. It turns out that
when provider is forcibly removed from the system and we kill worker thread
there can still be some ZIOs pending. To complete pending ZIOs when there is
no worker thread anymore we still have to call zio_interrupt() from geom_up
context. To avoid this race just remove use of worker threads altogether.
This should be more or less fine, because I also thought that zio_interrupt()
does more work, but it only makes small UMA allocation with M_WAITOK.
It also saves one context switch per I/O request.

PR:		kern/145339
Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>
MFC after:	1 week
2010-05-16 11:56:42 +00:00
Martin Matuska
ee56d88b76 Fix deadlock between zfs_dirent_lock and zfs_rmdir
OpenSolaris onnv revision:	11321:506b7043a14c

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6847615)
MFC after:	3 days
2010-05-16 07:46:03 +00:00
Martin Matuska
db708a6e2c Fix perfomance problem with ZFS prefetch caching [1]
Add statistics for ZFS prefetch (sysctl kstat.zfs.misc.zfetchstats)

Partial import of OpenSolaris onnv revision 10474:0e96dd3b905a

Reported by:	jhell@dataix.net (private e-mail) [1]
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6859997, 6868951)
MFC after:	3 days
2010-05-16 07:16:28 +00:00
Martin Matuska
bef629c14d Fix ZIL-related panic on zfs rollback.
OpenSolaris onnv-revision: 8746:e1d96ca6808c

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6796377)
MCF after:	1 week
2010-05-13 20:55:58 +00:00
Martin Matuska
c43d127a9a Import OpenSolaris revision 7837:001de5627df3
It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)

List of OpenSolaris Bug IDs:
6333409, 6418042, 6757112, 6725668, 6725675, 6725680,
6725698, 6729696, 6730101, 6752226, 6577985, 6755042

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	1 week
2010-05-13 20:32:56 +00:00
Edward Tomasz Napierala
4e28b70950 Add missing check to prevent local users from panicing the kernel by trying
to set malformed ACL.

MFC after:	3 days
2010-05-13 15:31:00 +00:00
Martin Matuska
f2d1218cbe Fix possible hang when replaying large truncations.
OpenSolaris onnv revision:	7904:6a124a4ca9c5

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6761624)
MFC after:	3 days
2010-05-12 09:51:57 +00:00
Pawel Jakub Dawidek
c60c36a745 I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
Pawel Jakub Dawidek
8423e00b36 Eventhough r203504 eliminates taste traffic provoked by vdev_geom.c,
ZFS still like to open all vdevs, close them and open them again,
which in turn provokes taste traffic anyway.

I don't know of any clean way to fix it, so do it the hard way - if we can't
open provider for writing just retry 5 times with 0.5 pauses. This should
elimitate accidental races caused by other classes tasting providers created on
top of our vdevs.

MFC after:	3 days
Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
Reported by:	Yuri Pankov <yuri.pankov@gmail.com>
2010-05-11 22:29:00 +00:00
Pawel Jakub Dawidek
204b20d932 Add missing new line characters to the warnings.
MFC after:	3 days
2010-05-11 22:23:35 +00:00
Martin Matuska
8c04b2242e Fix failed assertion on destroying datasets from an older pool version.
OpenSolaris onnv revision:	9390:887948510f80

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6826861)
MFC after:	3 days
2010-05-11 09:26:46 +00:00
Martin Matuska
431905576e Fix possible panic with zfs destroy.
OpenSolaris onnv revision:	8779:f164e0e90508

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6784924)
MFC after:	3 days
2010-05-11 09:23:46 +00:00
Martin Matuska
bb8b966850 Fix zfs rename (may occasionally fail with dataset busy).
OpenSolaris onnv revision:	8517:41a0783dde17

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6784757)
MFC after:	3 days
2010-05-11 09:19:41 +00:00
Martin Matuska
dbbd1505bf Fix endianess bug in ZFS intent log (ZIL).
OpenSolaris onnv revision:	8109:6147a1bdd359

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6760048)
MFC after:	3 days
2010-05-11 07:25:13 +00:00
Edward Tomasz Napierala
dc510c105f Enforce RLIMIT_FSIZE in ZFS.
Reviewed by:	pjd@
2010-05-07 14:30:21 +00:00
Marius Strobl
626b7c61f8 - Fix broken symlinks on cross platform zfs send/recv. [1]
- Enable zfs_ace_byteswap() on FreeBSD as it works just fine (tested between
  amd64 and sparc64 in both directions by Michael Moll).

PR:		146272
Approved by:	mm, pjd
Obtained from:	OpenSolaris (onnv rev. 8283:1ca59f393041; Bug ID 6764193) [1]
MFC after:	3 days
2010-05-05 22:15:20 +00:00
Martin Matuska
d75554ec04 Introduce hardforce export option (-F) for "zpool export".
When exporting with this flag, zpool.cache remains untouched.

OpenSolaris onnv revision: 8211:32722be6ad3b

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID: 6775357)
2010-05-05 18:22:29 +00:00
Martin Matuska
7d4daf9a10 Speed up ZFS list operation with objset prefetching.
Partial import of OpenSolaris onnv revisions:
8415:8809e849f63e, 10474:0e96dd3b905a

PR:		kern/146297
Submitted by:	myself
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6386929, 6755389, 6847118)
MFC after:	2 weeks
2010-05-04 17:40:24 +00:00
Martin Matuska
77a7f64749 Fix deadlock during zfs receive.
OpenSolaris onnv revision:	9299:8809e849f63e

PR:		kern/146296
Submitted by:	myself
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6783818, 6826836)
MFC after:	1 week
2010-05-04 17:30:07 +00:00
Martin Matuska
df04ddbaa6 Add sysctl and loader tunable vfs.zfs.txg.write_limit_override.
This tunable improves fine-tuning of ZFS write throttling.

PR:		kern/146108
Suggested by:	Nikolay Denev <ndenev at gmail.com>
Approved by:	pjd, delphij (mentor)
MFC after:	2 weeks
2010-05-01 20:44:37 +00:00
Martin Matuska
9ccdc9600e Change description of tunable group vfs.zfs.txg to be more
understandable.

Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-05-01 19:53:15 +00:00
Martin Matuska
d8665eb1f6 Fix improper pool write throughput calculation.
OpenSolaris onnv revision:	9366:17553395a745

PR:		kern/146108
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris, Bug ID 6817339
MFC after:	2 weeks
2010-04-30 07:48:29 +00:00
Pawel Jakub Dawidek
b19b0de471 Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
PR:		kern/144402
Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Obtained from:	OpenSolaris, Bug ID 6895088
MFC after:	3 days
2010-04-28 18:29:48 +00:00
Pawel Jakub Dawidek
7af9c09a61 Allow to modify directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK,
sunlnk) flag is set. We only deny dirctory's removal or rename.

PR:		kern/143343
Reported by:	marck
MFC after:	3 days
2010-04-22 18:47:23 +00:00
Rui Paulo
ff569d8436 Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.
2010-04-20 17:03:30 +00:00
Rui Paulo
b527a2a4dc The amd64 version of the cyclic dtrace module is a verbatim copy of the
i386 version, so instead having a copy of the same file, use Makefile
foo to include the i386 version on amd64.
2010-04-20 16:30:17 +00:00
Xin LI
0e568ab25c Partially MFp4 #176265 by pjd@:
- Properly initialize and destroy system_taskq.
 - Add a dummy implementation of taskq_create_proc().

Note: We do not currently use system_taskq in ZFS so this is mostly a
no-op at this time.  Proper system_taskq initialization is required
by newer ZFS code.

Ok'ed by:	pjd
MFC after:	2 weeks
2010-04-19 09:03:36 +00:00
Pawel Jakub Dawidek
bd7226a572 Restore previous order. 2010-04-18 12:43:33 +00:00
Pawel Jakub Dawidek
224329fb6b Style fixes. 2010-04-18 12:36:53 +00:00
Pawel Jakub Dawidek
eb998be67d Add missing list and lock destruction. 2010-04-18 12:27:07 +00:00
Pawel Jakub Dawidek
ad3cb80827 Extend locks scope to match OpenSolaris. 2010-04-18 12:25:40 +00:00
Pawel Jakub Dawidek
57a81a8bbc Remove racy assertion.
Obtained from:	OpenSolaris
2010-04-18 12:21:52 +00:00
Pawel Jakub Dawidek
5195ca2307 Set ARC_L2_WRITING on L2ARC header creation.
Obtained from:	OpenSolaris
2010-04-18 12:20:33 +00:00
Pawel Jakub Dawidek
40c7da090f Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.

thread0 (vfs_fhtovp)	thread1 (vop_getattr)	thread2 (zfs_recv)
--------------------	---------------------	------------------
			vn_lock
rrw_enter_read
						rrw_enter_write (hangs)
			rrw_enter_read (hangs)
vn_lock (hangs)

Submitted by:	Attila Nagy <bra@fsn.hu>
MFC after:	3 days
2010-04-15 16:40:54 +00:00
Pawel Jakub Dawidek
58804a192e The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
   or that pool is being created and vdev doesn't contain proper guid yet.
   To be able to handle pool creation we open vdev by path anyway.

Because of 3 it is possible that we open wrong vdev on import which can lead to
confusions.

The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.

MFC after:	2 weeks
2010-03-19 20:14:27 +00:00
Kip Macy
e577b0b2e3 - cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array

MFC after:	7 days
2010-03-17 21:10:09 +00:00
Kip Macy
07c5b1686e use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
2010-03-17 20:00:22 +00:00
Kip Macy
285738b6ad - reduce contention by breaking up ARC state locks in to 16 for data
and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
  contended counter (no lock protection needed for atomics)
2010-03-16 22:17:21 +00:00
Kip Macy
03af82ac5e fix compilation under ZIO_USE_UMA 2010-03-13 21:52:21 +00:00
Kip Macy
181c6ae3f0 Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads

MFC after:	7 days
2010-03-13 21:41:52 +00:00
Pawel Jakub Dawidek
3a98b0c4df Remove bogus assertion.
Reported by:	Johan Ström <johan@stromnet.se>
Obtained from:	OpenSolaris, Bug ID 6827260
MFC after:	1 week
2010-03-12 12:07:21 +00:00
Pawel Jakub Dawidek
5b2e8d582f Remove racy assertion.
Reported by:	Attila Nagy <bra@fsn.hu>
Obtained from:	OpenSolaris, Bug ID 6827260
MFC after:	1 week
2010-03-06 20:03:26 +00:00
Marcel Moolenaar
d49ebec408 Use mf and not mf.a. The latter doesn't force memory ordering and
applies to sequential memory.
2010-02-22 01:24:34 +00:00
Pawel Jakub Dawidek
251294bca9 Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by:	Alexander Zagrebin <alexz@visp.ru>
MFC after:	2 weeks
2010-02-19 20:18:16 +00:00
Pawel Jakub Dawidek
bbd388268e Add tunable and sysctl to skip hostid check on pool import. 2010-02-18 22:31:43 +00:00
Xin LI
d0d444da07 Remove two files that are not needed by FreeBSD.
Approved by:	pjd
MFC after:	2 weeks
2010-02-05 23:17:59 +00:00
Pawel Jakub Dawidek
9d3f36a309 Open provider for writting when we find the right one. Opening too much
providers for writing provokes huge traffic related to taste events send
by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.

Reorted by:	Kurt Touet <ktouet@gmail.com>, mr
Tested by:	mr, Baginski Darren <kickbsd@ya.ru>
MFC after:	2 weeks
2010-02-04 21:11:44 +00:00
Xin LI
63243c5c71 On FreeBSD, time_t is 64-bit for all platforms except i386 and powerpc,
where the type is 32-bit.  ZFS can handle 64-bit timestamp internally
but zfs_setattr() would check if the time value can fit, we change the
checking macros to match 64-bit timestamp if the platform supports it.

This change has some downsides like, while you can import zfs on 32-bit
platforms, the timestamp would overflow if they are out of the range.

This fixes the Y2.038K issue on platforms using 64-bit timestamps.

Reviewed by:	pjd
MFC after:	1 month
2010-01-25 07:52:54 +00:00
Xin LI
51b1ec310c Report ZFS filesystem version instead of the zpool version when we say it.
Reported by:	Yuri Pankov (on -fs@)
Submitted by:	delphij
Approved by:	pjd
MFC after:	1 week
2010-01-11 23:15:11 +00:00
Xin LI
017b01f662 Re-apply onnv-gate revisions 7994 and 8986 (corresponds to FreeBSD
revision 200726 and 200727).  It looks like that the two revisions
were not applied in the right sequence, I found this when comparing
with the OpenSolaris code.

MFC after:	3 days
Reviewed by:	mm@
2010-01-07 20:10:22 +00:00
Xin LI
03ec6213ee Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status.  Booting from a degraded vdev should now be
more robust.

Submitted by:	Matt Reimer <mattjreimer at gmail.com>
Sponsored by:	VPOP Technologies, Inc.
MFC after:	2 weeks
2010-01-06 23:09:23 +00:00
Pawel Jakub Dawidek
49335ba853 Teach the (gpt)zfsboot and zfsloader raidz code to use its buffers
more efficiently.

Before this patch, in the worst case memory use would increase
exponentially on the number of drives in the raidz vdev.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
Sponsored by:	VPOP Technologies, Inc.
Silence from:	dfr
2010-01-06 22:39:40 +00:00
Xin LI
1ee5de4482 Reduce diff against OpenSolaris - move Giant acquire/release to
zfs_znode.c.  As a side effect this also eliminates two potential
Giant leaks.

Approved by:	pjd
MFC after:	1 month
2010-01-02 23:38:03 +00:00
Xin LI
9189129097 Apply OpenSolaris revision 8012 which brings our zpool to version 14,
making it possible for zpools created on OpenSolaris 2009.06 be used
on FreeBSD.

PR:		kern/141800
Submitted by:	mm
Reviewed by:	pjd, trasz
Obtained from:	OpenSolaris
MFC after:	2 weeks
2009-12-28 22:15:11 +00:00
Xin LI
dd0c145752 Apply fix for Solaris bug 6462803: zfs snapshot -r failed because
filesystem was busy
(onnv revision 8989)

Submitted by:	mm
Approved by:	pjd
Obtained from:	OpenSolaris
MFC after:	2 weeks
2009-12-19 11:49:20 +00:00
Xin LI
24a41d7ec6 Apply fix for Solaris bug 6801979: zfs recv can fail with E2BIG
(onnv revision 8986)

Requested by:	mm
Submitted by:	pjd
Obtained from:	OpenSolaris
MFC after:	2 weeks
2009-12-19 11:47:22 +00:00
Xin LI
775f802393 Apply fix Solaris bug 6462803 zfs snapshot -r failed because
filesystem was busy.

Submitted by:	mm
Approved by:	pjd
MFC after:	2 weeks
2009-12-19 11:43:39 +00:00
Konstantin Belousov
88f2d72947 Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync(),
both to not panic when fsync(2) is called for fifo on zfs
filedescriptor, and to actually fsync fifo inode to permanent storage.

PR:	kern/141177
Reviewed by:	pjd
MFC after:	1 week
2009-12-05 20:36:42 +00:00
Pawel Jakub Dawidek
dfb903e852 We have to eventually look for provider without checking guid as this is need
for attaching when there is no metadata yet.

Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.

Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.

MFC after:	1 week
2009-12-05 20:16:28 +00:00
Pawel Jakub Dawidek
6468cb2ce0 Fix deadlock when ZVOLs are present and we are replacing dead component or
calling scrub when pool is in a degraded state. It will try to taste ZVOLs,
which will lead to deadlock, as ZVOL will try to acquire the same locks as
replace/scrub is holding already.

We can't simply skip provider based on their GEOM class, because ZVOL can have
providers build on top of it and we need to skip those as well.

We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider will give
us positive answer and we have to skip those providers.

This way we remove possibility to create ZFS pools on top of ZVOLs, but it is
not very useful anyway.

I believe deadlock is still possible in some very complex situations like when
we have MD provider on top of UFS file on top of ZVOL. When we try to replace
dead component in the pool mentioned ZVOL is based on, there might be a
deadlock when ZFS will try to taste MD provider. There is no easy way to detect
that, but it isn't very common.

MFC after:	1 week
2009-12-05 14:33:11 +00:00
Pawel Jakub Dawidek
ccba826977 Always check guid when opening by path, because we may end up with provider
that does have the same name, but only by accident.

MFC after:	1 week
2009-12-05 14:24:22 +00:00
Pawel Jakub Dawidek
29c8c85594 Avoid using additional variable for storing an error if we are not going
to do anything with it.
2009-12-05 14:21:42 +00:00
Paul Saab
e44920da1a Correct another case of not doing 64bit math. This allows mine and
other raidz2 volumes to boot.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
2009-11-13 02:50:50 +00:00
Pawel Jakub Dawidek
fd9ee28bfc Be careful which vattr fields are set during setattr replay.
Without this fix strange things can appear after unclean shutdown like
files with mode set to 07777.

Reported by:	des
MFC after:	3 days
2009-11-10 22:27:33 +00:00
Pawel Jakub Dawidek
56697614cc Avoid passing invalid mountpoint to getnewvnode().
Reported by:	rwatson
Tested by:	rwatson
MFC after:	3 days
2009-11-10 22:25:46 +00:00
Pawel Jakub Dawidek
fd66267ffb - zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
  zfs_access() and vaccess() after spliting V* flags.

This fixes "dirtying snapshot!" panic.

PR:		kern/139806
Reported by:	Carl Chave <carl@chave.us>
In co-operation with:	jh
MFC after:	3 days
2009-10-30 23:33:06 +00:00
Robert Noland
14c436e101 Correct some issues with zfs boot.
- Teach it to read gang blocks. (essentially untested)
   If you see "ZFS: gang block detected!", please let
   me know, so we can either remove the printf if it
   works, or fix it if it doesn't.

 - If multiple partitions exist on a disk, probe them all.
   We also need to reset dsk->start to 0 to read the right
   sector here.

 - With GPT, we can have 128 partitions.

 - If the bootfs property has ever been set on a pool
   it seems that it never goes away.  zpool won't allow
   you to add to the pool with the bootfs property set.
   However, if you clear the property back to default
   we end up getting 0 for the object number and read
   a bogus block pointer and fail to boot.

 - Fix some error printfs. The printf in the loader is
   only capable of c,s and u formats.

 - Teach printf how to display %llu

Reviewed by:	dfr, jhb
MFC after:	2 weeks
2009-10-23 18:44:53 +00:00
Pawel Jakub Dawidek
c217b20ef6 Allow file system owner to modify system flags if securelevel permits.
MFC after:	3 days
2009-10-08 16:05:17 +00:00
Pawel Jakub Dawidek
68c53ef849 File system owner is when uid matches and jail matches.
MFC after:	3 days
2009-10-08 16:03:19 +00:00
Pawel Jakub Dawidek
3a6c0cbf26 On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.

MFC after:	3 days
2009-10-07 20:56:15 +00:00
Pawel Jakub Dawidek
2ada529a14 Fix white-spaces.
MFC after:	3 days
2009-10-07 20:54:07 +00:00
Pawel Jakub Dawidek
c0103003c0 Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().

PR:		kern/118320
Submitted by:	Mark Thompson <info-gentoo@mark.thompson.bz>
MFC after:	3 days
2009-10-07 12:38:19 +00:00
Kip Macy
e6b112e274 Prevent paging pressure from draining arc too much
- always drain arc if above arc_c_max - never drain arc if arc is below arc_c_max

MFC after:	3 days
2009-10-06 21:40:50 +00:00
Xin LI
6f62807611 Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.

Submitted by:	pjd (response of my original patch but bugs are mine)
MFC after:	3 days
2009-10-01 18:58:26 +00:00
Pawel Jakub Dawidek
ab711589df Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.

PR:		kern/139062
Reported by:	trasz
MFC after:	3 days
2009-09-26 00:10:45 +00:00
Pawel Jakub Dawidek
a0b238644a On lookup error VFS expects *vpp to be set to NULL, be sure to do that.
MFC after:	3 days
2009-09-26 00:08:44 +00:00
Pawel Jakub Dawidek
a99aaff645 Use traverse() function to find and return mount point's vnode instead of
covered vnode when snapshot is already mounted.

MFC after:	3 days
2009-09-26 00:07:14 +00:00
Pawel Jakub Dawidek
1aba32d9b4 - Don't depend on value returned by gfs_*_inactive(), it doesn't work
well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.

PR:		kern/139062
Reported by:	trasz
MFC after:	3 days
2009-09-26 00:04:30 +00:00
Pawel Jakub Dawidek
86758476b4 Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to
be a bit weak and OpenSolaris also switched to fletcher4.

PR:		kern/139072
Reported by:	Daniel Grund <bugs@dgrund.de>
MFC after:	3 days
2009-09-25 18:19:50 +00:00
Pawel Jakub Dawidek
ad8294cf98 Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise it can still be reclaimed but we won't be able to see it,
because every try to enter ZFS through VFS will result in EIO.

The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.

MFC after:	3 days
2009-09-24 15:56:26 +00:00
Pawel Jakub Dawidek
ab9bbf4a2b Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.

MFC after:	3 days
2009-09-24 15:49:15 +00:00
Edward Tomasz Napierala
c40502ccd0 In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result
in EINVAL, not EOPNOTSUPP.
2009-09-23 15:09:34 +00:00
Pawel Jakub Dawidek
eb03c3cdfb Restore BSD behaviour - when creating new directory entry use parent directory
gid to set group ownership and not process gid.

This was overlooked during v6 -> v13 switch.

PR:		kern/139076
Reported by:	Sean Winn <sean@gothic.net.au>
MFC after:	3 days
2009-09-23 09:18:16 +00:00
Pawel Jakub Dawidek
c4be11d7fc Purge namecache in the same place OpenSolaris does. 2009-09-20 13:28:29 +00:00
Pawel Jakub Dawidek
5469543c92 Purge file system namecache when receiving incremental stream and rolling back
to it.

MFC after:	3 days
2009-09-17 15:14:28 +00:00
Pawel Jakub Dawidek
3282c51713 Purge namecache for the file system being rolled back, so it doesn't point at
invalid vnodes after the rollback resulting in EIO errors when trying to access
files which are in the namecache.

Reported by:	des
MFC after:	3 days
2009-09-17 14:58:21 +00:00
Pawel Jakub Dawidek
95f08808b6 Forced unmounts work just fine in my tests under heavy load. There might
still be a problem, but it isn't worth a warning.
2009-09-15 11:42:08 +00:00
Pawel Jakub Dawidek
a4e6b460d3 We believe ZFS is ready for production use. Remove a warning about it being
experimental. :)
2009-09-15 11:34:53 +00:00
Pawel Jakub Dawidek
63e1d3df27 - Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows
  ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
  mount(8) and df(1) output by default.

Reviewed by:	kib
MFC after:	3 days
2009-09-14 21:10:40 +00:00
Pawel Jakub Dawidek
85c171b2e1 Support both case: when snapshot is already mounted and when it is not yet
mounted.

MFC after:	3 days
2009-09-13 21:40:36 +00:00
Pawel Jakub Dawidek
8a2c4db0fe Add missing \n.
Reported by:	marck
2009-09-13 17:30:56 +00:00
Pawel Jakub Dawidek
7746b6461d Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories
by just returning EOPNOTSUPP. This will allow NFS server to fall back to
regular READDIR.

Note that converting inode number to snapshot's vnode is expensive operation.
Snapshots are stored in AVL tree, but based on their names, not inode numbers,
so to convert inode to snapshot vnode we have to interate over all snalshots.

This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.

PR:		kern/125149
Reported by:	Weldon Godfrey <wgodfrey@ena.com>
Analysis by:	Jaakko Heinonen <jh@saunalahti.fi>
MFC after:	3 days
2009-09-13 16:05:20 +00:00
Pawel Jakub Dawidek
7b4a12379b When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.

MFC after:	3 days
2009-09-13 10:33:51 +00:00
Pawel Jakub Dawidek
33a0ef82f2 Extend scope of the z_teardown_lock lock for consistency and "just in case".
MFC after:	3 days
2009-09-13 10:29:51 +00:00
Pawel Jakub Dawidek
7dae3c4faf Be sure not to overflow struct fid.
MFC after:	3 days
2009-09-13 10:25:33 +00:00
Pawel Jakub Dawidek
f53901193d There is a bug where mze_insert() can trigger an assert() of inserting
the same entry twice. This bug is not fixed yet, but leads to situation
where when try to access corrupted directory the kernel will panic.
Until the bug is properly fixed, try to recover from it and log that it
happened.

Reported by:	marck
OpenSolaris bug:	6709336
MFC after:	3 days
2009-09-13 10:12:29 +00:00
Pawel Jakub Dawidek
f5516e3d1d - Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
  z_dbuf field is NULL - this might happen in case of rollback or forced
  unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
  done asynchronously via zfs_reclaim_complete().

MFC after:	1 week
2009-09-12 19:53:31 +00:00
Pawel Jakub Dawidek
2a8e7dad33 Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain
NULL, but also can point to dead vnode, take that into account.

PR:		kern/132068
Reported by:	Edward Fisk" <7ogcg7g02@sneakemail.com>, kris
Fix based on patch from:	Jaakko Heinonen <jh@saunalahti.fi>
MFC after:	1 week
2009-09-12 19:27:54 +00:00
Pawel Jakub Dawidek
3770996142 Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. Action is not really taken, but it is
logged to pool history, which might be confusing.

Reported by:	Denis Ahrens <denis@h3q.com>
MFC after:	3 days
2009-09-08 16:40:08 +00:00
Pawel Jakub Dawidek
d6b8039292 We don't export individual snapshots, so mnt_export field in snapshot's
mount point is NULL. That's why when we try to access snapshots over NFS
use mnt_export field from the parent file system.

MFC after:	1 week
2009-09-08 15:57:03 +00:00
Pawel Jakub Dawidek
f148fd9a4a When we automatically mount snapshot we want to return vnode of the mount point
from the lookup and not covered vnode. This is one of the fixes for using .zfs/
over NFS.

MFC after:	1 week
2009-09-08 15:51:40 +00:00
Pawel Jakub Dawidek
2391003912 On FreeBSD we don't have to look for snapshot's mount point,
because fhtovp method is already called with proper mount point.

MFC after:	1 week
2009-09-08 15:42:55 +00:00
Pawel Jakub Dawidek
6f8e88e1da Call ZFS_EXIT() after locking the vnode.
MFC after:	1 week
2009-09-08 15:37:01 +00:00
Konstantin Belousov
211ddddce7 Lock Giant around vn_open_cred().
Remove innocent unnecessary call to NDFREE().

Reported by:	marcel
Reviewed and tested by:	pjd
MFC after:	3 days
2009-09-08 09:17:34 +00:00
Pawel Jakub Dawidek
1ea3566294 Fix reference count leak for a case where snapshot's mount point is updated.
Such situation is not supported.

This problem was triggered by something like this:

	# zpool create tank da0
	# zfs snapshot tank@snap
	# cd /tank/.zfs/snapshot/snap  (this will mount the snapshot)
	# cd
	# mount -u nosuid /tank/.zfs/snapshot/snap  (refcount leak)
	# zpool export tank
	cannot export 'tank': pool is busy

MFC after:	1 week
2009-09-08 08:54:15 +00:00
Pawel Jakub Dawidek
28e449adf2 If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).

Fix similar case in the code that is currently commented out.
2009-09-07 21:58:54 +00:00
Pawel Jakub Dawidek
3f6043a57d When snapshot mount point is busy (for example we are still in it)
we will fail to unmount it, but it won't be removed from the tree,
so in that case there is no need to reinsert it.

This fixes a panic reproducable in the following steps:

	# zfs create tank/foo
	# zfs snapshot tank/foo@snap
	# cd /tank/foo/.zfs/snapshot/snap
	# umount /tank/foo
	panic: avl_find() succeeded inside avl_add()

Reported by:	trasz
MFC after:	3 days
2009-09-07 21:46:51 +00:00
Edward Tomasz Napierala
343775c0b4 Enable NFSv4 ACL support in ZFS.
Reviewed by:	pjd
2009-09-07 19:43:13 +00:00
Pawel Jakub Dawidek
08780916dd Defer thread start until we set priority.
Reviewed by:	kib
MFC after:	3 days
2009-09-07 19:22:44 +00:00
Pawel Jakub Dawidek
c739b7b22b Don't recheck ownership on update mount. This will eliminate LOR between
vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.

Noticed by:	kib
Reviewed by:	kib
MFC after:	1 week
2009-09-07 18:54:55 +00:00
Pawel Jakub Dawidek
2ff6f0f89a - Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.

MFC after:	1 week
2009-09-07 18:23:26 +00:00
Edward Tomasz Napierala
900b1670c4 Prevent the line from wrapping. 2009-09-07 16:56:41 +00:00
Pawel Jakub Dawidek
841bcfea21 Changing provider size is not really supported by GEOM, but doing so when
provider is closed should be ok.

When administrator requests to change ZVOL size do it immediately if ZVOL
is closed or do it on last ZVOL close.

PR:		kern/136942
Requested by:	Bernard Buri <bsd@ask-us.at>
MFC after:	1 week
2009-09-07 14:16:50 +00:00
Pawel Jakub Dawidek
5e65224daf bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.

PR:		kern/135480
Reported by:	Emil Mikulic <emikulic@gmail.com>
MFC after:	3 days
2009-09-07 11:38:43 +00:00
Edward Tomasz Napierala
a41422a93e Improve wording.
Discussed with:	pjd, cperciva, rink, wkoszek and des, in order of appearance.
2009-09-05 15:08:58 +00:00
Pawel Jakub Dawidek
26d0605727 Backport the 'dirtying dbuf' panic fix from newer ZFS version.
Reported by:	Thomas Backman <serenity@exscape.org>
MFC after:	1 week
2009-08-31 16:27:00 +00:00
Pawel Jakub Dawidek
575c1d371c Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
regular user tries to mount dataset owned by him.

MFC after:	1 week
2009-08-30 21:03:40 +00:00
Pawel Jakub Dawidek
5d5535163a - Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'
2009-08-23 11:33:46 +00:00
Pawel Jakub Dawidek
4ec2b0e7ce Set priority of vdev_geom threads and zvol threads to PRIBIO. 2009-08-23 11:27:08 +00:00
Pawel Jakub Dawidek
1869987e42 - Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().
2009-08-23 11:22:46 +00:00
Pawel Jakub Dawidek
35ae9291c2 Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.

Reported by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	kmacy
Approved by:	re (kib)
2009-08-20 00:05:29 +00:00
Pawel Jakub Dawidek
8e9fd65fbf getcwd() (when __getcwd() fails) works by stating current directory, going up
(..), calling readdir and looking for previous directory inode.  In case of
.zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it
won't be visible in readdir output.

Fix this by implementing VPTOCNP for snapshot directories, so __getcwd()
doesn't fail and getcwd() doesn't have to use readdir method.

This fixes /bin/pwd from within .zfs/snapshot/<name>/.

Suggested by:	kib
Approved by:	re (rwatson)
2009-08-17 10:00:18 +00:00
Pawel Jakub Dawidek
8461b0f043 Manage asynchronous vnode release just like Solaris.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 09:48:34 +00:00
Pawel Jakub Dawidek
e35eb914f4 - Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.

Approved by:	re (kib)
2009-08-17 09:28:15 +00:00
Pawel Jakub Dawidek
0330a5dc10 If z_buf is NULL, we should free znode immediately.
Noticed by:	avg
Approved by:	re (kib)
2009-08-17 09:25:37 +00:00
Pawel Jakub Dawidek
d83cfc37a4 - We need to recycle vnode instead of freeing znode.
Submitted by:	avg

- Add missing vnode interlock unlock.
- Remove redundant znode locking.

Approved by:	re (kib)
2009-08-17 09:21:39 +00:00
Pawel Jakub Dawidek
f820bc079f Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have
0 usecount.

Reported by:	Thomas Backman <serenity@exscape.org>
Approved by:	re (kib)
2009-08-17 09:13:22 +00:00
Pawel Jakub Dawidek
159ef108e1 Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.

Approved by:	re (kib)
2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek
fddc954016 - Fix a race where /dev/zfs control device is created before ZFS is fully
initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race
  where one of the zpool/zfs command loads zfs.ko and tries to do some work
  immediately, but /dev/zfs is not there yet.

Reported by:	pav
Approved by:	re (kib)
2009-08-17 08:36:41 +00:00
Pawel Jakub Dawidek
830940567b Remove files that are no longer used.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 08:03:02 +00:00
Marcel Moolenaar
538e86713d Fix misalignment in nvpair_native_embedded() caused by the compiler
replacing the bzero(). See also revision 195627, which fixed the
misalignment in nvpair_native_embedded_array().

Approved by:	re (kensmith)
2009-08-16 01:48:46 +00:00
Edward Tomasz Napierala
abd370a36b Remove CDDL warning.
Approved by:	re (kib), core
2009-08-13 12:28:30 +00:00
Pawel Jakub Dawidek
abd8353f5d We don't support ephemeral IDs in FreeBSD and without this fix ZFS can
panic when in zfs_fuid_create_cred() when userid is negative. It is
converted to unsigned value which makes IS_EPHEMERAL() macro to
incorrectly report that this is ephemeral ID. The most reasonable
solution for now is to always report that the given ID is not ephemeral.

PR:		kern/132337
Submitted by:	Matthew West <freebsd@r.zeeb.org>
Tested by:	Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com>
Approved by:	re (kib)
MFC after:	2 weeks
2009-07-27 14:52:34 +00:00
Edward Tomasz Napierala
d2ceff236a Fix extattr_list_file(2) on ZFS in case the attribute directory
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0.  For some
reason this didn't manifest on my kernel compiled with -O0.

PR:		kern/136601
Submitted by:	Jaakko Heinonen <jh at saunalahti dot fi>
Approved by:	re (kib)
2009-07-22 15:15:58 +00:00
Edward Tomasz Napierala
65588fd503 Fix permission handling for extended attributes in ZFS. Without
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.

This patch makes it behave just like UFS, i.e. depend on current
file permissions.  Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).

Reviewed by:	pjd (idea, not actual code)
Approved by:	re (kib)
2009-07-20 19:16:42 +00:00
Andriy Gapon
b064b6d1cd dtrace_gethrtime: improve scaling of TSC ticks to nanoseconds
Currently dtrace_gethrtime uses formula similar to the following for
converting TSC ticks to nanoseconds:
rdtsc() * 10^9 / tsc_freq
The dividend overflows 64-bit type and wraps-around every 2^64/10^9 =
18446744073 ticks which is just a few seconds on modern machines.

Now we instead use precalculated scaling factor of
10^9*2^N/tsc_freq < 2^32 and perform TSC value multiplication separately
for each 32-bit half.  This allows to avoid overflow of the dividend
described above.
The idea is taken from OpenSolaris.
This has an added feature of always scaling TSC with invariant value
regardless of TSC frequency changes. Thus the timestamps will not be
accurate if TSC actually changes, but they are always proportional to
TSC ticks and thus monotonic. This should be much better than current
formula which produces wildly different non-monotonic results on when
tsc_freq changes.

Also drop write-only 'cp' variable from amd64 dtrace_gethrtime_init()
to make it identical to the i386 twin.

PR:		kern/127441
Tested by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	jhb
Discussed with:	current@, bde, gnn
Silence from:	jb
Approved by:	re (gnn)
MFC after:	1 week
2009-07-15 17:07:39 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
Marcel Moolenaar
dd8a461e83 In nvpair_native_embedded_array(), meaningless pointers are zeroed.
The programmer was aware that alignment was not guaranteed in the
packed structure and used bzero() to NULL out the pointers.
However, on ia64, the compiler is quite agressive in finding ILP
and calls to bzero() are often replaced by simple assignments (i.e.
stores). Especially when the width or size in question corresponds
with a store instruction (i.e. st1, st2, st4 or st8).

The problem here is not a compiler bug. The address of the memory
to zero-out was given by '&packed->nvl_priv' and given the type of
the 'packed' pointer the compiler could assume proper alignment for
the replacement of bzero() with an 8-byte wide store to be valid.
The problem is with the programmer. The programmer knew that the
address did not have the alignment guarantees needed for a regular
assignment, but failed to inform the compiler of that fact. In
fact, the programmer told the compiler the opposite: alignment is
guaranteed.

The fix is to avoid using a pointer of type "nvlist_t *" and
instead use a "char *" pointer as the basis for calculating the
address. This tells the compiler that only 1-byte alignment can
be assumed and the compiler will either keep the bzero() call
or instead replace it with a sequence of byte-wise stores. Both
are valid.

Approved by:	re (kib)
2009-07-11 22:43:20 +00:00
Andriy Gapon
f340e9fe71 dtrace/amd64: fix virtual address checks
On amd64 KERNBASE/kernbase does not mean start of kernel memory.
This should fix a KASSERT panic in dtrace_copycheck when copyin*()
is used in D program.
Also make checks for user memory a bit stricter.

Reported by:	Thomas Backman <serenity@exscape.org>
Submitted by:	wxs (kaddr part)
Tested by:	Thomas Backman (prototype), wxs
Reviewed by:	alc (concept), jhb, current@
Aprroved by:	jb (concept)
MFC after:	2 weeks
PR:		kern/134408
2009-06-24 16:03:57 +00:00
Konstantin Belousov
a18a95db4a O_NOFOLLOW shall be in flags, not in cmode.
Noted by:	bde
2009-06-22 10:08:48 +00:00
Konstantin Belousov
e0c161b89c Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson
2009-06-21 13:41:32 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Kip Macy
f0c6b798a3 pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous
differences with Opensolaris' ZFS

Sorry for the churn
2009-06-11 22:24:08 +00:00
Kip Macy
e4e5e663e0 check against prefetch_enable 2009-06-11 09:51:21 +00:00
Kip Macy
3fa5485637 use default policy for enabling prefetching unless the TUNABLE is set 2009-06-10 21:05:37 +00:00
Kip Macy
107b659450 As far as I can tell systems that have less than 4GB are more often hurt
by prefetched than helped.  On i386 systems and systems with less than 4GB,
prefetch is now disabled by default. I've added a prefetch enable tunable, to
enable prefetching for those systems. The prefetch disable tunable will continue
to unconditionally disable prefetching.
2009-06-10 01:21:32 +00:00
Paul Saab
a6d545d8ed Support shared vnode locks for write operations when the offset is
provided on filesystems that support it.  This really improves mysql
+ innodb performance on ZFS.

Reviewed by:	jhb, kmacy, jeffr
2009-06-04 16:18:07 +00:00
Doug Rabson
8be608b58c Allow the bootfs property to be set for raidz pools on FreeBSD.
Reviewed by:	pjd
2009-05-31 11:59:32 +00:00
Kip Macy
762169b50a fix xdrmem_control to be safe in an if statement
fix zfs to depend on krpc
remove xdr from zfs makefile

Submitted by:	dchagin@freebsd.org
2009-05-30 22:23:58 +00:00
Kip Macy
139ccddec0 work around snapshot shutdown race reported by Henri Hennebert 2009-05-30 19:26:35 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Attilio Rao
1ae1c2a3bd Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.

Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.

This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).

KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.

Requested by:	jeff, kmacy
Reviewed by:	jeff
Tested by:	pho
2009-05-29 01:49:27 +00:00
Kip Macy
c334d2d544 MFdevbranch 192944
- add FreeBSD implementation of xdrmem_control needed by zfs
 - have zfs define xdr_ops using FreeBSD's definition
 - remove solaris xdr files from zfs compile
2009-05-28 08:18:12 +00:00
Stacey Son
a5aedd68b4 Add the OpenSolaris dtrace lockstat provider. The lockstat provider
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.

Reviewed by:	attilio jhb jb
Approved by:	gnn (mentor)
2009-05-26 20:28:22 +00:00
Edward Tomasz Napierala
b7014134a7 Change license to more bori^Wadul^Wcanonical.
Submitted by:	rwatson@
2009-05-26 11:42:06 +00:00
Edward Tomasz Napierala
0970b4bae0 MFp4 changes neccessary for NFSv4 ACLs support in ZFS. This is mostly
about removing a few #ifdefs and providing compatibility wrappers and
VOP implementations to get and set an ACL; ZFS does ACL enforcement all
by itself.

Note that the VOPs are ifdefed out for now, so this change should be
a no-op.

Reviewed by:	pjd
2009-05-26 08:21:59 +00:00
Edward Tomasz Napierala
4076aa37dc Don't allow non-owner to set SUID bit on a file. It doesn't make
any difference now, but in NFSv4 ACLs, there is write_acl permission,
which also affects mode changes.

Reviewed by:	pjd
2009-05-24 19:21:49 +00:00
Edward Tomasz Napierala
194f4d42de Fix comment. 2009-05-24 15:48:48 +00:00
Dag-Erling Smørgrav
bba5cfd28b Unexpand $FreeBSD$. 2009-05-23 16:01:58 +00:00
Dag-Erling Smørgrav
6feca53bed Remove svn:keywords on a file that had fbsd:nokeywords (though I don't
understand the reason for the latter)
2009-05-23 16:00:16 +00:00
Kip Macy
e95d34711b - back out direct map hack
- it is no longer needed
2009-05-19 01:14:37 +00:00
Kip Macy
0fe5460dbd set createtxg prop name
PR: bin/130105
2009-05-17 04:04:25 +00:00
Kip Macy
ea41c77517 SAVESTART implies SAVENAME 2009-05-17 01:31:28 +00:00
Kip Macy
2e9c90d55b enable adaptive spinning on zfs locks 2009-05-16 23:56:45 +00:00
Kip Macy
be08aa8b59 - allow forced unmounts
- don't assume snapshot was auto-mounted
2009-05-16 20:33:13 +00:00
Kip Macy
71bc1ce36e only use direct map if system has more than 2GB 2009-05-16 20:09:07 +00:00
Kip Macy
32237d8492 apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map 2009-05-16 19:17:15 +00:00
Doug Rabson
e1899ef6c8 Add support for booting from raidz1 and raidz2 pools. 2009-05-16 10:48:20 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Kip Macy
469ef3e563 rename xdr support files to avoid conflicts when linking in to the kernel 2009-05-11 04:18:58 +00:00
Kip Macy
8569258bf8 - rename atomic.S and crc32.c to avoid collisions when linking zfs in to the kernel
- update Makefile
- ifdef out acl_{alloc, free}, they aren't used by zfs and conflict with existing in-kernel routines
2009-05-09 01:45:55 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Kip Macy
a6827463ad don't call vn_rele_async_fini in the !_KERNEL case 2009-05-07 23:34:41 +00:00
Kip Macy
c20fd07777 move VN_RELE_ASYNC to the compatibility layer with the rest of the VN_* defines 2009-05-07 23:02:15 +00:00
Kip Macy
6ef1a81d6e avoid LOR and gratuitous extra lock acquisitions by moving user_evict list buffers to
a temporary list
2009-05-07 21:51:13 +00:00
Kip Macy
77d0162c70 Allow the VM to provide backpressure on the ARC cache as it does
on Solaris.
2009-05-07 20:57:06 +00:00
Kip Macy
62fa227ccd Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs.
This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated
thread instead of a Solaris' taskq to avoid doing a blocking memory allocation with the vnode
interlock held.

This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio
thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks
to sync dirty data out to disk. The range locks are already held by a user-level process waiting
on a condition variable that it the process is waiting on a spa_zio thread to signal it on. The
process could not be signalled because the spa_zio thread could not proceed.

The nature of this problem was not apparent due to ZFS locks opting out of witness which meant
that DDB did not know about the locks that were held by ZFS.

Reviewed by:	pjd
MFC after:	7 days
2009-05-07 20:28:06 +00:00
Jamie Gritton
b38ff370e4 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Andrew Thompson
853a10a581 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
Andrew Thompson
626fc9fe3d Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.
2009-04-03 19:46:12 +00:00
Robert Watson
455f3aa24f Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient
directory, since it's under a BSD license, and this keeps NFS internals-
aware tracing parts close to NFS.

MFC after:	1 month
Suggested by:	jhb
2009-03-25 17:47:22 +00:00