Commit Graph

584 Commits

Author SHA1 Message Date
ivoras
936441dc07 Undo r216230: the interaction between saved ashift in metadata and
detected ashift does not support this. With this change, pools
created while stripesize=512 could not be imported when stripesize
becomes larger (on the same drive).

Noticed by:	pjd
2010-12-07 15:24:08 +00:00
avg
8e95976942 opensolaris cyclic: fix deadlock and make a little bit closer to upstream
The dealock was caused in the following way:
- thread T1 on CPU C1 holds a spin mutex, IPIs CPU C2 and waits for the
  IPI to be handled
- C2 executes timer interrupt filter, thus has interrupts disabled, and
  gets blocked on the spin mutex held by T1
The problem seems to have been introduced by simplifications made to
OpenSolaris code during porting.
The problem is fixed by reorganizing the code to more closely resemble
the upstream version.  Interrupt filter (cyclic_fire) now doesn't
acquire any locks, all per-CPU data accesses are performed on a
target CPU with preemption and interrupts disabled thus precluding
concurrent access to the data.
cyp_mtx spin mutex is used to disable preemtion and interrupts; it's not
used for classical mutual exclusion, because xcall already serializes
calls to a CPU.  It's an emulation of OpenSolaris
cyb_set_level(CY_HIGH_LEVEL) call, the spin mutexes could probably be
reduced to just a spinlock_enter()/_exit() pair.

Diff with upstream version is now reduced by ~500 lines, however it still
remains quite large - many things that are not needed (at the moment) or
are irrelevant on FreeBSD were simply ripped out during porting.
Examples of such things:
- support for CPU onlining/offlining
- support for suspend/resume
- support for running callouts at soft interrupt levels
- support for callout rebinding from CPU to CPU
- support for CPU partitions

Tested by:	Artem Belevich <fbsdlist@src.cx>
MFC after:	3 weeks
X-MFC with:	r216252
2010-12-07 12:25:26 +00:00
avg
c652724d35 opensolaris cyclic xcall: no need for special handling of curcpu
smp_rendezvous_cpus already properly handles current CPU case
and non-SMP case.

MFC after:	3 weeks
2010-12-07 12:04:06 +00:00
avg
1bf1cfe9bd dtrace_xcall: no need for special handling of curcpu
smp_rendezvous_cpus alreadt does the right thing in a very similar
fashion, so the code was kind of duplicating that.

MFC after:	3 weeks
2010-12-07 09:19:47 +00:00
avg
dad6f05965 dtrace_gethrtime_init: pin to master while examining other CPUs
Also use pc_cpumask to be future-friendly.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-07 09:03:17 +00:00
ivoras
e41ab538e4 Use GEOM stripesize field when calculating ashift. This will enable correct
alignment on drives with large sector sizes (e.g. 4 KiB) but the
implementation might need to be revisited if devices with large stripesizes
appear (e.g. if RAID controllers or flash drives start using the field),
probably by introducing a physsectorsize field in GEOM providers.

Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@
2010-12-06 12:18:02 +00:00
trasz
cb2672a1ca Don't panic when we read an empty ACL from ZFS. Apparently this may happen
with filesystems created under MacOS X ZFS port.  This is kind of filesystem
corruption (we don't allow for setting empty ACLs), so make acl_get_file(3)
and related syscalls fail with EINVAL in that case.  In theory, we could
return empty ACL to userland, but I'm afraid this would break some code.

MFC after:	3 days
2010-11-30 21:04:05 +00:00
avg
3317f727fa zfs+sendfile: populate all requested pages, not just those already cached
kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate
page cache.  When sendfile stumbles upon a page that is not populated
yet, it sends out all the mbufs that it collected so far.  This
resulted in very poor performance with ZFS when file data is not in the
page cache, because ZFS vop_read for UIO_NOCOPY case populated only
those pages that are already in cache, but not valid.  Which means that
most of the time it populated only the first requested page in the
described above scenario.

Reported by:	Alexander Zagrebin <alexz@visp.ru>
Tested by:	Alexander Zagrebin <alexz@visp.ru>,
		Artemiev Igor <ai@kliksys.ru>
MFC after:	12 days
2010-11-16 15:53:44 +00:00
avg
ea5ab33248 fix misspelling in a comment
Reported by:	Daniel Braniss <danny@cs.huji.ac.il>
MFC after:	3 days
2010-11-16 12:30:47 +00:00
mm
8da817e6eb Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot
and VFS_RELE on a non-existing hold on snapshot parent's z_vfs.

This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4
(bug IDs: 6792139, 6794830) - not applicable to FreeBSD.

This fixes the process hang if umounting a manually mounted snapshot.

Reported by:	Alexander Zagrebin <alexz@visp.ru>
Approved by:	delphij (mentor)
MFC after:	1 week
2010-11-13 21:09:18 +00:00
delphij
070520ecc1 Validate whether the zfs_cmd_t submitted from userland is not smaller than
what we have.  Without the check the kernel could accessing memory that
does not belong to the request struct.

Note that we do not test if the struct equals in size at this time, which
may faciliate forward compatibility with newer binaries.

Reviewed by:	pjd at MeetBSD CA '2010
MFC after:	1 week
2010-11-05 22:18:09 +00:00
mm
2a63368855 Bugfix merge from OpenSolaris:
OpenSolaris onnv-revision:	10209:91f47f0e7728
6830541	zfs_get_data_trips on a verify
6696242	multiple zfs_fillpage() zfs: accessing past end of object panics
6785914	zfs fails to drop dn_struct_rwlock in recovery code path

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6830541, 6696242, 6785914)
MFC after:	2 weeks
2010-10-26 15:48:03 +00:00
avg
6634407233 zfs: add vop_getpages method implementation
This should make vnode_pager_getpages path a bit shorter and clearer.
Also this should eliminate problems with partially valid pages.
Having this method opens room for future optimizations.

To do: try to satisfy other pages besides the required one taking into
account tradeofs between number of page faults, read throughput and read
latency.  Also, eventually vop_putpages should be added too.

Reviewed by:	kib, mm, pjd
MFC after:	3 weeks
2010-10-16 20:43:05 +00:00
rpaulo
1a7d335fab Pass a format string to panic() and to taskqueue_start_threads().
Found with:	clang
2010-10-13 17:13:43 +00:00
rpaulo
53e67c9191 In zfs_post_common(), use %d instead of %hhu.
Found with:	clang
2010-10-13 17:12:23 +00:00
avg
3bb689aafa zfs + sendfile: do not produce partially valid pages for vnode's tail
Since r212650 and before this change sendfile(2) could produce
a partially valid page for a trailing portion of a ZFS vnode.
vm_fault() always wants to see a fully valid page even if it's the last
page that partially extends beyond vnode's end.  Otherwise it calls
vop_getpages() to bring in the page.  In the case of ZFS this means
that the data is read from the page into the same page and this breaks
checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in
vm_fault() will get blocked forever waiting for it to be cleared.

Many thanks to Kai and Jeremy for reproducing the issue and providing
important debugging information and help.

Reported by:	Kai Gallasch <gallasch@free.de>,
		Jeremy Chadwick <freebsd@jdc.parodius.com>
Tested by:	Kai Gallasch <gallasch@free.de>,
		Jeremy Chadwick <freebsd@jdc.parodius.com>
Reviewed by:	kib
MFC after:	3 days
To-Do:		apply the same treatment to tmpfs + sendfile
2010-10-12 17:04:21 +00:00
pjd
4964d26ba0 Provide internal ioflags() function that converts ioflag provided by FreeBSD's
VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write
operations.

Reviewed by:	mm
MFC after:	1 week
2010-10-10 20:49:33 +00:00
mm
6afff59f3c Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag.
This corrects writing to append-only files on ZFS.

PR:		kern/149495 [1], kern/151082 [2]
Submitted by:	Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2]
Approved by:	delphij (mentor)
MFC after:	1 week
2010-10-08 23:01:38 +00:00
avg
86ef760035 opensolaris_kmem kmem_size(): report lesser of vm_kmem_size and available
physical memory

This is needed to correctly autotune ZFS ARC size when vm_kmem_size is
set to value larger than available physical memory.

MFC after:	2 weeks
2010-10-07 18:16:14 +00:00
mm
240333eaa7 Properly handle IO with B_FAILFAST
Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted

OpenSolaris revision and Bug IDs:

9725:0bf7402e8022
6843014 ZFS B_FAILFAST handling is broken

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6843014)
MFC after:	3 weeks
2010-09-27 09:42:31 +00:00
mm
06d7ad0883 Enable offlining of log devices.
OpenSolaris revision and Bug IDs:

9701:cc5b64682e64
6803605	should be able to offline log devices
6726045	vdev_deflate_ratio is not set when offlining a log device
6599442	zpool import has faults in the display

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6803605, 6726045, 6599442)
MFC after:	3 weeks
2010-09-27 09:05:51 +00:00
avg
daa9ae80c8 zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE
zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths
and it seems to be a not very good idea to do cpu pinning there.

Suggested by:	kib
MFC after:	2 weeks
2010-09-21 05:58:45 +00:00
avg
1eb3acf170 zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place
MFC after:	2 weeks
2010-09-21 05:54:36 +00:00
avg
c133aaf445 zfs arc_reclaim_needed: fix typo in mismerge in r212780
PR:		kern/146410, kern/138790
MFC after:	3 weeks
X-MFC with:	r212780
2010-09-17 07:34:50 +00:00
avg
c6e876c6a2 zfs+sendfile: advance uio_offset upon reading as well
Picked from analogous code in tmpfs.

MFC after:	1 week
2010-09-17 07:20:20 +00:00
avg
adea260b59 zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_max
Those checks are not present in upstream code and they are enforced in
actual calculations of delta by which ARC size can be grown or should be
reduced.

MFC after:	3 weeks
2010-09-17 07:17:38 +00:00
avg
c94255f128 zfs arc_reclaim_needed: more reasonable threshold for available pages
vm_paging_target() is not a trigger of any kind for pageademon, but
rather a "soft" target for it when it's already triggered.
Thus, trying to keep 2048 pages above that level at the expense of ARC
was simply driving ARC size into the ground even with normal memory
loads.
Instead, use a threshold at which a pagedaemon scan is triggered, so
that ARC reclaiming helps with pagedaemon's task, but the latter still
recycles active and inactive pages.

PR:		kern/146410, kern/138790
MFC after:	3 weeks
2010-09-17 07:14:07 +00:00
mm
619300afba Fix kernel panic when moving a file to .zfs/shares
Fix possible loss of correct error return code in ZFS mount

OpenSolaris revisions and Bug IDs:

11824:53128e5db7cf
6863610	ZFS mount can lose correct error return

12079:13822b941977
6939941	problem with moving files in zfs (142901-12)

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6863610, 6939941)
MFC after:	3 days
2010-09-15 19:55:26 +00:00
avg
4e51477b36 zfs vn_has_cached_data: take into account v_object->cache != NULL
This mirrors code in tmpfs.
This changge shouldn't affect much read path, it may cause unnecessary
vm_page_lookup calls in the case where v_object has no active or inactive
pages but has some cache pages.  I believe this situation to be non-essential.

In write path this change should allow us to properly detect the above
case and free a cache page when we write to a range that corresponds to it.
If this situation is undetected then we could have a discrepancy between
data in page cache and in ARC or on disk.

This change allows us to re-enable vn_has_cached_data() check in zfs_write.

NOTE: strictly speaking resident_page_count and cache fields of v_object
should be exmined under VM_OBJECT_LOCK, but for this particular usage
we may get away with it.

Discussed with:	alc, kib
Approved by:	pjd
Tested with:	tools/regression/fsx
MFC after:	3 weeks
2010-09-15 11:05:41 +00:00
avg
23cfa76cb6 zfs mappedread, update_pages: use int for offset and length within a page
uint64_t, int64_t were redundant there

Approved by:	pjd
Tested by:	tools/regression/fsx
MFC after:	2 weeks
2010-09-15 10:48:16 +00:00
avg
204e0a0dec zfs mappedread: use uiomove_fromphys where possible
Reviewed by:	alc
Approved by:	pjd
Tested by:	tools/regression/fsx
MFC after:	2 weeks
2010-09-15 10:44:20 +00:00
avg
ddc5721620 zfs: catch up with vm_page_sleep_if_busy changes
Reviewed by:	alc
Approved by:	pjd
Tested by:	tools/regression/fsx
MFC after:	2 weeks
2010-09-15 10:39:21 +00:00
avg
65a73d5f0b tmpfs, zfs + sendfile: mark page bits as valid after populating it with data
Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).

PR:		kern/141305
Reported by:	Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by:	alc
Tested by:	tools/regression/sockets/sendfile
MFC after:	2 weeks
2010-09-15 10:31:27 +00:00
mm
9525362cee Remove duplicated VFS_HOLD due to a mismerge.
PR:		kern/150544
Approved by:	delphij (mentor)
MFC after:	1 day
2010-09-14 12:12:18 +00:00
mm
af9e1720ca Add missing vop_vector zfsctl_ops_shares
Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir

PR:		kern/150544
Approved by:	delphij (mentor)
Obtained from:	perforce (pjd)
MFC after:	1 day
2010-09-14 10:27:32 +00:00
pjd
7766ec39d4 Remove the page queues lock around vm_page_undirty() - it is no longer needed.
Reviewed by:	alc
2010-09-13 19:47:09 +00:00
rpaulo
f29acedb77 Revamp locking a bit. This fixes three problems:
* processes now can't go away while we are inserting probes (fixes a panic)
* if a trap happens, we won't be holding the process lock (fixes a hang)
* fix a LOR between the process lock and the fasttrap bucket list lock

Thanks to kib for pointing some problems.
Sponsored by:	The FreeBSD Foundation
2010-09-12 14:12:16 +00:00
rpaulo
fe4368ea53 Avoid a LOR (sleepable after non-sleepable) in
fasttrap_tracepoint_enable().

Sponsored by:	The FreeBSD Foundation
2010-09-11 12:58:31 +00:00
mdf
ab3a8b533a Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
pjd
746c214807 Forgot to commit this file. Add ZPOOL_CONFIG_IS_LOG.
Reported by:	keramida
MFC after:	2 weeks
2010-09-10 04:44:13 +00:00
pjd
87a79d2bea On FreeBSD we can log from pool that have multiple top-level vdevs or log
vdevs, so don't deny adding new vdevs if bootfs property is set.

MFC after:	2 weeks
2010-09-09 21:20:18 +00:00
rpaulo
a2f2f93652 Fix two bugs in DTrace:
* when the process exits, remove the associated USDT probes
* when the process forks, duplicate the USDT probes.

Sponsored by:	The FreeBSD Foundation
2010-09-09 09:58:05 +00:00
gibbs
6833acab2d Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued.  When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   "blocked bios" are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
	o Update last_offset in bioq_insert_tail().

	o Only update last_offset in bioq_remove() if the removed bio is
	  at the head of the queue (typically due to a call via
	  bioq_takefirst()) and no barrier is active.

	o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
	  set prev to the barrier and cur to it's next element.  Now that
	  last_offset is kept at the barrier position, this change isn't
	  strictly necessary, but since we have to take a decision branch
	  anyway, it does avoid one, no-op, loop iteration in the while
	  loop that immediately follows.

	o In bioq_disksort(), bypass the normal sort for bios with the
	  BIO_ORDERED attribute and instead insert them into the queue
	  with bioq_insert_tail().  bioq_insert_tail() not only gives
	  the desired command order during insertion, but also provides
	  barrier semantics so that commands disksorted in the future
	  cannot pass the just enqueued transaction.

sys/sys/bio.h:
	Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
	Use an ordered command for SCSI/ATA-NCQ commands issued in
	response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
	Use an ordered tag when issuing a synchronize cache command.

	Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
	Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:	Spectra Logic Corporation
MFC after:	1 month
2010-09-02 19:40:28 +00:00
rpaulo
0a04223ce6 Make the /dev/dtrace/helper node have the mode 0660. This allows
programs that refuse to run as root (pgsql) to install probes when their
user is part of the wheel group.

Sponsored by:	The FreeBSD Foundation
> Description of fields to fill in above:                     76 columns --|
> PR:            If a GNATS PR is affected by the change.
> Submitted by:  If someone else sent in the change.
> Reviewed by:   If someone else reviewed your modification.
> Approved by:   If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after:     N [day[s]|week[s]|month[s]].  Request a reminder email.
> Security:      Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.

M    dev/dtrace/dtrace_load.c
2010-09-01 12:08:32 +00:00
jh
c2cb836190 execve(2) has a special check for file permissions: a file must have at
least one execute bit set, otherwise execve(2) will return EACCES even
for an user with PRIV_VFS_EXEC privilege.

Add the check also to vaccess(9), vaccess_acl_nfs4(9) and
vaccess_acl_posix1e(9). This makes access(2) to better agree with
execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check
to zfs_freebsd_access() too. There may be other file systems which are
not using vaccess*() functions and need to be handled separately.

PR:		kern/125009
Reviewed by:	bde, trasz
Approved by:	pjd (ZFS part)
2010-08-30 16:30:18 +00:00
pjd
8814cacd23 Return NULL pointer instead of B_FALSE as it is done in the vendor code.
Obtained from:	//depot/user/pjd/zfs/...
2010-08-28 19:29:06 +00:00
pjd
d26210a142 Move ZUT_OBJS in the same place that is used in vendor code.
Obtained from:	//depot/user/pjd/zfs/...
2010-08-28 19:28:12 +00:00
mm
a5c8d0424b Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes

Detailed information (OpenSolaris onnv changesets and Bug IDs):

9749:105f407a2680
6802734	Support for Access Based Enumeration (not used on FreeBSD)
6844861	inconsistent xattr readdir behavior with too-small buffer

9866:ddc5f1d8eb4e
6848431	zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership

9981:b4907297e740
6775100	stat() performance on files on zfs should be improved
6827779	rrwlock is overly protective of its counters

10143:d2d432dfe597
6857433	memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc
6860318	truncate() on zfsroot succeeds when file has a component of its path set without access permission

10232:f37b85f7e03e
6865875	zfs sometimes incorrectly giving search access to a dir

10250:b179ceb34b62
6867395	zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)

10269:2788675568fd
6868276	zfs_rezget() can be hazardous when znode has a cached ACL

10295:f7a18a1e9610
6870564	panic in zfs_getsecattr

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	2 weeks
2010-08-28 09:24:11 +00:00
mm
465a8c9841 Update ZFS metaslab code from OpenSolaris.
This provides a noticeable write speedup, especially on pools with
less than 30% of free space.

Detailed information (OpenSolaris onnv changesets and Bug IDs):

11146:7e58f40bcb1c
6826241	Sync write IOPS drops dramatically during TXG sync
6869229	zfs should switch to shiny new metaslabs more frequently

11728:59fdb3b856f6
6918420	zdb -m has issues printing metaslab statistics

12047:7c1fcc8419ca
6917066	zfs block picking can be improved

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066)
MFC after:	2 weeks
2010-08-28 08:59:55 +00:00
rpaulo
9067eec20f Remove debugging.
Sponsored by:	The FreeBSD Foundation
2010-08-28 08:39:37 +00:00
rpaulo
37076417ff Replace a memory barrier with a mutex barrier.
Sponsored by:	The FreeBSD Foundation
2010-08-28 08:13:38 +00:00
pjd
1d0bc060d7 Use ZFS_CTLDIR_NAME instead of hardcoding ".zfs". 2010-08-27 21:31:15 +00:00
pjd
8a3810443d Update comment now that I finally committed r211854.
MFC after:	1 month
2010-08-26 23:44:32 +00:00
avg
e53e842194 zfs arc_reclaim_thread: no need to call arc_reclaim_needed when
resetting needfree

needfree is checked at the very start of arc_reclaim_needed.
This change makes code easier to follow and maintain in face of
potential changed in arc_reclaim_needed.

Also, put the whole sub-block under _KERNEL because needfree can be set
only in kernel code.

To do: rename needfree to something else to aovid confusion with
OpenSolaris global variable of the same name which is used in the same
code, but has different meaning (page deficit).

Note: I have an impression that locking around accesses to this variable
as well as mutual notifications between arc_reclaim_thread and
arc_lowmem are not proper.

MFC after:	1 week
2010-08-24 17:48:22 +00:00
rpaulo
8c74e06a34 Replace a pksignal() call with tdksignal().
Pointed out by:	kib
2010-08-24 12:12:03 +00:00
rpaulo
64690e8feb MD fasttrap implementation.
Sponsored by:	The FreeBSD Foundation
2010-08-24 12:05:58 +00:00
rpaulo
df7cc01ea2 Port the fasttrap provider to FreeBSD. This provider is responsible for
injecting debugging probes in the userland programs and is the basis for
the pid provider and the usdt provider.

Sponsored by:	The FreeBSD Foundation
2010-08-24 11:11:58 +00:00
rpaulo
8111273b84 Port this to FreeBSD. We miss some suword functions, so we use copyout.
Sponsored by:	The FreeBSD Foundation
> Description of fields to fill in above:                     76 columns --|
> PR:            If a GNATS PR is affected by the change.
> Submitted by:  If someone else sent in the change.
> Reviewed by:   If someone else reviewed your modification.
> Approved by:   If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after:     N [day[s]|week[s]|month[s]].  Request a reminder email.
> Security:      Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.

M    sys/fasttrap_impl.h
2010-08-22 11:41:06 +00:00
rpaulo
9760737bfb Destroy the helper device when unloading.
Sponsored by:	The FreeBSD Foundation
2010-08-22 11:05:37 +00:00
rpaulo
ed2c978b97 Add more compatibility structure members needed by the upcoming fasttrap
DTrace device.

Sponsored by:	The FreeBSD Foundation
2010-08-22 11:04:43 +00:00
rpaulo
2b32a31ca3 Kernel DTrace support for:
o uregs  (sson@)
o ustack (sson@)
o /dev/dtrace/helper device (needed for USDT probes)

The work done by me was:
Sponsored by:	The FreeBSD Foundation
2010-08-22 10:53:32 +00:00
rpaulo
3a636008c7 Add a function compatibility function dtrace_instr_size_isa() that on
FreeBSD does the same as dtrace_dis_isize().

Sponsored by:	The FreeBSD Foundation
2010-08-22 10:40:15 +00:00
rpaulo
916a9e3325 Add the FreeBSD definition for the fasttrap ioctls.
Sponsored by:	The FreeBSD Foundation
2010-08-22 10:13:56 +00:00
rpaulo
e29e410357 Add a sysname char * to struct opensolaris_utsname.
Sponsored by:	The FreeBSD Foundation
2010-08-21 14:09:24 +00:00
rpaulo
b8de098092 Port the DTrace helper ioctls to FreeBSD and add a helper member to
dof_helper_t (needed by drti.o).

Sponsored by:	The FreeBSD Foundation
2010-08-21 11:58:08 +00:00
rpaulo
232561a60d Add sysname to struct opensolaris_utsname. This is needed by one DTrace
test.

Sponsored by:	The FreeBSD Foundation
2010-08-21 11:41:32 +00:00
imp
48183e9b23 First cut at mips n64 ABI support 2010-08-19 03:31:26 +00:00
pjd
481dcd416a In FreeBSD we use 'jailed' property.
MFC after:	2 weeks
2010-08-07 10:23:54 +00:00
mm
e67ee04556 Import two changesets from OpenSolaris to make future updates easier.
The changes do not affect FreeBSD code because
zfs_znode_move(), cleanlocks() and cleanshares() are not used.

OpenSolaris onnv changeset:	9788:f660bc44f2e8, 9909:aa280f585a3e

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6843700, 6790232)
MFC after:	7 weeks
2010-07-25 15:17:24 +00:00
mm
2e35298c1d Consider snapshots as descendants via zfs allow -d
OpenSolaris onnv changeset:	9847:2f3ba86e857a

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6809340)
MFC after:	1 week
2010-07-24 22:28:29 +00:00
avg
4663240679 zfs arc_memory_throttle: available memory is free + cache
OpenSolaris freemem has the same meaning as our v_free_count +
v_cache_count.

Obtained from:	Artem Belevich <fbsdlist@src.cx>,
		Peter Jeremy <peterjeremy@acm.org>
Discussed with:	pjd
MFC after:	2 weeks
2010-07-23 17:44:01 +00:00
mm
db407a02da Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY
- fixes panics when Solaris/OpenSolaris pools that contain files
uploaded with the SMB protocol are accessed

Enable seting/unsetting the sharesmb property (dummy action)
- allows users who import pools from Solaris/Opensolaris to unset
the sharesmb property and get rid of annoying messages

PR:		kern/145778, kern/148709
Approved by:	pjd, delphij (mentor)
MFC after:	7 weeks
2010-07-22 23:30:24 +00:00
mm
41e1927dcc To improve latency, lower default vfs.zfs.vdev.max_pending from 35 to 10
OpenSolaris onnv changeset (partial):	10801:e0bf032e8673

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6891731)
MFC after:	1 week
2010-07-20 05:22:14 +00:00
nwhitehorn
f613d35061 Add OpenSolaris atomics for powerpc64 and connect ZFS to the build on
this platform.

Reviewed by:	pjd
2010-07-17 13:34:01 +00:00
nwhitehorn
5d6878ba3a Increase stack size for ZFS sync thread. This is required to make ZFS
function on 64-bit PowerPC.

Reviewed by:	pjd
Obtained from:	OpenSolaris changeset 14653:7cf402a7f374
2010-07-17 13:31:27 +00:00
jhb
7f218ea7f4 Revert the previous commit. The race is not applicable to the lockmgr
implementation in 8.0 and later as its flags field does not hold dynamic
state such as waiters flags, but is only modified in lockinit() aside
from VN_LOCK_*().

Discussed with:	attilio
2010-07-16 19:52:03 +00:00
jhb
ea417bf09a When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were
changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE
in the vnode lock's flags) until after they had determined if the vnode was
a FIFO.  This occurs after the vnode has been inserted a VFS hash or some
similar table, so it is possible for another thread to find this vnode via
vget() on an i-node number and block on the vnode lock.  If the lockmgr
interlock (vnode interlock for vnode locks) is not held when clearing the
LK_NOSHARE flag, then the lk_flags field can be clobbered.  As a result
the thread blocked on the vnode lock may never get woken up.  Fix this by
holding the vnode interlock while modifying the lock flags in this case.

MFC after:	3 days
2010-07-16 19:20:20 +00:00
mm
b2946e8934 Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced
in Solaris 10 updates 141445-09 and 142901-14.

Detailed information:
(OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)

7844:effed23820ae
6755435	zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)

7897:e520d8258820
6748436	inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)

7965:b795da521357
6740164	zpool attach can create an illegal root pool (141909-02)

8084:b811cc60d650
6769612	zpool_import() will continue to write to cachefile even if altroot is set (N/A)

8121:7fd09d4ebd9c
6757430	want an option for zdb to disable space map loading and leak tracking (141445-01)

8129:e4f45a0bfbb0
6542860	ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)

8188:fd00c0a81e80
6761100	want zdb option to select older uberblocks (141445-01)

8190:6eeea43ced42
6774886	zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)

8225:59a9961c2aeb
6737463	panic while trying to write out config file if root pool import fails (141445-01)

8227:f7d7be9b1f56
6765294	Refactor replay (141445-01)

8228:51e9ca9ee3a5
6572357	libzfs should do more to avoid mnttab lookups (141909-01)
6572376	zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)

8241:5a60f16123ba
6328632	zpool offline is a bit too conservative (141445-01)
6739487	ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01)
6767129	ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01)
6747698	checksum failures after offline -t / export / import / scrub (141445-01)
6745863	ZFS writes to disk after it has been offlined (141445-01)
6722540	50% slowdown on scrub/resilver with certain vdev configurations (141445-01)
6759999	resilver logic rewrites ditto blocks on both source and destination (141445-01)
6758107	I/O should never suspend during spa_load() (141445-01)
6776548	codereview(1) runs off the page when faced with multi-line comments (N/A)
6761406	AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)

8242:e46e4b2f0a03
6770866	GRUB/ZFS should require physical path or devid, but not both (141445-01)

8269:03a7e9050cfd
6674216	"zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01)
6621164	$SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01)
6635482	i18n problems in libzfs_dataset.c and zfs_main.c (141445-01)
6595194	"zfs get" VALUE column is as wide as NAME (141445-01)
6722991	vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01)
6396518	ASSERT strings shouldn't be pre-processed (141445-01)

8274:846b39508aff
6713916	scrub/resilver needlessly decompress data (141445-01)

8343:655db2375fed
6739553	libzfs_status msgid table is out of sync (141445-01)
6784104	libzfs unfairly rejects numerical values greater than 2^63 (141445-01)
6784108	zfs_realloc() should not free original memory on failure (141445-01)

8525:e0e0e525d0f8
6788830	set large value to reservation cause core dump (141445-01)
6791064	want sysevents for ZFS scrub (141445-01)
6791066	need to be able to set cachefile on faulted pools (141445-01)
6791071	zpool_do_import() should not enable datasets on faulted pools (141445-01)
6792134	getting multiple properties on a faulted pool leads to confusion (141445-01)

8547:bcc7b46e5ff7
6792884	Vista clients cannot access .zfs (141445-01)

8632:36ef517870a3
6798384	It can take a village to raise a zio (141445-01)

8636:7e4ce9158df3
6551866	deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01)
6504953	zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01)
6702206	ZFS read/writer lock contention throttles sendfile() benchmark (141445-01)
6780491	Zone on a ZFS filesystem has poor fork/exec performance (141445-01)
6747596	assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)

8692:692d4668b40d
6801507	ZFS read aggregation should not mind the gap (141445-01)

8697:e62d2612c14d
6633095	creating a filesystem with many properties set is slow (141445-01)

8768:dfecfdbb27ed
6775697	oracle crashes when overwriting after hitting quota on zfs (141909-01)

8811:f8deccf701cf
6790687	libzfs mnttab caching ignores external changes (141445-01)
6791101	memory leak from libzfs_mnttab_init (141445-01)

8845:91af0d9c0790
6800942	smb_session_create() incorrectly stores IP addresses (N/A)
6582163	Access Control List (ACL) for shares (141445-01)
6804954	smb_search - shortname field should be space padded following the NULL terminator (N/A)
6800184	Panic at smb_oplock_conflict+0x35() (N/A)

8876:59d2e67b4b65
6803822	Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)

8924:5af812f84759
6789318	coredump when issue zdb -uuuu poolname/ (141445-01)
6790345 zdb -dddd -e poolname coredump (141445-01)
6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01)
6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01)
6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)

9030:243fd360d81f
6815893	hang mounting a dataset after booting into a new boot environment (141445-01)

9056:826e1858a846
6809691	'zpool create -f' no longer overwrites ufs infomation (141445-01)

9179:d8fbd96b79b3
6790064	zfs needs to determine uid and gid earlier in create process (141445-01)

9214:8d350e5d04aa
6604992	forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01)
6810367	assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)

9229:e3f8b41e5db4
6807765	ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)

9230:e4561e3eb1ef
6821169	offlining a device results in checksum errors (141445-01)
6821170	ZFS should not increment error stats for unavailable devices (141445-01)
6824006	need to increase issue and interrupt taskqs threads in zfs (141445-01)

9234:bffdc4fc05c4
6792139	recovering from a suspended pool needs some work (141445-01)
6794830	reboot command hangs on a failed zfs pool (141445-01)

9246:67c03c93c071
6824062	System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)

9276:a8a7fc849933
6816124	System crash running zpool destroy on broken zpool (141445-03)

9355:09928982c591
6818183	zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)

9391:413d0661ef33
6710376	log device can show incorrect status when other parts of pool are degraded (141445-03)

9396:f41cf682d0d3 (part already merged)
6501037	want user/group quotas on ZFS (141445-03)
6827260	assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03)
6815592	panic: No such hold X on refcount Y from zfs_znode_move (141445-03)
6759986	zfs list shows temporary %clone when doing online zfs recv (141445-03)

9404:319573cd93f8
6774713	zfs ignores canmount=noauto when sharenfs property != off (141445-03)

9412:4aefd8704ce0
6717022	ZFS DMU needs zero-copy support (141445-03)

9425:e7ffacaec3a8
6799895	spa_add_spares() needs to be protected by config lock (141445-03)
6826466	want to post sysevents on hot spare activation (141445-03)
6826468	spa 'allowfaulted' needs some work (141445-03)
6826469	kernel support for storing vdev FRU information (141445-03)
6826470	skip posting checksum errors from DTL regions of leaf vdevs (141445-03)
6826471	I/O errors after device remove probe can confuse FMA (141445-03)
6826472	spares should enjoy some of the benefits of cache devices (141445-03)

9443:2a96d8478e95
6833711	gang leaders shouldn't have to be logical (141445-03)

9463:d0bd231c7518
6764124	want zdb to be able to checksum metadata blocks only (141445-03)

9465:8372081b8019
6830237	zfs panic in zfs_groupmember() (141445-03)

9466:1fdfd1fed9c4
6833162	phantom log device in zpool status (141445-03)

9469:4f68f041ddcd
6824968	add ZFS userquota support to rquotad (141445-03)

9470:6d827468d7b5
6834217	godfather I/O should reexecute (141445-03)

9480:fcff33da767f
6596237	Stop looking and start ganging (141909-02)

9493:9933d599bc93
6623978	lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)

9512:64cafcbcc337
6801810	Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)

9515:d3b739d9d043
6586537	async zio taskqs can block out userland commands (142901-09)

9554:787363635b6a
6836768	zfs_userspace() callback has no way to indicate failure (N/A)

9574:1eb6a6ab2c57
6838062	zfs panics when an error is encountered in space_map_load() (141909-02)

9583:b0696cd037cc
6794136	Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)

9630:e25a03f552e0
6776104	"zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)

9653:a70048a304d1
6664765	Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)

9688:127be1845343
6841321	zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A)
6843069	zfs get userused@S-1-... doesn't work (N/A)

9873:8ddc892eca6e
6847229	assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)

9904:d260bd3fd47c
6838344	kernel heap corruption detected on zil while stress testing (141445-06)

9951:a4895b3dd543
6844900	zfs_ioc_userspace_upgrade leaks (N/A)

10040:38b25aeeaf7a
6857012	zfs panics on zpool import (141445-06)

10000:241a51d8720c
6848242	zdb -e no longer works as expected (N/A)

10100:4a6965f6bef8
6856634	snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)

10160:a45b03783d44
6861983	zfs should use new name <-> SID interfaces (N/A)
6862984	userquota commands can hang (141445-06)

10299:80845694147f
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)

10302:a9e3d1987706
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)

10575:2a8816c5173b (partial merge)
6882227 spa_async_remove() shouldn't do a full clear (142901-14)

10800:469478b180d9
6880764	fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09)
6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)

10801:e0bf032e8673 (partial merge)
6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)

10810:b6b161a6ae4a
6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)

10890:499786962772
6807339	spurious checksum errors when replacing a vdev (142901-13)

11249:6c30f7dfc97b
6906110 bad trap panic in zil_replay_log_record (142901-13)
6906946 zfs replay isn't handling uid/gid correctly (142901-13)

11454:6e69bacc1a5a
6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)

11546:42ea6be8961b (partial merge)
6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)

Discussed with:	pjd
Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	2 months
2010-07-12 23:49:04 +00:00
rpaulo
05f71bf7df Merge from vendor-sys/opensolaris:
* add fasttrap files
2010-07-06 10:28:19 +00:00
mm
09f088a457 Import latest ARC change from OpenSolaris:
- large ghost eviction causes high write latency
- arc_adjust might adjust MRU unnecessarily
- arc_adapt can lead to wild arc_p adjustment

OpenSolaris onnv-revision:	12636:13b5d698941e

Submitted by:	avg
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6950219, 6953403, 6951024)
MFC after:	1 month
2010-06-17 22:47:44 +00:00
pjd
0e09f70bb6 Turn off UMA allocations on all archs by default. It isn't stable even on
amd64.

Reported by:	many
MFC after:	3 days
2010-06-17 17:41:42 +00:00
pjd
9a26791248 Remove redundant assignment.
MFC after:	3 days
2010-06-16 12:42:20 +00:00
mm
ba82206079 Fix arc_read_done may try to byteswap undefined data (sparc related)
OpenSolaris onnv-revision:	10839:cf83b553a2ab

Obtained from:	OpenSolaris (Bug ID 6836714)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:28:46 +00:00
mm
ddf29dd30c Fix panic in zfs_getsecattr
OpenSolaris onnv-revision:	10295:f7a18a1e9610

Obtained from:	OpenSolaris (Bug ID 6870564)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:27:10 +00:00
mm
9a68899bb4 Fix possible zfs panic on zpool import
OpenSolaris onnv-revision:	10040:38b25aeeaf7a

Obtained from:	OpenSolaris (Bug ID 6857012)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:25:57 +00:00
mm
b9b5f9e6aa Fix zpool resilver stalls with spa_scrub_thread in a 3 way deadlock
OpenSolaris onnv-revision:	9997:174d75a29a1c

Obtained from:	OpenSolaris (Bug ID 6843235)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:24:10 +00:00
mm
e8fb0ae71d Fix ZFS panic deadlock: cycle in blocking chain via zfs_zget
OpenSolaris onnv-revision:	9774:0bb234ab2287

Obtained from:	OpenSolaris (Bug ID 6788152)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:22:45 +00:00
mm
58b59c49b6 Fix vdev_probe() starvation brings txg train to a screeching halt
OpenSolaris onnv-revision:	9722:e3866bad4e96

Obtained from:	OpenSolaris (Bug ID 6844069)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:21:37 +00:00
mm
53bcf2affc Fix incomplete resilvering after disk replacement (raidz)
OpenSolaris onnv-revision:	9434:3bebded7c76a

Obtained from:	OpenSolaris (Bug ID 6794570)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:20:50 +00:00
mm
d97ee7139a Fix zfs destroy fails to free object in open context, stops up txg train
OpenSolaris onnv-revision:	9409:9dc3f17354ed

Obtained from:	OpenSolaris (Bug ID 6809683)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:19:51 +00:00
mm
7a92822e34 Fix unable to remove a file over NFS after hitting refquota limit
OpenSolaris onnv-revision:	8890:8c2bd5f17bf2

Obtained from:	OpenSolaris (Bug ID 6798878)
Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-06-12 11:18:29 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
mm
df50b677c0 Fix freeing space after deleting large files with holes.
OpenSolaris onnv revision:	9950:78fc41aa9bc5

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6792701)
MFC after:	3 days
2010-06-03 11:08:46 +00:00
mm
07c9330a40 Fix ZIL close when doing zfs rollback or zfs receive on a mounted dataset.
The fix is a partial import and merge of OpenSolaris onnv revisions
8227:f7d7be9b1f56. and 9292:e112194b5b73

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6798298)
MFC after:	3 days
2010-06-01 08:43:46 +00:00
pjd
fdfbbe5975 Fix a bug where resilver is not started automatically on pool import or load.
If disk was missing on pool load or import and on next pool load or import
it was present, resilver wasn't started automatically and ZFS reported all disks
as ONLINE and healthy. Then, when another disk died, pool became unaccessible,
because if it was 2-way mirror or RAIDZ1 two vdevs were out of sync.

To fix the problem, start resilver automatically on pool load or import.

Obtained from:	OpenSolaris
MFC after:	3 days
2010-05-31 23:17:45 +00:00
pjd
3cd955d57e Fix panic when reading label from provider with non power of 2 sector size.
Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
MFC after:	3 days
2010-05-31 23:11:43 +00:00
mm
5e6f9e5ca8 Remove kstat.zfs.arcstats.l2_write_bytes_written
The arcstats.l2_write_bytes_written kstat counter introduced
in r205231 was duplicite with vendor's arcstats.l2_write_bytes counter
imported in r208373 (OpenSolaris revision 8582:df9361868dbe)

Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-05-23 21:16:34 +00:00
mm
9b8a28a2fd Fix zfs receive temporarily changing unchanged stream properties.
Fix possible panic with zfs_enable_datasets.

OpenSolaris onnv revision:	8536:33bd5de3260e

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6748561, 6757075)
MFC after:	3 days
2010-05-23 21:02:43 +00:00
pjd
0ec82d1d34 Create UMA zones unconditionally.
MFC after:	3 days
2010-05-23 19:10:06 +00:00
pjd
4a2fdc2125 Remove ZIO_USE_UMA from arc.c as well.
MFC after:	3 days
2010-05-23 18:42:33 +00:00
kib
4208ccbe79 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
mm
5fc2b82b2f Fix kernel panic when calling spa_tryimport() on a corrupted pool.
OpenSolaris onnv revision:	8680:005fe27123ba

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6786321)
MFC after:	1 day
2010-05-23 10:13:11 +00:00
mm
d4c8f6c26c Fix mutex_exit misorder that can cause a kernel panic.
OpenSolaris onnv revision:	8667:5c308a17eb7c

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6795440)
MFC after:	1 day
2010-05-23 10:08:05 +00:00
mm
013e2a2184 Update L2ARC code and fix several bugs.
- improve ARC memory consumption (Bug ID 6488341)
- ARC/L2ARC metadata accounting (Bug ID 6748019)
- L2ARC turbo warmup (Bud ID 6748023)
- kstats for ARC content (Bug ID 6748023)
- kstats for evicted bytes from ARC by L2ARC state (Bud ID 6871680)
- fix panic on i386 systems (Bug ID 6821260)

OpenSolaris onnv revisions:
8582:df9361868dbe, 8628:97dcded6e556, 9215:7c4584f76b47,
9274:a10f8bd993c1, 10357:29060492b29d

OpenSolaris Bug IDs:
6748019, 6748023, 6748030, 6488341, 6798268, 6821260, 6790261, 6871680

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSlaris (multiple bug IDs)
MFC after:	3 days
2010-05-21 09:52:49 +00:00
mm
71313e529c Reorder some already introduced locking variables.
OpenSolaris onnv revision:	8214:d7abf7c1f1c1

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6747934)
MFC after:	3 days
2010-05-21 09:35:28 +00:00
mm
c5161d1ad5 Fix stack overflow in zfs send.
OpenSolaris onnv-revision: 8012:8ea30813950f

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6765626)
MFC after:	3 days
2010-05-21 08:55:18 +00:00
mm
c47ee5a2ca Fix: vdev_reopen() can lead to failed allocations
OpenSolaris onnv-revision: 7980:589f37f25048

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6764914)
MFC after:	3 days
2010-05-21 08:50:34 +00:00
pjd
15e5afeb5b Fix userland build by making io_task available only for the kernel and by
providing taskq_dispatch_safe() macro.

MFC after:	1 week
2010-05-16 19:44:08 +00:00
pjd
0e169b2cc7 Allow to configure UMA usage for ZIO data via loader and turn it on by
default for amd64. On i386 I saw performance degradation when UMA was used,
but for amd64 it should help.

MFC after:	3 days
2010-05-16 15:14:59 +00:00
pjd
eb30e42c89 Add task structure to zio and use it instead of allocating one.
This eliminates the only place where we can sleep when calling zio_interrupt().
As a side-effect this can actually improve performance a little as we
allocate one less thing for every I/O.

Prodded by:	kib
MFC after:	1 week
2010-05-16 15:12:34 +00:00
pjd
4f54652124 The whole point of having dedicated worker thread for each leaf VDEV was to
avoid calling zio_interrupt() from geom_up thread context. It turns out that
when provider is forcibly removed from the system and we kill worker thread
there can still be some ZIOs pending. To complete pending ZIOs when there is
no worker thread anymore we still have to call zio_interrupt() from geom_up
context. To avoid this race just remove use of worker threads altogether.
This should be more or less fine, because I also thought that zio_interrupt()
does more work, but it only makes small UMA allocation with M_WAITOK.
It also saves one context switch per I/O request.

PR:		kern/145339
Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>
MFC after:	1 week
2010-05-16 11:56:42 +00:00
mm
8de3c295cb Fix deadlock between zfs_dirent_lock and zfs_rmdir
OpenSolaris onnv revision:	11321:506b7043a14c

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6847615)
MFC after:	3 days
2010-05-16 07:46:03 +00:00
mm
f9955a0614 Fix perfomance problem with ZFS prefetch caching [1]
Add statistics for ZFS prefetch (sysctl kstat.zfs.misc.zfetchstats)

Partial import of OpenSolaris onnv revision 10474:0e96dd3b905a

Reported by:	jhell@dataix.net (private e-mail) [1]
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6859997, 6868951)
MFC after:	3 days
2010-05-16 07:16:28 +00:00
mm
cc11636cc4 Fix ZIL-related panic on zfs rollback.
OpenSolaris onnv-revision: 8746:e1d96ca6808c

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6796377)
MCF after:	1 week
2010-05-13 20:55:58 +00:00
mm
6f4ba1587b Import OpenSolaris revision 7837:001de5627df3
It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)

List of OpenSolaris Bug IDs:
6333409, 6418042, 6757112, 6725668, 6725675, 6725680,
6725698, 6729696, 6730101, 6752226, 6577985, 6755042

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	1 week
2010-05-13 20:32:56 +00:00
trasz
b0594437d3 Add missing check to prevent local users from panicing the kernel by trying
to set malformed ACL.

MFC after:	3 days
2010-05-13 15:31:00 +00:00
mm
ee02e5dd0e Fix possible hang when replaying large truncations.
OpenSolaris onnv revision:	7904:6a124a4ca9c5

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6761624)
MFC after:	3 days
2010-05-12 09:51:57 +00:00
pjd
f1b200bbcc I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
pjd
5b223eea5e Eventhough r203504 eliminates taste traffic provoked by vdev_geom.c,
ZFS still like to open all vdevs, close them and open them again,
which in turn provokes taste traffic anyway.

I don't know of any clean way to fix it, so do it the hard way - if we can't
open provider for writing just retry 5 times with 0.5 pauses. This should
elimitate accidental races caused by other classes tasting providers created on
top of our vdevs.

MFC after:	3 days
Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
Reported by:	Yuri Pankov <yuri.pankov@gmail.com>
2010-05-11 22:29:00 +00:00
pjd
b2459231ed Add missing new line characters to the warnings.
MFC after:	3 days
2010-05-11 22:23:35 +00:00
mm
1514727a55 Fix failed assertion on destroying datasets from an older pool version.
OpenSolaris onnv revision:	9390:887948510f80

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6826861)
MFC after:	3 days
2010-05-11 09:26:46 +00:00
mm
b04969b505 Fix possible panic with zfs destroy.
OpenSolaris onnv revision:	8779:f164e0e90508

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6784924)
MFC after:	3 days
2010-05-11 09:23:46 +00:00
mm
e1cc9b41b9 Fix zfs rename (may occasionally fail with dataset busy).
OpenSolaris onnv revision:	8517:41a0783dde17

PR:		kern/146471
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6784757)
MFC after:	3 days
2010-05-11 09:19:41 +00:00
mm
29ba4cd6b8 Fix endianess bug in ZFS intent log (ZIL).
OpenSolaris onnv revision:	8109:6147a1bdd359

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6760048)
MFC after:	3 days
2010-05-11 07:25:13 +00:00
trasz
c449683180 Enforce RLIMIT_FSIZE in ZFS.
Reviewed by:	pjd@
2010-05-07 14:30:21 +00:00
marius
c06ba98ac4 - Fix broken symlinks on cross platform zfs send/recv. [1]
- Enable zfs_ace_byteswap() on FreeBSD as it works just fine (tested between
  amd64 and sparc64 in both directions by Michael Moll).

PR:		146272
Approved by:	mm, pjd
Obtained from:	OpenSolaris (onnv rev. 8283:1ca59f393041; Bug ID 6764193) [1]
MFC after:	3 days
2010-05-05 22:15:20 +00:00
mm
a0a9776a5c Introduce hardforce export option (-F) for "zpool export".
When exporting with this flag, zpool.cache remains untouched.

OpenSolaris onnv revision: 8211:32722be6ad3b

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID: 6775357)
2010-05-05 18:22:29 +00:00
mm
6dc3ed99a0 Speed up ZFS list operation with objset prefetching.
Partial import of OpenSolaris onnv revisions:
8415:8809e849f63e, 10474:0e96dd3b905a

PR:		kern/146297
Submitted by:	myself
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6386929, 6755389, 6847118)
MFC after:	2 weeks
2010-05-04 17:40:24 +00:00
mm
a0d55d935c Fix deadlock during zfs receive.
OpenSolaris onnv revision:	9299:8809e849f63e

PR:		kern/146296
Submitted by:	myself
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6783818, 6826836)
MFC after:	1 week
2010-05-04 17:30:07 +00:00
mm
65cf7aac0b Add sysctl and loader tunable vfs.zfs.txg.write_limit_override.
This tunable improves fine-tuning of ZFS write throttling.

PR:		kern/146108
Suggested by:	Nikolay Denev <ndenev at gmail.com>
Approved by:	pjd, delphij (mentor)
MFC after:	2 weeks
2010-05-01 20:44:37 +00:00
mm
54fb849ffc Change description of tunable group vfs.zfs.txg to be more
understandable.

Approved by:	pjd, delphij (mentor)
MFC after:	3 days
2010-05-01 19:53:15 +00:00
mm
d524410a5a Fix improper pool write throughput calculation.
OpenSolaris onnv revision:	9366:17553395a745

PR:		kern/146108
Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris, Bug ID 6817339
MFC after:	2 weeks
2010-04-30 07:48:29 +00:00
pjd
c08f915983 Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
PR:		kern/144402
Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
Obtained from:	OpenSolaris, Bug ID 6895088
MFC after:	3 days
2010-04-28 18:29:48 +00:00
pjd
811be862d0 Allow to modify directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK,
sunlnk) flag is set. We only deny dirctory's removal or rename.

PR:		kern/143343
Reported by:	marck
MFC after:	3 days
2010-04-22 18:47:23 +00:00
rpaulo
7a84f3701d Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.
2010-04-20 17:03:30 +00:00
rpaulo
4eb7b21df0 The amd64 version of the cyclic dtrace module is a verbatim copy of the
i386 version, so instead having a copy of the same file, use Makefile
foo to include the i386 version on amd64.
2010-04-20 16:30:17 +00:00
delphij
f7bf5e7363 Partially MFp4 #176265 by pjd@:
- Properly initialize and destroy system_taskq.
 - Add a dummy implementation of taskq_create_proc().

Note: We do not currently use system_taskq in ZFS so this is mostly a
no-op at this time.  Proper system_taskq initialization is required
by newer ZFS code.

Ok'ed by:	pjd
MFC after:	2 weeks
2010-04-19 09:03:36 +00:00
pjd
3ad95a23cd Restore previous order. 2010-04-18 12:43:33 +00:00
pjd
1beda198ca Style fixes. 2010-04-18 12:36:53 +00:00
pjd
0ef453c1ae Add missing list and lock destruction. 2010-04-18 12:27:07 +00:00
pjd
0c230ccd55 Extend locks scope to match OpenSolaris. 2010-04-18 12:25:40 +00:00
pjd
993c35f241 Remove racy assertion.
Obtained from:	OpenSolaris
2010-04-18 12:21:52 +00:00
pjd
5d9f61b0a9 Set ARC_L2_WRITING on L2ARC header creation.
Obtained from:	OpenSolaris
2010-04-18 12:20:33 +00:00
pjd
7b9dcfcaf4 Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.

thread0 (vfs_fhtovp)	thread1 (vop_getattr)	thread2 (zfs_recv)
--------------------	---------------------	------------------
			vn_lock
rrw_enter_read
						rrw_enter_write (hangs)
			rrw_enter_read (hangs)
vn_lock (hangs)

Submitted by:	Attila Nagy <bra@fsn.hu>
MFC after:	3 days
2010-04-15 16:40:54 +00:00
pjd
96459d48e9 The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
   or that pool is being created and vdev doesn't contain proper guid yet.
   To be able to handle pool creation we open vdev by path anyway.

Because of 3 it is possible that we open wrong vdev on import which can lead to
confusions.

The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.

MFC after:	2 weeks
2010-03-19 20:14:27 +00:00
kmacy
e10aea9e47 - cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array

MFC after:	7 days
2010-03-17 21:10:09 +00:00
kmacy
bf3a24dd0d use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
2010-03-17 20:00:22 +00:00
kmacy
bd31c2114d - reduce contention by breaking up ARC state locks in to 16 for data
and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
  contended counter (no lock protection needed for atomics)
2010-03-16 22:17:21 +00:00
kmacy
91efcb8e77 fix compilation under ZIO_USE_UMA 2010-03-13 21:52:21 +00:00
kmacy
cd0c2afd36 Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads

MFC after:	7 days
2010-03-13 21:41:52 +00:00