Commit Graph

580 Commits

Author SHA1 Message Date
Steven Hartland
e44e975c1b zfs_ioc_rename should not leave the value of zc_name passed in via zc altered
on return.

MFC after:	1 week
2013-08-04 11:38:08 +00:00
Xin LI
bd3d1456a5 MFV r253783:
Skip eviction step of processing free records when doing ZFS
receive to avoid the expensive search operation of non-existent
dbufs in dn_dbufs.

Illumos ZFS issues:
  3834 incremental replication of 'holey' file systems is slow

MFC after:      2 weeks
2013-07-30 21:35:02 +00:00
Xin LI
1c4ead73c6 MFV r253782:
To quote Illumos issue #3888:

When 'zfs recv -F' is used with an incremental recv it rolls
back any changes made since the last snapshot in case new
changes were made to the file system while the recv is in
progress (without -F the recv would fail when it does it's
final check to commit the recv-ed data as the recv-ed data
conflicts with the newly written data).

However, if there is a snapshot taken after the recv began
rolling back to the 'latest' snapshot will not help and the
recv will still fail. 'zfs recv -F' should be extended to
destroy any snapshots created since the source snapshot when
finishing the recv (effectively rolling back through all
snapshots, instead of just to the latest snapshot).

Illumos ZFS issues:
  3888 zfs recv -F should destroy any snapshots created since the
       incremental source

MFC after:	2 weeks
2013-07-30 21:20:12 +00:00
Xin LI
d637247e1f MFV r253781 + r253871:
Illumos ZFS issues:
  3894 zfs should not allow snapshot of inconsistent dataset

MFC after:	2 weeks
2013-07-30 21:02:09 +00:00
Xin LI
44e362e207 MFV r253780:
To quote Illumos #3875:

The problem here is that if we ever end up in the error
path, we drop the locks protecting access to the zfsvfs_t
prior to forcibly unmounting the filesystem. Because z_os
is NULL, any thread that had already picked up the zfsvfs_t
and was sitting in ZFS_ENTER() when we dropped our locks
in zfs_resume_fs() will now acquire the lock, attempt to
use z_os, and panic.

Illumos ZFS issues:
  3875 panic in zfs_root() after failed rollback

MFC after:	2 weeks
2013-07-30 20:37:32 +00:00
Alexander Motin
ec4d2e0d96 Allow three IOCTLs to be used on suspended pool, restoring state that
existed before IOCTL code refactoring merged change 4445fffb from illumos
at r248571.

This change allows `zpool clear` to be used again to recover suspended pool.
It seems the only was supposed by the code to restore pool operation after
reconnecting lost disks that were required for data completeness.  There
are still cases where `zpool clear` command can just safely stuck due to
deadlocks inside ZFS kernel part, but probably that is better then having
no chances to recover at all.
2013-07-30 14:50:44 +00:00
Alexander Motin
698cd997d6 Partially close race between calls of orphan() method from GEOM and close()
method from ZFS core, that reliably causes use-after-free panic if SSD vdev
detached during inititial erase.
2013-07-28 20:07:34 +00:00
Alexander Motin
ffacde9be5 Following r222950, revert unintentional change cls -> class in argument name
in r245264.  Aside from non-uniformity, that again confused C++ compilers.
2013-07-25 08:41:22 +00:00
Andriy Gapon
f66c1f6482 zfs module: perform cleanup during shutdown in addition to module unload
- move init and fini code into separate functions (like it is done upstream)
- invoke fini code via shutdown_post_sync event hook

This should make zfs close its underlying devices during shutdown,
which may be important for their drivers.

MFC after:	20 days
2013-07-24 09:59:16 +00:00
Andriy Gapon
886dbd270f zfs: move vnode creation from zfs_znode_cache_constructor to zfs_znode_alloc
All other places where a znode is allocated do not need z_vnode at all.
These are:
- zfs_create_share_dir
- zfs_create_fs

This chnage ensures two things:
- VN_LOCK_ASHARE is not erroneously called for VFIFO vnodes
- vn_lock is called on a fully constructed vnode with correct v_ops

The change also allows to make zfs_znode_cache_constructor a normal
kmem_cache constructor again (as it is in upstream).
This allows to avoid a problem where zfs_znode_cache_destructor
may be called on un-constructed znodes.

MFC after:	17 days
2013-07-24 09:15:59 +00:00
Xin LI
c92bc5e996 Manually merge part of vendor import r238583 from Illumos.
Illumos changeset: 13680:2bd022a765e2
Illumos ZFS issue:

    2671 zpool import should not fail if vdev ashift has increased

MFC after:	3 days
2013-07-18 00:22:42 +00:00
Andriy Gapon
9c1f50af0a zfs: try to properly handle i/o errors in mappedread_sf
Unconditionally freeing a page is not good, especially if it is the page
that was wired by the caller.  The checks are picked up from
kern_sendfile.

MFC after:	3 weeks
2013-07-09 08:47:11 +00:00
Andriy Gapon
78ed7a7855 zfs: load zpool.cache after a root fs is mounted
MFC after:	3 weeks
2013-07-09 08:37:42 +00:00
Martin Matuska
12df7d65b0 MFV r252839:
Quoting illumos issue #3836:
  Currently zio_free() always puts the zio on a list for subsequent
  processing by zio_free_sync().  This is only necessary for frees that
  might need to issue reads (gang and dedup blocks).

  By processing the majority of the frees as we encounter them, we reduce
  the amount of time that the spa_sync() thread spends burning CPU and
  not doing any i/o, thus increasing the overall write throughput of the
  system.

Illumos ZFS issues:
  3836 zio_free() can be processed immediately in the common case

MFC after:	1 week
2013-07-05 21:29:59 +00:00
Robert Millan
2592710c47 Enable kernel-specific code for FreeBSD also on other systems that use
the kernel of FreeBSD.

Reviewed by:	pjd
2013-06-30 23:14:55 +00:00
Steven Hartland
baa0b41221 Remove invalid ASSERT which causes a panic on zfs renames when run with ASSERTS.
Removal was missed in merge of illumos 3464 (r248571)

MFC after:	2 days
2013-06-29 23:15:45 +00:00
Martin Matuska
f82ca5238a Unbreak "zfs jail" and "zfs unjail" (broken since r248571)
I missed to register zfs_ioc_jail and zfs_ioc_unjail as legacy ioctl's
with the new zfs_ioctl_register_legacy() function.

These operations do not modify pools or datasets so there is no need to
log them to pool history.

Reported by:	Alexander Leidinger <ale@FreeBSD.org> and others on current@
MFC after:	3 days
2013-06-29 16:45:37 +00:00
Gavin Atkinson
af582854d8 Don't try to re-insert an already present but invalid page.
This could happen if a thread doing a page-in loses a ZFS range lock
race to a thread writing to the same range

This fixes "panic: vm_page_alloc: pindex already allocated" in
http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel

Submitted by:	avg
MFC after:	1 week
2013-06-28 07:51:12 +00:00
Xin LI
e33806a54a MFV r252215:
Restore a previous behavior before r251646, where when destructing
ZFS snapshot, the ioctl would return ENOENT when it hit any of
them in the errlist (the new behavior was only return ENOENT when
all returns error).

Illumos ZFS issues:
  3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl

MFC after:	1 week
2013-06-25 22:14:32 +00:00
Steven Hartland
9446debe6b Fix intermittent ZFS lock panic when kernel is compiled with debugging caused
by access of uninitialized smlock in mmutex_init.

MFC after:	1 week
2013-06-21 15:47:10 +00:00
Steven Hartland
5f921c5911 Fixed import of destroyed ZFS pools failing due to vdev_geom incorrectly
preventing config loads from devices associated with destroyed pools.

Reviewed by:	avg
MFC after:	1 week
2013-06-21 12:02:09 +00:00
Xin LI
9625321547 MFV r251644:
Poor ZFS send / receive performance due to snapshot
hold / release processing (by smh@)

Illumos ZFS issues:
  3740 Poor ZFS send / receive performance due to snapshot
       hold / release processing

MFC after:      2 weeks
2013-06-12 07:07:06 +00:00
Xin LI
ed8fd1989f MFV r251626:
ZFS event processing should work on R/O root filesystems

Illumos ZFS issues:
  3749 zfs event processing should work on R/O root filesystems

MFC after:      2 weeks
2013-06-11 19:35:44 +00:00
Xin LI
3b245f3ee1 MFV r251624:
txg commit callbacks don't work

Illumos ZFS issues:
  3747 txg commit callbacks don't work

MFC after:      2 weeks
2013-06-11 19:29:31 +00:00
Xin LI
3f3a9cac29 MFV r251622:
ZFS shouldn't ignore errors unmounting snapshots

Illumos ZFS issues:
  3744 zfs shouldn't ignore errors unmounting snapshots

MFC after:      2 weeks
2013-06-11 19:22:20 +00:00
Xin LI
57e06a1a63 MFV r251621:
ZFS needs a refcount audit

Illumos ZFS issues:
  3741 zfs needs a refcount audit

MFC after:      2 weeks
2013-06-11 19:16:14 +00:00
Xin LI
a91afe8a8d MFV r251620:
ZFS comments need cleaner, more consistent style

Illumos ZFS issues:
  3741 zfs comments need cleaner, more consistent style

MFC after:      2 weeks
2013-06-11 19:12:06 +00:00
Xin LI
4acaabea05 MFV r251619:
ZFS needs better comments.

Illumos ZFS issues:
  3741 zfs needs better comments

MFC after:      2 weeks
2013-06-11 19:02:36 +00:00
Xin LI
9e43a32a5c MFV r251519:
* Illumos ZFS issue #3805 arc shouldn't cache freed blocks

Quote from the Illumos issue:

    ZFS should proactively evict freed blocks from the cache.

    Even though these freed blocks will never be used again, and thus
    will eventually be evicted, this causes us to use memory
    inefficiently for 2 reasons:

    1. A block that is freed has no chance of being accessed again, but
       will be kept in memory preferentially to a block that was accessed
       before it (and is thus older) but has not been freed and thus has
       at least some chance of being accessed again.

    2. We partition the ARC into several buckets:
       user data that has been accessed only once (MRU)
       metadata that has been accessed only once (MRU)
       user data that has been accessed more than once (MFU)
       metadata that has been accessed more than once (MFU)

    The user data vs metadata split is somewhat arbitrary, and the
    primary control on how much memory is used to cache data vs metadata
    is to simply try to keep the proportion the same as it has been in the
    past (each bucket "evicts against" itself).  The secondary control is
    to evict data before evicting metadata.

    Because of this bucketing, we may end up with one bucket mostly
    containing freed blocks that are very old, while another bucket has
    more recently accessed, still-allocated blocks.  Data in the useful
    bucket (with still-allocated blocks) may be evicted in preference to
    data in the useless bucket (with old, freed blocks).

    On dcenter, we saw that the MFU metadata bucket was 230MB, while the
    MFU data bucket was 27GB and the MRU metadata bucket was 256GB.
    However, the vast majority of data in the MRU metadata bucket (256GB)
    was freed blocks, and thus useless.  Meanwhile, the MFU metadata bucket
    (230MB) was constantly evicting useful blocks that will be soon needed.

    The problem of cache segmentation is a larger problem that needs more
    investigation.  However, if we stop caching freed blocks, it should
    reduce the impact of this more fundamental issue.

MFC after:	2 weeks
2013-06-08 09:11:20 +00:00
Xin LI
ca8a27d4b1 MFV r251474:
* Illumos zfs issue #3137 L2ARC compression

Whether or not to compress buffers entering the L2ARC is
controlled by "compression" setting on the dataset, when
compression is not "off", L2ARC compression is enabled.

The compress method is always LZ4 for L2ARC when enabled
because it works best for the scenario.

MFC after:	2 weeks
2013-06-06 23:21:41 +00:00
Davide Italiano
a28d25df31 In case ZFS doesn't use UMA for buffers there's no need to waste memory
creating zones that will remain empty.

Reviewed by:	pjd
2013-05-01 17:34:44 +00:00
Steven Hartland
562a9d583b Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled
Enabled ZFS TRIM by default

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 11:24:20 +00:00
Martin Matuska
95b6497f5e MFV r249857:
Merge vendor bugfix for a possible deadlock related to async destroy
and improve write performance by introducing a new lock protecting
tx_open_txg.

Illumos ZFS issues:
  3642 dsl_scan_active() should not issue I/O to determine if async
       destroying is active
  3643 txg_delay should not hold the tc_lock

MFC after:	1 week
2013-04-24 21:21:03 +00:00
Martin Matuska
90eafc0bb8 The zfs synctask code restructuring introduced a new bug that makes it
impossible to set quota and reservation on pools lower than version 22.
Problem has been reported and a solution discussed with vendor.

Illumos ZFS issues:
  3739 cannot set zfs quota or reservation on pool version < 22

Reviewed by:	Matthew Ahrens <mahrens@delphix.com>
Reported by:	Steve Wills <swills@FreeBSD.org>
MFC after:	3 days
2013-04-23 06:28:35 +00:00
Martin Matuska
2cb0c5e424 MFV r249354:
Merge bugfixes accepted and integrated by vendor. Underlying problems
have been reported by us and fixed in r240942 and r249196.

Illumos ZFS issues:
  3645 dmu_send_impl: possibilty of pool hold leak
  3692 Panic on zfs receive of a recursive deduplicated stream

MFC after:	8 days
2013-04-11 07:40:30 +00:00
Martin Matuska
86161c3eeb Cast to (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd
MFC after:	9 days
2013-04-10 07:01:17 +00:00
Martin Matuska
83b4af1142 ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl()
doesn't copyout in this case.

To solve this issue a new struct zfs_iocparm_t is introduced consisting of:
- zfs_ioctl_version (future backwards compatibility purposes)
- user space pointer to zfs_cmd_t (copyin and copyout)
- size of zfs_cmd_t (verification purposes)

The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way
what makes porting of new changes easier and ensures correct behavior if
returning an error.

MFC after:	10 days
2013-04-09 22:27:44 +00:00
Martin Matuska
a548fef5dc MFV r249186:
Do not list read-only pools in zpool.cache
Reduce diff against vendor in unused vdev_disk.c

Illumos ZFS issues:
  3639 zpool.cache should skip over readonly pools
  3640 want automatic devid updates

MFC after:	1 week
2013-04-06 17:24:00 +00:00
Martin Matuska
95e7edacfe MFV r248660:
Merge vendor change - modify time processing in deadman thread.

Illumos ZFS issues:
  3618 ::zio dcmd does not show timestamp data

MFC after:	3 weeks
2013-04-06 17:15:47 +00:00
Martin Matuska
367437755d Provide a fix for kernel panic if receiving recursive deduplicated streams.
Problem reported to vendor.

Illumos ZFS issues:
  3692 Panic on zfs receive of a recursive deduplicated stream

MFC after:	2 weeks
2013-04-06 11:54:41 +00:00
Martin Matuska
f1b5c26470 MFV r248217:
Merge change from vendor to reduce diff only.
ZFS dtrace probes are not supported on FreeBSD yet.

Illumos ZFS issues:
  3598 want to dtrace when errors are generated in zfs

MFC after:	3 weeks
2013-04-06 10:39:38 +00:00
Martin Matuska
bae7bccf39 MFV r242816:
Import vendor change to reduce diff, no effect on FreeBSD.

Illumos ZFS issues:
  3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd
2013-04-06 08:21:37 +00:00
Andriy Gapon
9ff9b984c9 spa_open_common: fix argument to zvol_create_minors
Prior to r248571 spa_open was always called with a bare pool name,
but now it is called with a dataset name instead (spa_lookup handles
that).
So, when a ZFS root is mounted spa_open is called with a name of a root
dataset, which can very well be different from the pool name.
But zvol_create_minors should be called with the pool name, because it
performs a recursive traversal of all datasets under the name to find
all those that are volumes.

MFC after:	7 days
2013-04-03 11:06:26 +00:00
Martin Matuska
03863a70e1 Fix possible pool hold leak in dmu_send_impl()
Problem reported to vendor:
  https://www.illumos.org/issues/3645

Reported by:	Andriy Gapon <avg@FreeBSD.org>
MFC after:	15 days
2013-04-03 09:52:30 +00:00
Martin Matuska
41451f4a0e Do not check against uninitialized rc and comment out vendor code
MFC after:	16 days
2013-04-02 08:15:39 +00:00
Martin Matuska
20547d41f8 Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held
Submitted by:	Andriy Gapon <avg@FreeBSD.org>
MFC after:	17 days
2013-04-01 16:14:57 +00:00
Will Andrews
58567a1b4e ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using a NFS export.  When the
first 'zfs unmount' occurred, it returned EBUSY via this path, while
vflush() had flushed references on the filesystem's root vnode, which in
turn caused its v_interlock to be destroyed.  The next time 'zfs unmount'
was called, vflush() tried to obtain this lock, which caused this panic.

Since vflush() on FreeBSD is a definitive call, there is no need to check
vfsp->vfs_count after it completes.  Simply #ifdef sun this check.

Submitted by:	avg
Reviewed by:	avg
Approved by:	ken (mentor)
MFC after:	1 month
2013-03-23 16:34:56 +00:00
Steven Hartland
def84b9736 Fix for building libzpool under i386.
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 23:06:11 +00:00
Steven Hartland
2b114ad2a4 Add missing descriptions for ZFS sysctls
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:25:21 +00:00
Steven Hartland
adea827b21 Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact
that TRIM requests on SSD's are typically much slower than reads or writes.
This often resulted in stalls while large numbers of TRIM's where processed.

In addition due to the way the TRIM thread was only woken by writes, deletes
could stall in the queue for extensive periods of time.

This patch adds a number of controls to how often the TRIM thread for each
SPA processes its outstanding delete requests.
vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds
vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32)
vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev
vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev
vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing
(seconds)

Given the most common TRIM implementation is ATA TRIM the current defaults
are targeted at that.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:02:08 +00:00