illumos/illumos-gate@4b5c8e93ca
https://www.illumos.org/issues/7104
The current default indirect block size is 16KB. We can improve
performance by increasing it to 128KB. This is especially helpful for
any workload that needs to read most of the metadata, e.g.
scrub/resilver, file deletion, filesystem deletion, and zfs send.
We also need to fix a few space estimation errors to make the tests
pass.
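As a back-of-the-envelope illustration of why this helps (a sketch assuming the
128-byte on-disk blkptr_t size; not part of the change itself):
```c
#include <stdio.h>

int
main(void)
{
	const unsigned bp_size = 128;	/* on-disk blkptr_t */

	/* Block pointers per indirect block, old vs. new default: */
	printf("16K indirect:  %u BPs\n", 16 * 1024 / bp_size);	/* 128 */
	printf("128K indirect: %u BPs\n", 128 * 1024 / bp_size);	/* 1024 */
	/* 8x more BPs per indirect read means far fewer metadata I/Os
	 * for scrub/resilver, deletion, and zfs send. */
	return (0);
}
```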
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@25f7d993ad
https://www.illumos.org/issues/7071
upstream
DLPX-40482 lzc_snapshot does not fill in errlist on ENOENT
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@dcbf3bd6a1
https://www.illumos.org/issues/6950
When reading compressed data from disk, the ARC should keep the compressed
block cached and only decompress it when consumers access the block. The
uncompressed data should be short-lived allowing the ARC to cache a much larger
amount of data. The DMU would also maintain a smaller cache of uncompressed
blocks to minimize the impact of decompressing frequently accessed blocks.
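A minimal model of that design (hypothetical types and a stand-in decompress()
hook; not the illumos ARC API):
```c
#include <stdlib.h>

/* Stand-in for an ARC buffer that stays compressed while cached. */
typedef struct cbuf {
	void	*cb_cdata;	/* compressed bytes, resident in the cache */
	size_t	cb_csize;	/* physical (compressed) size */
	size_t	cb_lsize;	/* logical (uncompressed) size */
} cbuf_t;

/* Hypothetical codec hook (e.g. LZ4 in the real ARC). */
extern void decompress(const void *src, size_t csize, void *dst, size_t lsize);

/* On consumer access, hand out a short-lived uncompressed copy that
 * the caller frees; the compressed copy stays cached. */
static void *
cbuf_read(const cbuf_t *cb)
{
	void *ldata = malloc(cb->cb_lsize);

	if (ldata != NULL)
		decompress(cb->cb_cdata, cb->cb_csize, ldata, cb->cb_lsize);
	return (ldata);
}
```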
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@759e89be35
https://www.illumos.org/issues/6447
I got a patch from someone who uses nvpair code outside of illumos. It fixes a
couple of gcc warnings/bugs for him.
1. silence uninitialized use warnings
2. add parentheses around assignment used as truth value
3. fix printf format specifier (ll is for integers only)
4. strstr, strspn, strcspn, and strcmp are declared in string.h, not
strings.h.
5. avoid scanning integer into boolean variable
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Steve Dougherty <sdougherty@barracuda.com>
illumos/illumos-gate@10e67aa0db
https://www.illumos.org/issues/7082
upstream
DLPX-40542 bptree_iterate() passes wrong args to zfs_dbgmsg()
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@9adfa60d48
https://www.illumos.org/issues/6314
Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but
dsl_dataset_name copies the dataset's name PLUS the snapshot name to it,
resulting in a max of 2 * ZFS_MAXNAMELEN + '@'.
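As a sizing sketch (ZFS_MAXNAMELEN is the historical 256-byte limit; the extra
bytes cover the '@' separator and the terminating NUL):
```c
#define	ZFS_MAXNAMELEN	256	/* historical per-component limit */

/* Worst case per the analysis above: dataset name + '@' + snapshot
 * name, so a caller-side buffer that cannot overflow looks like: */
char dsname[2 * ZFS_MAXNAMELEN + 2];
```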
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@f83b46baf9
https://www.illumos.org/issues/6872
We compile the zfs libraries with -Wno-uninitialized. We should remove
this. Change makefiles, fix new warnings, fix pbchk errors.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@8808ac5dae
https://www.illumos.org/issues/4521
zfstest is trying to execute evil "zfs unmount -a", which fails (fortunately,
as it would otherwise leave me with my ~ missing):
03:44:11.86 cannot unmount '/export/home/yuri': Device busy
cannot unmount '/export/home': Device busy
03:44:11.86 ERROR: /usr/sbin/zfs unmount -a exited 1
This affects, at least, zfs_mount_009_neg and zfs_mount_all_001_pos, both
failing on that step. The pool containing the /export/home hierarchy is
included in the KEEP variable, but it doesn't seem to affect anything here.
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@6fdcb3d1c2
https://www.illumos.org/issues/5813
Let's pull in this patch from FreeBSD:
http://svnweb.freebsd.org/base?view=revision&revision=271764
zfs_setprop_error(): Handle errno value E2BIG.
This errno value is emitted by dsl_props_set_check() in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c, and
is used to mean that the property value is too long. For the record,
the maximum length is ZAP_MAXVALUELEN, which is 8*1024 bytes.
Instead of claiming an unknown error (and abort()ing), provide
something more specific to the scenario involved. As far as I
can tell, E2BIG is not emitted for any other scenario.
MFC after: 1 week
Sponsored by: Spectra Logic
Affects: All ZFS versions starting 27 Feb 2009 (illumos ccba0801)
This change modified the value returned by
dsl_props_set_check(), so that it can distinguish between
a name that's too long and a value that's too long, but
libzfs was not updated accordingly.
MFSpectraBSD: r1051499 on 2014/03/28 11:07:59
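A self-contained sketch of the shape of the fix (the actual change extends
zfs_setprop_error() in libzfs; this helper is hypothetical):
```c
#include <errno.h>
#include <stddef.h>

/* Map E2BIG to a specific diagnostic instead of falling through to
 * the generic "unknown error" path that previously abort()ed. */
static const char *
setprop_strerror(int err)
{
	switch (err) {
	case E2BIG:
		/* value exceeds ZAP_MAXVALUELEN (8 * 1024 bytes) */
		return ("property value too long");
	default:
		return (NULL);	/* caller keeps its existing handling */
	}
}
```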
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Will Andrews <will@freebsd.org>
illumos/illumos-gate@4cde22c299
https://www.illumos.org/issues/6873
lzc_destroy_snaps() returns an nvlist in errlist.
zfs_destroy_snaps_nvl() should nvlist_free() it before returning.
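A sketch of the corrected caller-side pattern, assuming the standard
libzfs_core API (lzc_destroy_snaps() hands back an errlist nvlist that the
caller owns):
```c
#include <libzfs_core.h>
#include <libnvpair.h>

/* Destroy the given snapshots, then release the returned errlist,
 * mirroring the fix described above. */
static int
destroy_snaps(nvlist_t *snaps)
{
	nvlist_t *errlist = NULL;
	int err = lzc_destroy_snaps(snaps, B_FALSE, &errlist);

	/* ... report per-snapshot errors from errlist here ... */
	nvlist_free(errlist);	/* previously leaked on this path */
	return (err);
}
```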
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@20fea7a474
https://www.illumos.org/issues/6879
In libzfs_sendrecv, there's a typo:
case DRR_SPILL:
	if (byteswap) {
		drr->drr_u.drr_write.drr_length =
		    BSWAP_64(drr->drr_u.drr_spill.drr_length);
	}
Instead of drr_write.drr_length, we should be assigning the result of the
byteswap to drr_spill.drr_length.
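For reference, the corrected case as the description implies:
```c
case DRR_SPILL:
	if (byteswap) {
		drr->drr_u.drr_spill.drr_length =
		    BSWAP_64(drr->drr_u.drr_spill.drr_length);
	}
	break;
```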
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
illumos/illumos-gate@4a20c933b1
https://www.illumos.org/issues/6111
If you create a zfs child folder, zfs send returns an error when a recursive
incremental send is done between two snapshots made prior to the folder
creation.
The problem can be reproduced with the following steps.
root@zfs:/# zfs create pool/test
root@zfs:/# zfs snapshot pool/test@snap1
root@zfs:/# zfs snapshot pool/test@snap2
root@zfs:/# zfs create pool/test/child
root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap2 > /dev/null
WARNING: could not send pool/test/child@snap2: does not exist
WARNING: could not send pool/test/child@snap2: does not exist
root@zfs:/# echo $?
1
root@zfs:/# zfs snapshot -r pool/test@snap3
root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap3 > /dev/null
root@zfs:/# echo $?
0
root@zfs:/# zfs send -R -I pool/test@snap2 pool/test@snap3 > /dev/null
root@zfs:/# echo $?
0
Since pool/test/child was created after snap2, zfs send should not expect snap2
to be in pool/test/child when doing a recursive send. It should compare the
creation time of the snapshot with that of each child folder to decide whether
the folder should be sent. The next incremental send between snap2 and snap3
would then properly create the child folder and snap3, which is the first
snapshot to appear in the child folder.
The problem is identical if '-i' is used instead of '-I'.
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Alex Deiter <alex.deiter@nexenta.com>
illumos/illumos-gate@20a95fb2c4
https://www.illumos.org/issues/5768
zfsctl_snapshot_inactive() leaks a hold on the dvp (directory vnode) if v_count > 1.
reproduce by:
create a fs with 100 snapshots.
have a thread do:
while true; do ls -l /test/snaps/.zfs/snapshot >/dev/null; done
have another thread do:
while true; do zfs promote test/clone; zfs promote test/snaps; done
use dtrace to delay & observe:
dtrace -w -xd \
    -n 'vn_rele:entry/args0 == (void*)0xffffff01dd42ce80ULL/{
        @[stack()]=count(); chill(100000);}' \
    -n 'zfsctl_snapshot_inactive:entry{
        if (args[0]->v_count > 1) trace(args[0]->v_count);
        self->vp=args[0];}' \
    -n 'gfs_vop_inactive:entry/callers["zfsctl_snapshot_inactive"]/{
        self->good=1; @[stack()]=count()}' \
    -n 'zfsctl_snapshot_inactive:return{
        if (self->good) self->good=0; else printf("bad return");}' \
    -n 'gfs_dir_lookup:return/callers["zfsctl_snapshot_inactive"] &&
        self->vp->v_count > 1/{trace(self->vp->v_count)}'
the address is found by selecting one of the outputs of this at random:
dtrace -n 'zfsctl_snapshot_inactive:entry{print(args[0]);}'
when you see "bad return", we have hit the bug. Then doing
"zfs umount test/snaps" will fail with EBUSY.
When we hit this case, we also leak the hold on the target vnode (vn). When the
inactive callback is called on a vnode with v_count > 1, it needs to be
decremented.
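A hedged sketch of that decrement (illustrative locking; not the verbatim
illumos diff):
```c
/* In the inactive callback: if others still hold the vnode, drop
 * only our reference instead of leaking the hold. */
mutex_enter(&vp->v_lock);
if (vp->v_count > 1) {
	vp->v_count--;
	mutex_exit(&vp->v_lock);
	return;
}
mutex_exit(&vp->v_lock);
/* v_count == 1: proceed with normal inactivation/teardown */
```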
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Rich Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@0c779ad424
https://www.illumos.org/issues/7054
upstream:
ee0003de7d3e598499be7ac3fe6b61efcc47cb7f
DLPX-40399 dmu_tx_hold_t should use refcount_t to track space
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@99189164df
https://www.illumos.org/issues/6940
Similar to #6334, but this time with empty directories:
$ zfs create tank/quota
$ zfs set quota=10M tank/quota
$ zfs snapshot tank/quota@snap1
$ zfs set mountpoint=/mnt/tank/quota tank/quota
$ mkdir /mnt/tank/quota/dir # create an empty directory
$ mkfile 11M /mnt/tank/quota/11M
/mnt/tank/quota/11M: initialized 9830400 of 11534336 bytes: Disc quota exceeded
$ rmdir /mnt/tank/quota/dir # now unlink the empty directory
rmdir: directory "/mnt/tank/quota/dir": Disc quota exceeded
From a user's perspective, I would expect that ZFS is always able to remove
files and directories even when the quota is exceeded.
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Simon Klinkert <simon.klinkert@gmail.com>
7020 sdev_cleandir can loop forever
Note that the bulk of the upstream change is not applicable to FreeBSD
and the affected files are not even in the vendor area.
illumos/illumos-gate@45b1747515
https://www.illumos.org/issues/7019
Currently zfsdev_ioctl, when confronted by a request with the FKIOCTL flag set,
skips all processing of secpolicy functions. This means that ZFS is not doing
any kind of verification of the credentials or access rights of the caller and
assuming that (as it is an in-kernel client) all such checks have already been
done.
This turns out to be quite a dangerous assumption, especially with respect to
sdev. In general I don't think it's particularly reasonable to offload this
enforcement of access rights onto other kernel subsystems when ZFS has some
particular local semantics in this area (delegated datasets etc) and does not
provide any kind of API to allow other subsystems to avoid code duplication
when doing it. ZFS should apply its normal access policy to requests from
within the kernel, and callers should take care to give it the correct
credentials and call it from the correct context in order to get the results
they need.
You can observe the currently unfortunate consequences of this bug in any non-
global zone that has access to /dev/zvol or any subset of it via sdev profiles.
In particular, a zone used to contain a KVM or similar which has a single zvol
passed through to it using a <device match= block in its zone XML.
Even though sdev makes something of an attempt to control for whether the
caller should have access to nodes in /dev/zvol, it doesn't do this correctly,
or really at all in the lookup call path. So, if we have a zone that's been
given access to any part of /dev/zvol, it can simply look up the full path to
any other zvol on the entire system, and the node will appear and be usable.
https://www.illumos.org/issues/7020
sdev_cleandir can currently hang forever when it encounters a child node that
is busy, or when it is given a matching expr and the first entry on the list
does not match.
The previous code (circa 2013) iterated over the children of the node using a
for loop with SDEV_NEXT_ENTRY, which was then changed to a
while ((dv = SDEV_FIRST_ENTRY(ddv)) != NULL) { loop. Unfortunately the
continue statements that previously made it skip over an entry were left as
they were, which now results in an infinite busy-loop in the kernel.
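Schematically (entry_is_busy(), expr_matches(), and remove_entry() are
illustrative stand-ins, not the sdev code):
```c
/* Old shape: the for-loop advanced even when an entry was skipped. */
for (dv = SDEV_FIRST_ENTRY(ddv); dv != NULL;
    dv = SDEV_NEXT_ENTRY(ddv, dv)) {
	if (entry_is_busy(dv) || !expr_matches(expr, dv))
		continue;	/* fine: the loop header still advances */
	remove_entry(ddv, dv);
}

/* New shape: SDEV_FIRST_ENTRY re-fetches the head each time, so the
 * same `continue` now spins forever on a busy/non-matching entry. */
while ((dv = SDEV_FIRST_ENTRY(ddv)) != NULL) {
	if (entry_is_busy(dv) || !expr_matches(expr, dv))
		continue;	/* bug: nothing advances */
	remove_entry(ddv, dv);
}
```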
You can trigger this pretty easily by setting up an sdev exclude rule in
zonecfg.
Diagnosis: look for a runaway process consuming 100% CPU in kernel -- they have
a distinctive stack:
# mdb -k
> 0t1234::pid2proc | ::walk thread | ::findstack -v
[ ffffd001efcd3310 _resume_from_idle+0x112() ]
ffffd001efcd3360 apix_hilevel_intr_epilog+0xc1(ffffd001efcd33d0, 0)
ffffd001efcd33c0 apix_do_interrupt+0x34a(ffffd001efcd33d0, 0)
ffffd001efcd33d0 _sys_rtt_ints_disabled+8()
ffffd001efcd3550 rw_enter+0x58()
ffffd001efcd35e0 sdev_cleandir+0x60(ffffd0631b6d75d8, 0, 0)
ffffd001efcd3630 devzvol_prunedir+0xec(ffffd0631b6d76e8)
ffffd001efcd36d0 devzvol_readdir+0x150(ffffd06333250e00, ffffd001efcd3790, ffffd062dc990e18, ffffd001efcd37dc, 0, 0)
ffffd001efcd3760 fop_readdir+0x6b(ffffd06333250e00, ffffd001efcd3790, ffffd062dc990e18, ffffd001efcd37dc, 0, 0)
ffffd001efcd3830 walk_dir+0xee(ffffd06333250e00, ffffd0669e4483c8, fffffffffbbdf410)
ffffd001efcd3850 prof_make_names_walk+0x2e(ffffd0669e4483c8, fffffffffbbdf410)
ffffd001efcd38b0 prof_make_names+0xfc(ffffd0669e4483c8)
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@63364b0ee2
https://www.illumos.org/issues/6922
ZFS does not do a config_sync after removing an aux (spare, log, or cache)
device. AFAICT this isn't being done because it is slow and was deemed
unnecessary. However, it should be such a rare operation that speed doesn't
matter, and not doing it results in two problems:
1) It is theoretically possible to remove an aux device from one pool and
attach it to another, then lose power. When power is restored, both pools would
think that they own the aux device.
2) Removal of the aux device doesn't send any useful sysevents to userland.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alan Somers <asomers@gmail.com>
illumos/illumos-gate@1825bc56e5
https://www.illumos.org/issues/6878
Summary of changes:
* Replace generic "scan done" message with "scan aborted, restarting",
"scan cancelled", or "scan done"
* Log number of errors using spa_get_errlog_size
* Refactor scan restarting check into static function
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Nav Ravindranath <nav@delphix.com>
illumos/illumos-gate@8df0bcf0df
https://www.illumos.org/issues/6513
If a ZFS object contains a hole at level one, and then a data block is created
at level 0 underneath that l1 block, l0 holes will be created. However, these
l0 holes do not have the birth time property set; as a result, incremental
sends will not send those holes.
The fix is to modify the dbuf_read code to fill in the birth time data.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@0d8fa8f8eb
https://www.illumos.org/issues/6902
pjd authored and committed a patch on Jan 21, 2012 that substantially speeds
up zfs snapshot listing when requesting only the name property and sorting by
name.
In this special case, the snapshot properties do not need to be loaded. This
code was adopted by zfsonlinux on May 29, 2012.
Commit message from pjd:
    Dramatically optimize listing snapshots when user requests only
    snapshot names and wants to sort them by name, ie. when executes:
        # zfs list -t snapshot -o name -s name
    Because only name is needed we don't have to read all snapshot
    properties.
    Below you can find how long does it take to list 34509 snapshots
    from a single disk pool before and after this change with cold and
    warm cache:
    before:
        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 525s
        warm cache: 218s
    after:
        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 1.7s
        warm cache: 1.1s
References:
http://svnweb.freebsd.org/base?view=revision&revision=230438
https://github.com/freebsd/freebsd/commit/8e3e9863
https://github.com/zfsonlinux/zfs/commit/0cee2406
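A hedged sketch of the fast-path test (the real change lives in libzfs's
snapshot iteration; field names follow the public zprop_list_t, everything
else is illustrative):
```c
#include <libzfs.h>

/* True when the caller asked only for the name, sorted by name, so
 * per-snapshot property loading can be skipped entirely. */
static boolean_t
names_only(zprop_list_t *pl, boolean_t sorted_by_name)
{
	return (sorted_by_name && pl != NULL &&
	    pl->pl_prop == ZFS_PROP_NAME && pl->pl_next == NULL);
}
```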
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pawel Dawidek <pjd@freebsd.org>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Martin Matuska <martin@matuska.org>
illumos/illumos-gate@c971037baa
https://www.illumos.org/issues/6876
Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for
trouble. We should check every dataset on import, using a 1024 byte buffer and
checking each time to see if the dataset's new name is longer than 256 bytes.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@11ceac77ea
https://www.illumos.org/issues/6844
dnode_next_offset is used in a variety of places to iterate over the holes or
allocated blocks in a dnode. It operates under the premise that it can iterate
over the block pointers of a dnode in open context while holding only the
dn_struct_rwlock as reader. Unfortunately, this premise does not hold.
When we create the zio for a dbuf, we pass in the actual block pointer in the
indirect block above that dbuf. When we later zero the bp in
zio_write_compress, we are directly modifying the bp. The state of the bp is
now inconsistent from the perspective of dnode_next_offset: the bp will appear
to be a hole until zio_dva_allocate finally finishes filling it in. In the
meantime, dnode_next_offset can detect a hole in the dnode when none exists.
I was able to experimentally demonstrate this behavior with the following
setup:
1. Create a file with 1 million dbufs.
2. Create a thread that randomly dirties L2 blocks by writing to the first L0
block under them.
3. Observe dnode_next_offset, waiting for it to skip over a hole in the middle
of a file.
4. Do dnode_next_offset in a loop until we skip over such a non-existent hole.
The fix is to ensure that it is valid to iterate over the indirect blocks in a
dnode while holding the dn_struct_rwlock by passing the zio a copy of the BP
and updating the actual BP in dbuf_write_ready while holding the lock.
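A userland analogy of that approach (a pthread rwlock standing in for
dn_struct_rwlock; nothing here is ZFS code): the writer mutates a private copy
and publishes it only under the write lock, so readers never observe the
transiently zeroed BP.
```c
#include <pthread.h>

typedef struct blkptr { unsigned long birth; } blkptr_t;

static pthread_rwlock_t struct_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static blkptr_t parent_bp;	/* stands in for the BP in the indirect block */

/* Analogous to dbuf_write_ready(): publish the ZIO's private copy. */
static void
publish_bp(const blkptr_t *scratch)
{
	pthread_rwlock_wrlock(&struct_rwlock);
	parent_bp = *scratch;
	pthread_rwlock_unlock(&struct_rwlock);
}

/* Analogous to the dnode_next_offset() walk: read under the lock. */
static int
bp_is_hole(void)
{
	pthread_rwlock_rdlock(&struct_rwlock);
	int hole = (parent_bp.birth == 0);
	pthread_rwlock_unlock(&struct_rwlock);
	return (hole);
}
```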
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alex Reece <alex@delphix.com>
illumos/illumos-gate@1fdcbd00c9
https://www.illumos.org/issues/6874
When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the ZPL
doesn't completely reload the state from the DMU; some values remain cached in
the zfsvfs_t.
steps to reproduce:
```
#!/bin/bash -x
zfs destroy -R test/fs
zfs destroy -R test/recvd
zfs create test/fs
zfs snapshot test/fs@a
zfs set userquota@$USER=1m test/fs
zfs snapshot test/fs@b
zfs send test/fs@a | zfs recv test/recvd
zfs send -i @a test/fs@b | zfs recv test/recvd
zfs userspace test/recvd
# should show 1m quota
dd if=/dev/urandom of=/test/recvd/file bs=1k count=1024
sync
dd if=/dev/urandom of=/test/recvd/file2 bs=1k count=1024
# should fail with ENOSPC
sync
zfs unmount test/recvd
zfs mount test/recvd
zfs userspace test/recvd
# if bug above, now shows 1m quota
dd if=/dev/urandom of=/test/recvd/file3 bs=1k count=1024
# if bug above, now fails with ENOSPC
```
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Wilson <alex.wilson@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Patrick Mooney <pmooney@pfmooney.com>
illumos/illumos-gate@771e39c3b1
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@a2f72b65eb
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@0b8049bfb0
delete permissions for ACLs
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>
openzfs/openzfs@a40149b935
aclmode=passthrough
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>
openzfs/openzfs@1bcf0d240b
perms (groupmask)
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Albert Lee <trisk@nexenta.com>
openzfs/openzfs@eebb483d0c
some additional considerations
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>
openzfs/openzfs@d316fffc9c
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Joe Stein <joe.stein@delphix.com>
openzfs/openzfs@215198a6ad
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Tim Chase <tim@chase2k.com>
openzfs/openzfs@445e67805d
illumos/illumos-gate@26455f9efc
https://www.illumos.org/issues/6052
At the moment, the type parameter of lzc_create() is of type
dmu_objset_type_t. That exposes an implementation detail and requires
sys/fs/zfs.h to be included in libzfs_core.h, creating unnecessary coupling
between the libzfs_core interface and ZFS internals.
I think that dmu_objset_type_t should be replaced with a libzfs_core
enumeration of supported dataset types.
For ABI reasons the new enumeration could be bit-compatible with
dmu_objset_type_t.
For example:
typedef enum {
	LZC_DST_ZFS = 2,
	LZC_DST_ZVOL
} lzc_dataset_type_t;
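Hypothetical usage with the proposed enumeration (assumes the post-change
three-argument lzc_create()):
```c
#include <libzfs_core.h>
#include <libnvpair.h>

static int
create_fs(const char *name)
{
	nvlist_t *props = fnvlist_alloc();	/* no properties to set */
	int err = lzc_create(name, LZC_DST_ZFS, props);

	nvlist_free(props);
	return (err);
}
```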
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Author: Alexander Motin <mav@FreeBSD.org>
Improve speculative prefetch of indirect blocks.
Scalability of many operations on a wide ZFS pool can be limited by the
requirement to prefetch indirect blocks first. The recently added
asynchronous indirect block read partially helped, but did not solve the
problem completely. This patch extends the existing prefetcher
functionality to explicitly work with indirect blocks.
Before this change the prefetcher issued reads for up to 8MB of data in
advance. With this change it also issues indirect block reads for up to
64MB of data in advance, so that by the time the data itself must be
read, the reads can proceed immediately. A similar effect could be
achieved by simply increasing the maximum data prefetch distance, but at
a higher memory cost.
This change also introduces indirect block prefetch for rewrite
operations, which was never done before. Previously an ARC miss for an
indirect block regularly blocked rewrites, converting perfectly aligned
asynchronous operations into synchronous read-write pairs and
significantly reducing the maximum rewrite speed.
While here, another issue was fixed: prefetch was done always, even if
caching for the dataset was completely disabled.
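The two horizons described above correspond to prefetcher tunables along
these lines (names as in the OpenZFS code; values are the defaults cited
above):
```c
#include <stdint.h>

/* Max bytes of data to prefetch ahead of a sequential read stream. */
uint64_t zfetch_max_distance = 8 * 1024 * 1024;		/* 8MB */

/* Max bytes of data whose indirect blocks are prefetched ahead. */
uint64_t zfetch_max_idistance = 64 * 1024 * 1024;	/* 64MB */
```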
Testing on FreeBSD with a zvol on top of a 6x striped, 2x mirrored pool of
12 assorted HDDs showed the following performance numbers:
------- BEFORE --------
Write 491363677 bytes/sec
Read 312430631 bytes/sec
Rewrite 97680464 bytes/sec
-------- AFTER --------
Write 493524146 bytes/sec
Read 438598079 bytes/sec
Rewrite 277506044 bytes/sec
Closes #65
Closes #80
openzfs/openzfs@792fd28ac0
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Author: Will Andrews <will@firepipe.net>
Closes #83
Closes #32
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Eli Rosenthal <eli.rosenthal@delphix.com>
illumos/illumos-gate@c20404ff77
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@41c6413cb5
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@d09e4475f6
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed by: Steven Burgess <sburgess@datto.com>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
In certain circumstances, "zfs send -i" (incremental send) can produce a
stream which will result in incorrect sparse file contents on the
target.
The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).
Note: this can happen only with filesystems (not zvols, because they do
not free (and thus can not reuse) object IDs).
Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).
We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).
The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).
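Schematically, the send-side predicate becomes something like this (all names
hypothetical; a sketch of the short-term fix, not the committed diff):
```c
#include <stdint.h>

typedef int boolean_t;

/* Decide whether an incremental send must emit a FREE record for a
 * hole block pointer. */
static boolean_t
must_send_hole(uint64_t blk_birth, uint64_t fromsnap_txg,
    boolean_t no_object_reuse)
{
	if (blk_birth != 0)	/* hole_birth recorded a real txg */
		return (blk_birth > fromsnap_txg);
	/* birth 0: an old hole, or a hole in a reused object; unless
	 * reuse is impossible (e.g. zvols), send it to be safe. */
	return (!no_object_reuse);
}
```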
Closes #37
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Joshua M. Clulow <jmc@joyent.com>
illumos/illumos-gate@b211eb9181
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Gerhard Roethlin <git@the-color-black.net>
illumos/illumos-gate@cb605c4d8a
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Igor Kozhukhov <ikozhukhov@gmail.com>
illumos/illumos-gate@c16bcc4577