Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@97e8130957
Currently, zfs(8) and zpool(8) print "invalid type '(null)'" or similar
messages, if you pass in invalid types, sources or column names for "zfs
get", "zfs list" and "zpool get". This is because the commands use
getsubopt(3), and in case of failure, they print 'value', which is NULL
when sub options don't match.
They should print 'suboptarg' instead, which is the documented way to
get at the non-matching sub option value.
Reviewed by: smh
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D5365
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@68ecb2ec93
This allows to do a full (non-incremental send) and receive it as a clone
of an existing dataset. It can leverage nopwrite to share blocks with the
origin. This can be used to change the relationship of datasets on the
target. For example, maybe on the source you have:
A ---- B ---- C
And you have sent to the target a full of B, and the incremental B->C:
B ---- C
You later realize that you want to have A on the target. You will have to
do a full send of A, but nopwrite can save you space on the target if you
receive it as a clone of B, assuming that A and B have some blocks inxi
common:
B ---- C
\
A
If the zfs module is not present and not loadable, report an error
to the user instead of crashing
Reviewed by: mahrens
Sponsored by: Gandi.net
Differential Revision: https://reviews.freebsd.org/D4691
Fix zfs(8) not formatting due to wrong macro (Oc) in the syntax for the new
zfs set multiple dataset properties option.
PR: 204631
Submitted by: Thomas Eberhardt
Sponsored by: Multiplay
Previously, the parameters of 'zfs holds' could only be snapshots
Add -d <depth> flag to limit depth of recursion
Add -p flag to print literal values, rather than interpreted values
Add -H flag to suppress header output and use tabs rather than whitespace
Reviewed by: mahrens, smh, dteske
Approved by: bapt (mentor)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3994
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: John Wren Kennedy <john.kennedy@delphix.com>
illumos/illumos-gate@52244c0958
In fact, only unrelated part of that commit is applicable:
8. zpool list -v doesn't print spares
It also doesn't correctly identify log devices.
updated for large block support.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Joe Stein <joe.stein@delphix.com>
illumos/illumos-gate@e9316f7696
to illumos-gate 13967:92bec6d87f59
Illumos ZFS issues:
3557 dumpvp_size is not updated correctly when a dump zvol's size is
changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Richard PALO <richard@NetBSD.org>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Rich Lowe <richlowe@richlowe.net>
Author: Chris Williamson <chris.williamson@delphix.com>
illumos/illumos-gate@30925561c2
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@9c3fd1216f
For more info, see:
- slides http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive
- video https://www.youtube.com/watch?v=iY44jPMvxog
- manpage changes (for zfs resume -s and zfs send -t)
- upcoming talk at the OpenZFS Developer Summit
The TL;DR is:
Use "zfs receive -s" to save the partially received state on failure.
On failure, get the receive token with "zfs get receive_resume_token <fs>"
Resume the send with "zfs send -t <token_value>"
Relnotes: yes
Previously, lockstat(1) would use a lock's address as its identifier when
consuming data describing lock contention and hold events. After collecting
the requested data, it would use ksyms(4) to resolve lock addresses to
names. Of course, this doesn't work too well for locks contained in
dynamically-allocated memory. This change modifies lockstat(1) to trace the
lock names obtained from the base struct lock_object instead, leading to
output that is generally much more useful.
This change also removes the -c option, which is used to coalesce data for
locks in an array. It's not possible to support this option without also
tracing lock addresses, and since lock arrays in which the lock names are
distinct are not very common in FreeBSD, it's simpler to just remove the
option.
Reviewed by: avg (earlier revision)
Differential Revision: https://reviews.freebsd.org/D3661
(userland portion that was not merged in r286677)
Update zdb to also print ltime, type, and level information
for these new style holes. Previously, only the logical birth
time would be printed.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@ca0cc3918a
A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@732885fca0
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@98110f08fa
5661 ZFS: "compression = on" should use lz4 if feature is enabled
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Xin LI <delphij@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@db1741f555
- Rather than assuming that a process is listening on 127.0.0.1:22, use
nc(1) to find an available port and bind to it for the duration of the
test.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
0. For spin events report time spent spinning, not a loop count.
While loop count is much easier and cheaper to obtain it is hard
to reason about the reported numbers, espcially for adaptive locks
where both spinning and sleeping can happen.
So, it's better to compare apples and apples.
1. Teach lockstat about FreeBSD rw locks.
This is done in part by changing the corresponding probes
and in part by changing what probes lockstat should expect.
2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD.
3. Report lock acquisition events for successful rw try-lock operations.
4. Teach lockstat about FreeBSD sx locks.
Reporting of events for those locks completely mirrors
rw locks.
5. Report spin and block events before acquisition event.
This is behavior documented for the upstream, so it makes sense to stick
to it. Note that because of FreeBSD adaptive lock implementations
both the spin and block events may be reported for the same acquisition
while the upstream reports only one of them.
Differential Revision: https://reviews.freebsd.org/D2727
Reviewed by: markj
MFC after: 17 days
Relnotes: yes
Sponsored by: ClusterHQ
The format of these pages is somewhat experimental, so they may be subject
to further tweaking.
Differential Revision: https://reviews.freebsd.org/D2170
Reviewed by: bcr, rpaulo
MFC after: 2 weeks
the test harness. This is a problem in many of the *.ksh test scripts as
well, but those scripts are executed using a shell whose path is specified
in dtest.pl, so there's no need to modify them.
MFC after: 1 week
* Avoid hard-coding program paths.
* Use -x when searching for oneself in ps(1) output.
* Use the correct keyword (egid instead of pgid) in tst.egid.ksh.
MFC after: 1 week
* Avoid hard-coding program paths, except when it's necessary in order to
override the use of a shell builtin.
* Translate struct proc through psinfo_t so that we can access process
arguments via the pr_psargs field of psinfo_t.
* Replace uses of pstop and prun with kill(1).
MFC after: 1 week
which causes dtrace(1) to run the C preprocessor on input scripts before
executing them. Suppress some warnings emitted by the preprocessor which are
confusing the DTrace lexer tests.
MFC after: 1 week
Since the upstream for cddl code is now illumos not sun, mechanically
convert all sun #ifdef's to illumos #ifdef's which have been used in all
newer code for some time.
Also do a manual pass to correct the use if #ifdef comments as per style(9)
as well as few uses of #if defined(__FreeBSD__) vs #ifndef illumos.
MFC after: 1 month
Sponsored by: Multiplay
Introduce a seperate phase to list all unavailable pools when listing
pools to upgrade. This avoids confusing output when displaying older
and disabled feature pools. These existing phases now silently skip
unavailable pools.
Introduce cb_unavail to upgrade_cbdata_t which enables the final
output for zpool list to correctly detail if all pools or only all
available pools where up-to-date on version / features.
Correct the type of upgrade_cbdata_t.cb_first from int -> boolean_t.
Change the pool iteration when upgrading named pools to include
unavailable pools and update upgrade_one so it doesn't try to upgrade
unavailable pools but warns about them. This allows the correct error
to be displayed as well as upgrades with available and unavailable
pools intermixed to partially complete.
Also correct some missing trailing \n's from output in upgrade_one.
MFC after: 1 month
X-MFC-With: r276194
Prior to this fix "zpool upgrade" and "zpool upgrade -a" would fail due to
an assert when operating on unavailable pools.
We now print a warning to stderr but allow the processing of other pools
to procesed.
MFC after: 1 month
Convert ARC flags to use enum. Previously, public flags are defined in
arc.h and private flags are defined in arc.c which can lead to confusion
and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
MFC after: 2 weeks
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate
"phys" type, removing the need for clients of the dmu buf user
API to keep properly typed pointer aliases to db->db_data in order
to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
In zap_leaf() and zap_leaf_byteswap, now that the pointer alias
field l_phys has been removed, use the db_data field in an on
stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
Remove the db_user_data_ptr_ptr field from dbuf and all logic
to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
Modify the DMU buf user API to remove the ability to specify
a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Create and use the new "phys data" accessor functions
dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(),
zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Remove now unused "phys pointer" aliases to db->db_data
from clients of the DMU buf user API.
Illumos issue:
5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS
MFC after: 2 weeks
Port Illumos 'zfs allow' examples update. While I'm there also fix
a typo.
Illumos issue:
4181 zfs(1m): 'zfs allow' examples in the man page are outdated
MFC after: 2 weeks
ZFS large block support.
Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it's Okay to enable the feature on
the pool). This *may* remain unchanged because of memory constraint.
Limited safety belt is provided for mounted root filesystem but use
caution is advised.
Illumos issue:
5027 zfs large block support
MFC after: 1 month
- Limit ARC for zdb at 256MB. zdb do not typically revisit data
in the ARC.
- Increase default max_inflight from 200 to 1000 (can be overriden
by -I) so we can queue more I/Os when doing scrubbing.
- Print status while loading meataslabs for leak detection.
Illumos issues:
5169 zdb should limit its ARC size
5170 zdb -c should create more scrub i/os by default
5171 zdb should print status while loading metaslabs for leak detection
MFC after: 2 weeks
modifications to libproc to support fetching the CTF info for a given file.
With this change, dtrace(1) is able to resolve type info for function and
USDT probe arguments, and function return values. In particular, the args[n]
syntax should now work for referencing arguments of userland probes,
provided that the requisite CTF info is available.
The uctf tests pass if the test programs are compiled with CTF info. The
current infrastructure around the DTrace test suite doesn't support this
yet.
Differential Revision: https://reviews.freebsd.org/D891
MFC after: 1 month
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
install signal handlers when running in list mode (-l), and acknowledge
interrupts by cleaning up and exiting. This ensures that a command like
$ dtrace -l -P 'pid$target' -p <target PID> | less
won't cause the ptrace(2)d target process to be killed if less(1) exits
before all dtrace output is consumed.
Reported by: Anton Yuzhaninov <citrin+bsd@citrin.ru>
Differential Revision: https://reviews.freebsd.org/D880
Reviewed by: rpaulo
MFC after: 1 month
Sponsored by: EMC / Isilon Storage Division
In the case where new features where enabled by a zpool upgrade -a the
boot code warning wasn't output.
Submitted by: Jan Kokemueller
MFC after: 3 days
This can occur if a spare is being spared, which would yield three
children: the original pool drive, the previous spare, and the spare
that is replacing it.
MFC after: 1 week
Sponsored by: Spectra Logic
Affects: All ZFS versions starting 7 Jun 2006 (illumos 94de1d4c)
MFSpectraBSD: r668345 on 2013/06/04 17:10:43
It seems that if a pragma is used to define a weak alias for a local
function, the pragma must appear after the function is defined.
PR: 193056
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
seems that they would only pass by chance on illumos; on FreeBSD, they still
fail since userland CTF is not yet supported.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Iterate through all the children instead of returning error when we hit
the first error. This makes the error message give more information
rather than just the first device that causes problem.
Illumos issue:
5118 When verifying or creating a storage pool, error messages only
show one device
MFC after: 2 weeks
Double test device size for ztest(1).
Illumos issue:
5039 ztest should default to larger device sizes
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
MFC after: 2 weeks
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).
While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
MFC after: 2 weeks
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables controls whether metadata and/or user data
when doing extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
MFC after: 2 weeks
zpool status -x is used to identify pools that are exhibiting
errors or are otherwise unavailable, therefore non-native
block-size pools shouldn't be reported.
Also update man page to clarify other additional conditions
which won't cause a pool to be displayed under zpool status -x.
Sponsored by: Multiplay
Use reserved space for ZFS administrative commands.
We reserve 1/2^spa_slop_shift = 1/32 or 3.125% of pool space (or 32MB at
least) for system use. Most ZPL operations, e.g. write(2), creat(2), will
fail with ENOSPC if we fall below this.
Certain operations, e.g. file removal and most administrative actions,
still permitted until half of the slop space is used. This would allow
users to use these operations to free up space in the pool when pool is
close to full but half of slop space is still free.
A very restricted set of operations that frees up space or change quota
are always permitted, regardless of the amount of free space.
MFC after: 2 weeks
Refresh zpool list for each interval in order to produce fresh
output.
Illumos issue: 4966 zpool list iterator does not update output
MFC after: 2 weeks
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
MFC after: 2 weeks
Merge from r258379 missed the tests.
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included
Illumos Revision: 54a20ab41aadcb81c53e72fc65886e964e9add59
MFC after: 5 days
Add a new zfs property, "redundant_metadata" which can have values "all" or
"most". The default will be "all", which is the current behavior. When set
to all, ZFS stores an extra copy of all metadata. If a single on-disk block
is corrupt, at worst a single block of user data (which is recordsize bytes
long) can be lost.
Setting to "most" will cause us to only store 1 copy of level-1 indirect
blocks of user data files. This can improve performance of random writes,
because less metadata has to be written. In practice, at worst about
100 blocks (of recordsize bytes each) of user data can be lost if a single
on-disk block is corrupt.
The exact behavior of which metadata blocks are stored redundantly may change
in future releases.
Illumos issue: 3835 zfs need not store 2 copies of all metadata
MFC after: 2 weeks
FreeBSD ZFS port unlike OpenSolaris does not use device IDs, and does not
implement respective devid_*() fuctions. It is pointless to open devices
just to close them back immediately.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
and zdb(8) by growing the buffer on demand with a cap of 1GB (specified in
spa_history_create_obj()).
PR: bin/186574
Submitted by: Andrew Childs <lorne cons org nz> (with changes)
MFC after: 2 weeks
New ZFS property volmode and sysctl vfs.zfs.vol.mode allow switching ZVOL
between three modes:
geom -- existing fully functional behavior (default);
dev -- exposing volumes only as raw disk device file in devfs;
none -- not exposing volumes outside ZFS.
The "dev" mode is less functional (can't be partitioned, mounted, etc),
but it is faster, and in some scenarios with untrusted consumers safer.
It can be useful for NAS, VM block storages, etc.
The "none" mode may be convenient for backup servers, etc. that don't
need direct data access.
Due to the way ZVOL is integrated with main ZFS code, those property
and sysctl are checked only during pool import and volume creation.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
illumos/illumos-gate@6fb4854bed
This fixes the tst.resize1.d and tst.resize2.d DTrace tests, which have
been failing since r261122 since they were causing dtrace(1) to attempt to
allocate and use large amounts of memory, and get killed by the OOM killer
as a result.
MFC after: 1 month
"Manpages should start a new sentence on a new line. This makes it easier
for translators to track changes." -jhb
Approved by: jhb
MFC after: 3 days
Sponsored by: SupraNet Communications, Inc
hot spares. This should be MFC'd to all STABLE branches.
Upon the availability of zfsd, the zpool manpage on relevant branches should
be updated to remove this caveat and document hot spare's reliance on zfsd.
Approved by: avg
MFC after: 1 week
Sponsored by: SupraNet Communications
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47
NOTE: Make sure the boot code is updated if a zpool upgrade is
done on boot zpool.
MFC after: 2 weeks
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man page
and help message
illumos/illumos-gate@31d7e8fa33
FreeBSD porting notes: the kernel part of this changeset depends
on Solaris buf(9S) interfaces and are not really applicable for
our use. vdev_disk.c is patched as-is to reduce diverge from
upstream, but vdev_file.c is left intact.
MFC after: 2 weeks
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool
features
illumos/illumos-gate@2acef22db7
MFC after: 2 weeks
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfa
4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c47
MFC after: 2 weeks
expect the installed ksh binary to be named "ksh", which is not the case
when it's installed on FreeBSD via the shells/ksh93 port. Allow for it to be
"ksh93" as well so that the tests can actually pass.
4101 metaslab_debug should allow for fine-grained control
4102 space_maps should store more information about themselves
4103 space map object blocksize should be increased
4104 ::spa_space no longer works
4105 removing a mirrored log device results in a leaked object
4106 asynchronously load metaslab
illumos/illumos-gate@0713e232b7
Note that some tunables have been removed and some new tunables have
been added. Of particular note, FreeBSD-only knob
vfs.zfs.space_map_last_hope is removed as it was a nop for some time now
(after one of the previous merges from upstream).
MFC after: 11 days
Sponsored by: HybridCluster [merge]
illumos/illumos-gate@69962b5647
Please note the following changes:
- zio_ioctl has lost its priority parameter and now TRIM is executed
with 'now' priority
- some knobs are gone and some new knobs are added; not all of them are
exposed as tunables / sysctls yet
MFC after: 10 days
Sponsored by: HybridCluster [merge]
On some architectures (powerpc), char is unsigned by default, which means
comparisons against -1 always fail, so the programs get stuck in an
infinite loop.
MFC after: 1 week
installed. Additionally, remove Solaris-specific sections and references,
and replace example outputs with output from lockstat on FreeBSD, since
lockstat's output contains stack traces.
This change also removes some examples that don't seem to work properly on
FreeBSD. The examples should be re-added when lockstat is fixed.
Reported by: avg
MFC after: 1 week
available under Oracle Solaris 11.
This includes an update to the ZFS(8) man page to reflect all the
available alias (snap, umount, and recv).
Initial changes obtained from ZFS On Linux + fixes for man page and cmd
help:
10b75496bbcf81b00a73
Obtained from: https://github.com/zfsonlinux/zfs
MFC after: 2 weeks
Sponsored by: Multiplay
don't make sense on FreeBSD. In particular,
- remove the ATTRIBUTES section,
- remove references to the Solaris Dynamic Tracing Guide, except in the
SEE ALSO section,
- update the description of the -A option for FreeBSD's implementation,
- remove references to Solaris-specific programs and configuration files,
and replace them with FreeBSD equivalents where possible.
The content has not changed aside from this.
Approved by: re (joel)
MFC after: 1 week
Add support of Illumos dumps on zvol over RAID-Z.
Note that this only adds the features. FreeBSD would
still need more work to support dumping on zvols.
Illumos ZFS issues:
2932 support crash dumps to raidz, etc. pools
MFC after: 1 month
Approved by: re (ZFS blanket)
Don't treat the parameter as a number (pool GUID) when there is
error converting it from string, instead, treat it as the pool
name.
Illumos ZFS issues:
1765 assert triggered in libzfs_import.c trying to import pool
name beginning with a number
minimum allocation size for devices. Use this information to
automatically increase ZFS's minimum allocation size for new top-level
vdevs to a value that more closely matches the optimum device
allocation size.
Use GEOM's stripesize attribute, if set, as the physical sector
size of the GEOM.
Calculate the minimum blocksize of each metaslab class. Use the
calculated value instead of SPA_MINBLOCKSIZE (512b) when determining
the likelyhood of compression yeilding a reduction in physical space
usage.
Report devices with sub-optimal block size configuration in "zpool
status". Also properly fail attempts to attach devices with a
logical block size greater than 8kB, since this will cause corruption
to ZFS's label area.
Sponsored by: Spectra Logic Corporaion
MFC after: 2 weeks
Background
==========
Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands. Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size). Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.
Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:
1) Existing pools created with devices that have different logical
and physical block sizes, but were configured to use the logical
block size (e.g. because the OS version used for pool construction
reported the logical block size instead of the physical block
size) will suddenly find that the vdev allocation size has
increased. This can be easily tolerated for active members of
the array, but ZFS would prevent replacement of a vdev with
another identical device because it now appears that the smaller
allocation size required by the pool is not supported by the new
device.
2) The device's physical block size may be too large to be supported
by ZFS. The optimal allocation size for the vdev may be quite
large. For example, a RAID controller may export a vdev that
requires read-modify-write cycles unless accessed using 64k
aligned/sized requests. ZFS currently has an 8k minimum block
size limit.
Reporting both the logical and physical allocation sizes for vdevs
solves these problems. A device may be used so long as the logical
block size is compatible with the configuration. By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
Add the SPA_ASHIFT constant. ZFS currently has a hard upper
limit of 13 (8k) for ashift and this constant is used to
both document and enforce this limit.
sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h:
Add the VDEV_AUX_ASHIFT_TOO_BIG error code.
Add fields for exporting the configured, logical, and
physical ashift to the vdev_stat_t structure.
Add VDEV_STAT_VALID() macro which can be used to verify the
presence of required vdev_stat_t fields in nvlist data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
Provide a SYSCTL_PROC handler for "max_auto_ashift". Since
the limit is only referenced long after boot when a create
operation occurs, there's no compelling need for it to be
a boot time configurable tunable. This also allows the
validation code for the max_auto_ashift value to be contained
within the sysctl handler.
Populate the new fields in the vdev_stat_t structure.
Fail vdev opens if the vdev reports an ashift larger than
SPA_MAXASHIFT.
Propogate vdev_logical_ashift and vdev_physical_ashift between
child and parent vdevs as is done for vdev_ashift.
In vdev_open(), restore code that fails opens for devices
where vdev_ashift grows. This can only happen now if the
device's logical ashift grows, which means it really isn't
safe to use the device.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c:
Update the vdev_open() API so that both logical (what was
just ashift before) and physical ashift are reported.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
Add two new fields, vdev_physical_ashift and vdev_logical_ashift,
to vdev_t.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
Add vdev_ashift_optimize(). Call it anytime a new top-level
vdev is allocated.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
Add text for the VDEV_AUX_ASHIFT_TOO_BIG error.
For each sub-optimally configured leaf vdev, report configured
and native block sizes.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT.
This status is reported on healthy pools containing vdevs
configured to use a block size smaller than their reported
physical block size.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
Update find_vdev_problem() and supporting functions to
provide the full vdev_stat_t structure to problem checking
routines, and to allow decent into replacing vdevs.
Add a vdev_non_native_ashift() validator which is used on
the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT.
cddl/contrib/opensolaris/lib/libzpool/common/kernel.c:
cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h:
Enhance sysctl userland stubs now that a SYSCTL_PROC handler
is used in vdev.c.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h:
When the group membership of a metaslab class changes (i.e.
when a vdev is added or removed from a pool), walk the group
list to determine the smallest block size currently available
and record this in the metaslab class.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
Add the metaslab_class_get_minblocksize() accessor.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
In zio_compress_data(), take the minimum blocksize as an
input parameter instead of assuming SPA_MINBLOCKSIZE.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum
blocksize of the device. The l2arc code performs has it's own
code for deciding if compression is worth while, so this
effectively disables zio_compress_data() from second guessing
the original decision.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:
In zio_write_bp_init(), use the minimum blocksize of the
normal metaslab class when compressing data.
Illumos ZFS issues:
3957 ztest should update the cachefile before killing itself
3958 multiple scans can lead to partial resilvering
3959 ddt entries are not always resilvered
3960 dsl_scan can skip over dedup-ed blocks if
physical birth != logical birth
3961 freed gang blocks are not resilvered and can cause pool to suspend
3962 ztest should print out zfs debug buffer before exiting
Fix a regression introduced by fix for Illumos bug #3834. Quote from
Matthew Ahrens on the Illumos issue:
ztest fails this assertion because ztest_dmu_read_write() does
dmu_tx_hold_free(tx, bigobj, bigoff, bigsize);
and then
dmu_object_set_checksum(os, bigobj,
(enum zio_checksum)ztest_random_dsl_prop(ZFS_PROP_CHECKSUM), tx);
If the region to free is past the end of the file, the DMU assumes that there
will be nothing to do for this object. However, ztest does set_checksum(),
which must modify the dnode. The fix is for ztest to also call
dmu_tx_hold_bonus(tx, bigobj);
so we can account for the dirty data associated with setting the checksum
Illumos ZFS issues:
3955 ztest failure: assertion refcount_count(&tx->tx_space_written)
+ delta <= tx->tx_space_towrite
Merge vendor bugfix for ZFS test suite that triggers false positives.
Illumos ZFS issues:
3949 ztest fault injection should avoid resilvering devices
3950 ztest: deadman fires when we're doing a scan
3951 ztest hang when running dedup test
3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
Bring back some important fixes from Illumos:
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
We particularly avoid the LD_NOLAZYLOAD changes that Illumos made
as those don't apply to FreeBSD and were causing problems in
interactive mode.
Illumos Revision: 13758:23432da34147
Reference:
https://www.illumos.org/issues/3022https://www.illumos.org/issues/3023https://www.illumos.org/issues/3024
MFC after: 1 month
Tested by: markj
zpool create should treat -O mountpoint and -m the same
Illumos ZFS issues:
3745 zpool create should treat -O mountpoint and -m the same
MFC after: 2 weeks
seven arguments.
The original test uses Solaris' uadmin system call to trigger the test
probe; this change adds a sysctl to the dtrace_test module and gets the test
program to trigger the test probe via the sysctl handler.
The test is currently failing on amd64 because of some bugs in the way that
probe arguments beyond the first five are obtained - these bugs will be
fixed in a separate change.
dt_cg_ptrsize() and generally cleans up some of the error handling around
register allocation.
This change corresponds to part of illumos-gate commit e5803b76927480:
3025 register leak in D code generation
Reviewed by: pfg
Obtained from: illumos
MFC after: 1 month
FreeBSD. In the IPv6 case, try each interface before returning an error;
each IPv6-enabled interface will have a link-local address even if the link
isn't up.
MFC after: 1 week
users to guarantee that the output of DTrace scripts will be time-ordered.
This option is enabled by adding the line
#pragma D option temporal
to the beginning of a script, or by adding '-x temporal' to the arguments of
dtrace(1).
This change fixes a bug in the original port of the temporal option. This
bug was causing some assertions to fail, so they had been disabled; in this
revision the assertions are working properly and are enabled.
The DTrace version number has been bumped from 1.9.0 to 1.9.1 to reflect
the language change that's being introduced.
This change corresponds to part of illumos-gate commit e5803b76927480:
3021 option for time-ordered output from dtrace(1M)
Reviewed by: pfg
Obtained from: illumos
MFC after: 1 month
The following change from illumos brought caused DTrace to
pause in an interactive environment:
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This was not detected during testing because it doesn't
affect scripts.
We shouldn't be changing the environment, especially since the
LD_NOLAZYLOAD option doesn't apply to our (GNU) ld.
Unfortunately the change from upstream was made in such a way
that it is very difficult to separate this change from the
others so, at least for now, it's better to just revert
everything.
Reference:
https://www.illumos.org/issues/3026
Reported by: Navdeep Parhar and Mark Johnston
Merge changes from illumos:
3675 DTrace print() should try to resolve function pointers
3676 dt_print_enum hardcodes a value of zero
Illumos Revision: b1fa6326238973aeaf12c34fcda75985b6c06be1
Reference:
https://www.illumos.org/issues/3675https://www.illumos.org/issues/3676
Obtained from: Illumos
MFC after: 1 month
Merge changes from illumos:
3021 option for time-ordered output from dtrace(1M)
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
3025 register leak in D code generation
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This brings yet another feature implemented in upstream DTrace.
A complete description is available here:
http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/
This change bumps the DT_VERS_* number to 1.9.1 in
accordance to what is done in illumos.
This change was somewhat complicated because upstream is mixed many
changes in an individual commit and some of the tests don't really
apply to us.
There are also appear to be differences in timestamping with Solaris
so we had to workaround some assertions making sure no regression
happened.
Special thanks to Fabian Keil for changes and testing.
Illumos Revisions: 13758:23432da34147
Reference:
https://www.illumos.org/issues/3021https://www.illumos.org/issues/3022https://www.illumos.org/issues/3023https://www.illumos.org/issues/3024https://www.illumos.org/issues/3025https://www.illumos.org/issues/1694
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 months
Merge change from illumos:
1368 enablings on defunct providers prevent providers from unregistering
We try to address some underlying differences between the Solaris
and FreeBSD implementations: dtrace_attach() / dtrace_detach() are
currently unimplemented in FreeBSD but the new code from illumos
makes use of taskq so some adaptations were made to dtrace_open()
and dtrace_close() to handle them appropriately.
Illumos Revision: r13430:8e6add739e38
Reference:
https://www.illumos.org/issues/1368
Reviewed by: gnn
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 3 weeks
Merge change from illumos:
1694 Add type-aware print() action
This is a very nice feature implemented in upstream Dtrace.
A complete description is available here:
http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/
This change bumps the DT_VERS_* number to 1.9.0 in
accordance to what is done in illumos.
While here also include some minor cleanups to ease further merging
and appease clang with a fix by Fabian Keil.
Illumos Revisions: 13501:c3a7090dbc16
13483:f413e6c5d297
Reference:
https://www.illumos.org/issues/1560https://www.illumos.org/issues/1694
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
Merge changes from illumos:
1451 DTrace needs toupper()/tolower() subroutines
1457 lltostr() D subroutine should take an optional base
This change bumps the DT_VERS_* number to 1.8.1 in
accordance to what is done in illumos.
The test suite we currently include is outdated and
doesnt support some updates in tst.subr.d which had to
be left out for now.
Illumos Revisions: r13458 5e394d8db762
r13459 c3454574dd1a
Reference:
https://www.illumos.org/issues/1451https://www.illumos.org/issues/1457
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
Merge change from illumos:
1455 DTrace tracemem() should take an optional size argument
Our local enhancements to dt_print_bytes were equivalent to
those in illumos but we made it match the illumos version
to ease further code merges.
For now leave out tst.smallsize.d and tst.smallsize.d.out
since those don't seem to work cleanly on FreeBSD.
This change bumps the DT_VERS_* number to 1.7.1 in accordance
to what is done in illumos.
Illumos Revision: 13457:571b0355c2e3
Reference:
https://www.illumos.org/issues/1455
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
includes MFV 238590, 238592, 247580
MFV 238590, 238592:
In the first zfs ioctl restructuring phase, the libzfs_core library was
introduced. It is a new thin library that wraps around kernel ioctl's.
The idea is to provide a forward-compatible way of dealing with new
features. Arguments are passed in nvlists and not random zfs_cmd fields,
new-style ioctls are logged to pool history using a new method of
history logging.
http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/
MFV 247580 [1]:
To address issues of several deadlocks and race conditions the locking
code around dsl_dataset was rewritten and the interface to synctasks
was changed.
User-Visible Changes:
"zfs snapshot" can create more arbitrary snapshots at once (atomically)
"zfs destroy" destroys multiple snapshots at once
"zfs recv" has improved performance
Backward Compatibility:
I have extended the compatibility layer to support full backward
compatibility by remapping or rewriting the responsible ioctl arguments.
Old utilities are fully supported by the new kernel module.
Forward Compatibility:
New utilities work with old kernels with the following restrictions:
- creating, destroying, holding and releasing of multiple snapshots
at once is not supported, this includes recursive (-r) commands
Illumos ZFS issues:
2882 implement libzfs_core
2900 "zfs snapshot" should be able to create multiple,
arbitrary snapshots at once
3464 zfs synctask code needs restructuring
References:
https://www.illumos.org/issues/2882https://www.illumos.org/issues/2900https://www.illumos.org/issues/3464 [1]
MFC after: 1 month
Sponsored by: Hybrid Logic Inc. [1]
Import minor ZFS changes from vendor
Illumos ZFS issues:
3604 zdb should print bpobjs more verbosely (fix zdb hang)
3606 zpool status -x shouldn't warn about old on-disk format
MFC after: 3 days
Merge new read-only zfs properties from vendor (illumos)
Illumos ZFS issues:
3588 provide zfs properties for logical (uncompressed) space used and
referenced
References:
https://www.illumos.org/issues/3588
MFC after: 2 weeks
Merge the ZFS I/O deadman thread from vendor (illumos).
This feature panics the system on hanging ZFS I/O, helps debugging
and resumes failed service.
The panic behavior can be controlled with the loader-only tunables:
vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O)
vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O)
By default, ZFS I/O deadman is enabled by default on amd64 and i386
excluding virtual guest machines.
Illumos ZFS issues:
3246 ZFS I/O deadman thread
References:
https://www.illumos.org/issues/3246
MFC after: 2 weeks
kernel. Properly restore, continue, and detach from processes
being DTraced when DTrace exits with an error so the program
being inspected is not terminated.
cddl/contrib/opensolaris/cmd/dtrace/dtrace.c:
In fatal(), the generic error handler, close the DTrace
handle as is done in the "probe/script" error handler
dfatal(). fatal() can be invoked after DTrace attaches
to processes (e.g. a script specified by command line
argument can't be found) and closing the handle will
release them.
Submitted by: Spectra Logic Corporation
Reviewed by: rpaulo, gnn
* Illumos zfs issue #3035 [1] LZ4 compression support in ZFS.
LZ4 is a new high-speed BSD-licensed compression algorithm created
by Yann Collet that delivers very high compression and decompression
performance compared to lzjb (>50% faster on compression, >80% faster
on decompression and around 3x faster on compression of incompressible
data), while giving better compression ratio [1].
This version of LZ4 corresponds to upstream's [2] revision 85.
Please note that for obvious reasons this is not backward read
compatible. This means once a pool have LZ4 compressed data, these
data can no longer be read by older ZFS implementations.
Local changes:
- On-stack hash table disabled and using kernel slab allocator
instead, at this time. This requires larger kernel thread stack
for zio workers. This may change in the future should we adjusted
the zio workers' thread stack size.
- likely and unlikely will be undefined if they are already defined,
this is required for i386 XEN build.
- Removed De Bruijn sequence based __builtin_ctz family of builtins
in favor of the latter. Both GCC and clang supports these builtins.
- Changed the way the LZ4 code detects endianness.
- Manual pages modifications to mention the feature based on Illumos
counterpart.
- Boot loader changes to make it support LZ4 decompression.
[1] https://www.illumos.org/issues/3035
[2] http://code.google.com/p/lz4/source/list
Obtained from: Illumos (13921:9d721847e469)
Tested on: FreeBSD/amd64
MFC after: 1 month
the underlying zap_count() to return no errors. However, it is possible
that the pool reaches to such a state where zap_count would return error,
leading to panics when a pool is imported.
This commit changes the ddt_zap_count to return error returned from
zap_count and handle the error appropriately. With this change, it's now
possible to let zpool rollback damaged transaction groups and import the
pool.
Obtained from: ZFS on Linux github (e8fd45a0f9)
MFC after: 1 month
Introduce a new dataset aclmode setting "restricted" to protect ACL's
being destroyed or corrupted by a drive-by chmod.
illumos-gate 13889:a67716f16746
3254 add support in zfs for aclmode=restricted
References:
https://www.illumos.org/issues/3254
MFC after: 2 weeks
Import the zio nop-write improvement from Illumos. To reduce I/O,
nop-write omits overwriting data if the checksum (cryptographically
secure) of new data matches the checksum of existing data.
It also saves space if snapshots are in use.
It currently works only on datasets with enabled compression, disabled
deduplication and sha256 checksums.
IllumOS 13887:196932ec9e6a and 13888:7204b3392a58
3236 zio nop-write
References:
https://www.illumos.org/issues/3236
MFC after: 2 weeks
Illumos 13886:e3261d03efbf
3349 zpool upgrade -V bumps the on disk version number, but leaves
the in core version
References:
https://www.illumos.org/issues/3349
MFC after: 1 week
Illumos r13840:97fd5cdf328a:
3145 single-copy arc
3212 ztest: race condition between vdev_online() and spa_vdev_remove()
Illumos r13849:3468a95b27cd:
3258 ztest's use of file descriptors is unstable
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -t tank@snap
eventhough tank@snap doesn't exit.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows to recursively rename snapshots even if
snapshot doesn't exist on the starting dataset.
Sponsored by: rsync.net
MFC after: 2 weeks
impatient if we sent more than 2 INT signals. This fixes a bug where
we wouldn't see aggregations print on the command line if we Ctrl-C'd
a dtrace script or command line invocation.
MFC after: 2 weeks
Merge changes from illumos
906 dtrace depends_on pragma should search all library paths, not just the
current one
949 dtrace should only include the first instance of a library found on
its library path
Illumos Revisions: 13353:936a1e45726c
13354:2b2c36a81512
Reference:
https://www.illumos.org/issues/906https://www.illumos.org/issues/949
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 3 weeks
Bryan Cantrill implemented the equivalent of semi-log graph
paper for Dtrace so llquantize will use one logarithmic and
one linear scale.
Special thanks to Mark Peek for providing fix to an
assertion and to Fabian Keill for testing the port.
Illumos Revision: 13355:15b74a2a9a9d
Reference:
https://www.illumos/issues/905
Obtained from: Illumos
Tested by: Fabian Keill, mp
MFC after: 4 days
1948 zpool list should show more detailed pool information
Display per-vdev information with "zpool list -v".
The added expandsize property has currently no value on FreeBSD.
This changeset allows adding expansion support to individual vdevs
in the future.
References:
https://www.illumos.org/issues/1948
Obtained from: illumos (issue #1948)
MFC after: 2 weeks