This corrects inconsistencies between zpool list and zpool status, which are
both described as displaying the pool <state>; however, zpool list would use
a hardcoded FAULTED instead of the correct UNAVAIL.
MFC after: 1 month
Plug a memory leak in libzfs. In zfs_iter_bookmarks, an nvlist is allocated
before calling lzc_get_bookmarks, which allocates the nvlist again (and
overwrites the pointer to the previously allocated list).
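
The leak follows a common pattern: the caller allocates an object, then hands the address of its pointer to a callee that allocates its own. A minimal sketch of that pattern, assuming plain malloc() in place of the nvlist routines; list_t, list_alloc(), list_free(), and fetch_list() are hypothetical stand-ins, not libzfs or libzfs_core interfaces:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an nvlist. */
typedef struct list { int unused; } list_t;

/* Number of lists allocated and not yet freed. */
static int live_lists = 0;

static list_t *
list_alloc(void)
{
	list_t *l = malloc(sizeof (*l));

	assert(l != NULL);
	live_lists++;
	return (l);
}

static void
list_free(list_t *l)
{
	if (l != NULL) {
		free(l);
		live_lists--;
	}
}

/* Stand-in for a callee that allocates its own result list. */
static void
fetch_list(list_t **out)
{
	*out = list_alloc();
}

/* Leaky: the first allocation is lost once fetch_list() overwrites it. */
int
iter_leaky(void)
{
	list_t *l = list_alloc();

	fetch_list(&l);
	list_free(l);
	return (live_lists);	/* one list is still outstanding */
}

/* Fixed: the callee performs the only allocation. */
int
iter_fixed(void)
{
	list_t *l = NULL;

	fetch_list(&l);
	list_free(l);
	return (live_lists);
}
```

Calling iter_fixed() leaves the live count unchanged, while each call to iter_leaky() raises it by one.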
Illumos issue:
5427 memory leak in libzfs when doing rollback
MFC after: 2 weeks
ZFS large block support.
Please note that booting from datasets that have a recordsize greater
than 128KB is not supported (but it is okay to enable the feature on
the pool). This *may* remain unchanged because of memory constraints.
A limited safety belt is provided for the mounted root filesystem, but
caution is advised.
Illumos issue:
5027 zfs large block support
MFC after: 1 month
This errno value is emitted by dsl_props_set_check() in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c, and
is used to mean that the property value is too long. For the record,
the maximum length is ZAP_MAXVALUELEN, which is 8*1024 bytes.
Instead of claiming an unknown error (and abort()ing), provide
something more specific to the scenario involved. As far as I
can tell, E2BIG is not emitted for any other scenario.
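
A minimal sketch of the idea, assuming a userland helper that turns the errno into a message. The function names and message strings are hypothetical, and the use of ENAMETOOLONG for an over-long name is an assumption; only the E2BIG meaning and the 8*1024 limit come from the text above:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Limit quoted in the text above: 8*1024 bytes. */
#define	SKETCH_MAXVALUELEN	(8 * 1024)

/*
 * Hypothetical check: reject a property value that does not fit.
 * Treating a length equal to the limit as too long is an assumption.
 */
int
prop_value_check(const char *value)
{
	assert(value != NULL);
	if (strlen(value) >= SKETCH_MAXVALUELEN)
		return (E2BIG);
	return (0);
}

/*
 * Hypothetical mapping from errno to a message. Only the E2BIG case
 * is described in the text; the other cases are illustrative.
 */
const char *
prop_set_strerror(int err)
{
	switch (err) {
	case 0:
		return ("success");
	case E2BIG:
		return ("property value too long");
	case ENAMETOOLONG:
		return ("property name too long");
	default:
		return ("unknown error");
	}
}
```

With this shape, an E2BIG result produces a specific message and never reaches the "unknown error" fallback.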
MFC after: 1 week
Sponsored by: Spectra Logic
Affects: All ZFS versions starting 27 Feb 2009 (illumos ccba0801)
This change modified the value returned by
dsl_props_set_check(), so that it can distinguish between
a name that's too long and a value that's too long, but
libzfs was not updated accordingly.
MFSpectraBSD: r1051499 on 2014/03/28 11:07:59
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
MFC after: 2 weeks
The FreeBSD ZFS port, unlike OpenSolaris, does not use device IDs and does not
implement the respective devid_*() functions. It is pointless to open devices
just to close them again immediately.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
and zdb(8) by growing the buffer on demand with a cap of 1GB (specified in
spa_history_create_obj()).
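
One way to grow a buffer on demand under a fixed cap is to double it until the cap is reached. A minimal sketch, assuming a doubling policy and a 128KB starting size; both are assumptions, as are the names below, and only the 1GB cap comes from the text above:

```c
#include <assert.h>
#include <stddef.h>

#define	HIST_BUF_INITIAL	((size_t)128 << 10)	/* assumed start */
#define	HIST_BUF_CAP		((size_t)1 << 30)	/* 1GB cap */

/*
 * Return the next buffer size to try, or 0 if the buffer is already
 * at the cap and cannot grow any further.
 */
size_t
hist_buf_next_len(size_t cur)
{
	assert(cur <= HIST_BUF_CAP);
	if (cur == 0)
		return (HIST_BUF_INITIAL);
	if (cur == HIST_BUF_CAP)
		return (0);
	if (cur > HIST_BUF_CAP / 2)
		return (HIST_BUF_CAP);
	return (cur * 2);
}
```

A caller would retry the read with the returned size and give up once the function returns 0.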
PR: bin/186574
Submitted by: Andrew Childs <lorne cons org nz> (with changes)
MFC after: 2 weeks
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool
features
illumos/illumos-gate@2acef22db7
MFC after: 2 weeks
(64MB). Even if we somehow found one, the ZFS kernel code rejects such
devices. It is amusing to watch attempts to read four 256K vdev labels from
a 1.44MB floppy, but it is impractical and quite slow.
Add support of Illumos dumps on zvol over RAID-Z.
Note that this only adds the features. FreeBSD would
still need more work to support dumping on zvols.
Illumos ZFS issues:
2932 support crash dumps to raidz, etc. pools
MFC after: 1 month
Approved by: re (ZFS blanket)
minimum allocation size for devices. Use this information to
automatically increase ZFS's minimum allocation size for new top-level
vdevs to a value that more closely matches the optimum device
allocation size.
Use GEOM's stripesize attribute, if set, as the physical sector
size of the GEOM.
Calculate the minimum blocksize of each metaslab class. Use the
calculated value instead of SPA_MINBLOCKSIZE (512b) when determining
the likelihood of compression yielding a reduction in physical space
usage.
Report devices with sub-optimal block size configuration in "zpool
status". Also properly fail attempts to attach devices with a
logical block size greater than 8kB, since this will cause corruption
to ZFS's label area.
Sponsored by: Spectra Logic Corporation
MFC after: 2 weeks
Background
==========
Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands. Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size). Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.
Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:
1) Existing pools created with devices that have different logical
and physical block sizes, but were configured to use the logical
block size (e.g. because the OS version used for pool construction
reported the logical block size instead of the physical block
size) will suddenly find that the vdev allocation size has
increased. This can be easily tolerated for active members of
the array, but ZFS would prevent replacement of a vdev with
another identical device because it now appears that the smaller
allocation size required by the pool is not supported by the new
device.
2) The device's physical block size may be too large to be supported
by ZFS. The optimal allocation size for the vdev may be quite
large. For example, a RAID controller may export a vdev that
requires read-modify-write cycles unless accessed using 64k
aligned/sized requests. ZFS currently cannot use a minimum block
size larger than 8k.
Reporting both the logical and physical allocation sizes for vdevs
solves these problems. A device may be used so long as the logical
block size is compatible with the configuration. By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.
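
The comparison of logical and physical block sizes can be sketched as follows. This is an illustration under stated assumptions, not the code added by this change: the function names are hypothetical, and the exact policy (prefer the physical ashift, clamp it to an administrator limit, never go below the logical ashift) is an assumption consistent with the description above:

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_MAXASHIFT	13	/* 8k upper limit from the text */

/* Convert a power-of-two block size in bytes to its shift (512 -> 9). */
unsigned
size_to_ashift(uint64_t size)
{
	unsigned shift = 0;

	assert(size != 0 && (size & (size - 1)) == 0);
	while (size > 1) {
		size >>= 1;
		shift++;
	}
	return (shift);
}

/*
 * Pick the ashift for a new top-level vdev. Returns -1 when the
 * logical ashift itself is larger than the supported maximum.
 */
int
choose_ashift(unsigned logical, unsigned physical, unsigned max_auto)
{
	unsigned ashift = logical;

	if (logical > SKETCH_MAXASHIFT)
		return (-1);
	if (physical > ashift)
		ashift = physical;	/* prefer the native size */
	if (ashift > max_auto)
		ashift = max_auto;	/* honor the configured limit */
	if (ashift < logical)
		ashift = logical;	/* never below what the device needs */
	return ((int)ashift);
}
```

For a 512e drive, choose_ashift(size_to_ashift(512), size_to_ashift(4096), 13) selects 12, while a device reporting a 64k physical size is clamped to the limit.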
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
Add the SPA_MAXASHIFT constant. ZFS currently has a hard upper
limit of 13 (8k) for ashift and this constant is used to
both document and enforce this limit.
sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h:
Add the VDEV_AUX_ASHIFT_TOO_BIG error code.
Add fields for exporting the configured, logical, and
physical ashift to the vdev_stat_t structure.
Add VDEV_STAT_VALID() macro which can be used to verify the
presence of required vdev_stat_t fields in nvlist data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
Provide a SYSCTL_PROC handler for "max_auto_ashift". Since
the limit is only referenced long after boot when a create
operation occurs, there's no compelling need for it to be
a boot time configurable tunable. This also allows the
validation code for the max_auto_ashift value to be contained
within the sysctl handler.
Populate the new fields in the vdev_stat_t structure.
Fail vdev opens if the vdev reports an ashift larger than
SPA_MAXASHIFT.
Propagate vdev_logical_ashift and vdev_physical_ashift between
child and parent vdevs as is done for vdev_ashift.
In vdev_open(), restore code that fails opens for devices
where vdev_ashift grows. This can only happen now if the
device's logical ashift grows, which means it really isn't
safe to use the device.
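
Keeping the validation inside the handler means a bad value is rejected before it is ever stored. A userland sketch of that shape, assuming a lower bound of 9 (512b) and a default equal to the upper bound; the names, the lower bound, and the default are assumptions, and this is not the kernel SYSCTL_PROC handler itself:

```c
#include <assert.h>
#include <errno.h>

#define	SKETCH_MINASHIFT	9	/* assumed lower bound (512b) */
#define	SKETCH_MAXASHIFT	13	/* 8k upper limit from the text */

/* Current limit; the default value here is an assumption. */
static unsigned max_auto_ashift = SKETCH_MAXASHIFT;

/* Store a new limit only if it is within the supported range. */
int
max_auto_ashift_set(unsigned val)
{
	if (val < SKETCH_MINASHIFT || val > SKETCH_MAXASHIFT)
		return (EINVAL);
	max_auto_ashift = val;
	return (0);
}

unsigned
max_auto_ashift_get(void)
{
	/* The setter guarantees this invariant. */
	assert(max_auto_ashift >= SKETCH_MINASHIFT &&
	    max_auto_ashift <= SKETCH_MAXASHIFT);
	return (max_auto_ashift);
}
```

A rejected write leaves the previous value in place.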
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c:
Update the vdev_open() API so that both logical (what was
just ashift before) and physical ashift are reported.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
Add two new fields, vdev_physical_ashift and vdev_logical_ashift,
to vdev_t.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
Add vdev_ashift_optimize(). Call it anytime a new top-level
vdev is allocated.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
Add text for the VDEV_AUX_ASHIFT_TOO_BIG error.
For each sub-optimally configured leaf vdev, report configured
and native block sizes.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT.
This status is reported on healthy pools containing vdevs
configured to use a block size smaller than their reported
physical block size.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
Update find_vdev_problem() and supporting functions to
provide the full vdev_stat_t structure to problem checking
routines, and to allow descent into replacing vdevs.
Add a vdev_non_native_ashift() validator which is used on
the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT.
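
The check itself reduces to comparing two shifts. A minimal sketch, assuming a reduced structure that holds only the three ashift values; the structure and function names are hypothetical and do not reflect the real vdev_stat_t layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-vdev ashift statistics. */
typedef struct sketch_ashift_stat {
	uint64_t configured_ashift;
	uint64_t logical_ashift;
	uint64_t physical_ashift;
} sketch_ashift_stat_t;

/* True when the vdev uses a block size below its native size. */
bool
sketch_non_native_ashift(const sketch_ashift_stat_t *vs)
{
	assert(vs != NULL);
	return (vs->configured_ashift < vs->physical_ashift);
}

/* Block size in bytes for reporting, e.g. ashift 12 -> 4096. */
uint64_t
sketch_ashift_to_size(uint64_t ashift)
{
	assert(ashift < 64);
	return (UINT64_C(1) << ashift);
}
```

A vdev configured with ashift 9 on a device whose physical ashift is 12 would be flagged, and the report could show 512 as the configured size and 4096 as the native size.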
cddl/contrib/opensolaris/lib/libzpool/common/kernel.c:
cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h:
Enhance sysctl userland stubs now that a SYSCTL_PROC handler
is used in vdev.c.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h:
When the group membership of a metaslab class changes (i.e.
when a vdev is added or removed from a pool), walk the group
list to determine the smallest block size currently available
and record this in the metaslab class.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
Add the metaslab_class_get_minblocksize() accessor.
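
Finding the smallest block size in a class amounts to taking the minimum ashift over its groups. A minimal sketch, assuming the groups are given as a plain array of ashift values and that an empty class falls back to 512 bytes; the array representation, the fallback, and the function name are all assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_MINBLOCKSIZE	UINT64_C(512)	/* assumed fallback */

/* Return the smallest block size, in bytes, among the given groups. */
uint64_t
sketch_class_minblocksize(const unsigned *group_ashift, size_t ngroups)
{
	unsigned min_ashift;
	size_t i;

	assert(ngroups == 0 || group_ashift != NULL);
	if (ngroups == 0)
		return (SKETCH_MINBLOCKSIZE);

	min_ashift = group_ashift[0];
	for (i = 1; i < ngroups; i++) {
		if (group_ashift[i] < min_ashift)
			min_ashift = group_ashift[i];
	}
	assert(min_ashift < 64);
	return (UINT64_C(1) << min_ashift);
}
```

Recomputing this value only when a group is added or removed keeps the accessor itself a simple field read.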
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
In zio_compress_data(), take the minimum blocksize as an
input parameter instead of assuming SPA_MINBLOCKSIZE.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum
blocksize of the device. The l2arc code has its own
code for deciding if compression is worthwhile, so this
effectively prevents zio_compress_data() from second-guessing
the original decision.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:
In zio_write_bp_init(), use the minimum blocksize of the
normal metaslab class when compressing data.
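
The reason the minimum blocksize matters: space is allocated in whole minimum-size blocks, so compression only helps if it reduces the number of such blocks. A simplified sketch of that test, assuming a plain "fewer blocks than before" rule; the real zio_compress_data() logic may apply additional thresholds, and the function names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of align, which must be a power of two. */
static uint64_t
sketch_roundup(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/*
 * True when storing comp_len bytes takes less physical space than
 * storing src_len bytes, given the smallest allocatable block.
 */
bool
sketch_compression_saves_space(uint64_t src_len, uint64_t comp_len,
    uint64_t minblocksize)
{
	return (sketch_roundup(comp_len, minblocksize) <
	    sketch_roundup(src_len, minblocksize));
}
```

Compressing a 4096-byte buffer to 3000 bytes saves space with 512-byte blocks (3072 < 4096) but saves nothing with 4096-byte blocks, which is the case that assuming SPA_MINBLOCKSIZE would misjudge.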
Fix a hang in zfs send -D after processing completes, which required CTRL+C
to interrupt. The hang was caused by calling pthread_join before closing the fd.
This was introduced by r251646 (MFV r251644)
Illumos ZFS issue:
3909 "zfs send -D" does not work
MFC after: 1 day
Restore the behavior from before r251646: when destroying ZFS
snapshots, the ioctl returns ENOENT if any of them hit that error
in the errlist (the new behavior was to return ENOENT only when
all of them returned an error).
Illumos ZFS issues:
3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl
MFC after: 1 week
zpool create should treat -O mountpoint and -m the same
Illumos ZFS issues:
3745 zpool create should treat -O mountpoint and -m the same
MFC after: 2 weeks
snapshot.
Related illumos ZFS issue:
3699 zfs hold or release of a non-existent snapshot does not output error
Reported by: Steven Hartland <smh@FreeBSD.org>
MFC after: 3 days
release a non-existent snapshot of an existing dataset. In the recursive case,
an error is reported if no snapshots with the requested name have been found.
Problem and proposed solution reported to illumos:
3699 zfs hold or release of a non-existent snapshot does not output error
MFC after: 8 days
doesn't copyout in this case.
To solve this issue a new struct zfs_iocparm_t is introduced consisting of:
- zfs_ioctl_version (future backwards compatibility purposes)
- user space pointer to zfs_cmd_t (copyin and copyout)
- size of zfs_cmd_t (verification purposes)
The copyin and copyout of zfs_cmd_t are now done the illumos (vendor) way,
which makes porting new changes easier and ensures correct behavior when
returning an error.
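
The three fields listed above suggest a small wrapper structure along these lines. This is a sketch under assumptions: the field names and types, the stand-in command structure, the version constant, and the check function are all hypothetical, and the real zfs_cmd_t is much larger:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_IOCVER_CURRENT	1	/* hypothetical version number */

/* Hypothetical stand-in for the real command structure. */
typedef struct sketch_cmd {
	char		name[256];
	uint64_t	cookie;
} sketch_cmd_t;

/* Wrapper carrying the three items listed in the text. */
typedef struct sketch_iocparm {
	uint32_t	ioctl_version;	/* interface version of the caller */
	uint64_t	cmd;		/* user-space address of the command */
	uint64_t	cmd_size;	/* size of the caller's command */
} sketch_iocparm_t;

/* Reject requests whose version or structure size is not understood. */
int
sketch_iocparm_check(const sketch_iocparm_t *p)
{
	assert(p != NULL);
	if (p->ioctl_version > SKETCH_IOCVER_CURRENT)
		return (EINVAL);
	if (p->cmd == 0 || p->cmd_size != sizeof (sketch_cmd_t))
		return (EINVAL);
	return (0);
}
```

Because the kernel receives only the small wrapper through the ioctl argument, it can copy the command in and out itself and so still copy results out when the call returns an error.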
MFC after: 10 days
- provide complete backwards compatibility (old utility, new kernel)
- add zfs_cmd_t compatibility mapping in both directions
- determine ioctl address in zfs_ioctl_compat.c
in r247265 (ZFS deadman thread). Both new utilities now support the old
kernel, and the new kernel properly detects old utilities.
For future backwards compatibility, the vfs.zfs.version.ioctl read-only
sysctl has been introduced. With this sysctl zfs utilities will be able
to detect the ioctl interface version of the currently loaded zfs module.
As a side effect, the zfs utilities between r247265 and this revision don't
support the old kernel module. If you are running HEAD at or newer than
r247265, install the new kernel module (or the whole kernel) first.
MFC after: 10 days
random order instead of creation order.
Eliminates needless filesystem renames caused by removed parent snapshots,
which subsequently cause many more errors.
PR: kern/172259
Submitted by: Steven Hartland
Reviewed by: pjd (mentor)
Approved by: pjd (mentor)
MFC after: 2 weeks