the underlying zap_count() to return no errors. However, it is possible
that the pool reaches to such a state where zap_count would return error,
leading to panics when a pool is imported.
This commit changes the ddt_zap_count to return error returned from
zap_count and handle the error appropriately. With this change, it's now
possible to let zpool rollback damaged transaction groups and import the
pool.
Obtained from: ZFS on Linux github (e8fd45a0f975c6b8ae8cd644714fc21f14fac2bf)
MFC after: 1 month
random order instead of creation order.
Eliminates needless filesystem renames caused by removed parent snapshots
which subsequently causes many more errors.
PR: kern/172259
Submitted by: Steven Hartland
Reviewed by: pjd (mentor)
Approved by: pjd (mentor)
MFC after: 2 weeks
Introduce a new dataset aclmode setting "restricted" to protect ACL's
being destroyed or corrupted by a drive-by chmod.
illumos-gate 13889:a67716f16746
3254 add support in zfs for aclmode=restricted
References:
https://www.illumos.org/issues/3254
MFC after: 2 weeks
Import the zio nop-write improvement from Illumos. To reduce I/O,
nop-write omits overwriting data if the checksum (cryptographically
secure) of new data matches the checksum of existing data.
It also saves space if snapshots are in use.
It currently works only on datasets with enabled compression, disabled
deduplication and sha256 checksums.
IllumOS 13887:196932ec9e6a and 13888:7204b3392a58
3236 zio nop-write
References:
https://www.illumos.org/issues/3236
MFC after: 2 weeks
Illumos 13886:e3261d03efbf
3349 zpool upgrade -V bumps the on disk version number, but leaves
the in core version
References:
https://www.illumos.org/issues/3349
MFC after: 1 week
Illumos r13840:97fd5cdf328a:
3145 single-copy arc
3212 ztest: race condition between vdev_online() and spa_vdev_remove()
Illumos r13849:3468a95b27cd:
3258 ztest's use of file descriptors is unstable
There is one known issue: Some probes will display an error message along the
lines of: "Invalid address (0)"
I tested this with both a simple dtrace probe and dtruss on a few different
binaries on 32-bit. I only compiled 64-bit, did not run it, but I don't expect
problems without the modules loaded. Volunteers are welcome.
MFC after: 1 month
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -t tank@snap
eventhough tank@snap doesn't exit.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows to recursively rename snapshots even if
snapshot doesn't exist on the starting dataset.
Sponsored by: rsync.net
MFC after: 2 weeks
The code builds a map of regions that were freed. On every write the
code consults the map and eventually removes ranges that were freed
before, but are now overwritten.
Freed blocks are not TRIMed immediately. There is a tunable that defines
how many txg we should wait with TRIMming freed blocks (64 by default).
There is a low priority thread that TRIMs ranges when the time comes.
During TRIM we keep in-flight ranges on a list to detect colliding
writes - we have to delay writes that collide with in-flight TRIMs in
case something will be reordered and write will reached the disk before
the TRIM. We don't have to do the same for in-flight writes, as
colliding writes just remove ranges to TRIM.
Sponsored by: multiplay.co.uk
This work includes some important fixes and some improvements obtained
from the zfsonlinux project, including TRIMming entire vdevs on pool
create/add/attach and on pool import for spare and cache vdevs.
Obtained from: zfsonlinux
Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>
impatient if we sent more than 2 INT signals. This fixes a bug where
we wouldn't see aggregations print on the command line if we Ctrl-C'd
a dtrace script or command line invocation.
MFC after: 2 weeks
Merge changes from illumos
906 dtrace depends_on pragma should search all library paths, not just the
current one
949 dtrace should only include the first instance of a library found on
its library path
Illumos Revisions: 13353:936a1e45726c
13354:2b2c36a81512
Reference:
https://www.illumos.org/issues/906https://www.illumos.org/issues/949
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 3 weeks
These probes are most useful when looking into the structures
they provide, which are listed in io.d. For example:
dtrace -n 'io:genunix::start { printf("%d\n", args[0]->bio_bcount); }'
Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
MFC after: 1 month
Bryan Cantrill implemented the equivalent of semi-log graph
paper for Dtrace so llquantize will use one logarithmic and
one linear scale.
Special thanks to Mark Peek for providing fix to an
assertion and to Fabian Keill for testing the port.
Illumos Revision: 13355:15b74a2a9a9d
Reference:
https://www.illumos/issues/905
Obtained from: Illumos
Tested by: Fabian Keill, mp
MFC after: 4 days
1948 zpool list should show more detailed pool information
Display per-vdev information with "zpool list -v".
The added expandsize property has currently no value on FreeBSD.
This changeset allows adding expansion support to individual vdevs
in the future.
References:
https://www.illumos.org/issues/1948
Obtained from: illumos (issue #1948)
MFC after: 2 weeks
2703 add mechanism to report ZFS send progress
If the zfs send command is used with the -v flag, the amount of bytes
transmitted is reported in per second updates.
References:
https://www.illumos.org/issues/2703
Obtained from: illumos (issue #2703)
MFC after: 2 weeks
in the system-wide version of <stdlib.h> by wrapping the #include_next
<stdlib.h> within the scope of its own header protection. On FreeBSD this has
no effect, since both header protections are equivalent. However the GNU version
of <stdlib.h> implements a special header protection mechanism which allows it
to be included multiple times (in different modes).
Simply by moving the #include_next off the header protection, we allow
system-wide <stdlib.h> to implement its own protection policy, whichever that
may be.
to follow the example of OpenSolaris and its descendants, which implemented
cpu as an inline that took a value out of curthread. At certain points in
the FreeBSD scheduler curthread->td_oncpu will no longer be valid (in
particukar, just before the thread gets descheduled) so instead I have
implemented this as its own built-in variable.
Sponsored by: Sandvine Inc.
MFC after: 1 week
CTF format is not cross-platform by design, e.g. it is not guaranteed
that data generated by ctfconvert/ctfmerge on one architecture will
be successfuly read on another. CTF structures are saved/restored
using naive approach. Roughly it looks like:
write(fd, &ctf_struct, sizeof(ctf_struct))
read(fd, &ctf_struct, sizeof(ctf_struct))
By sheer luck memory layout of all type-related CTF structures is the same
on amd64/i386/mips32/mips64. It's different on ARM though. sparc, ia64,
powerpc, and powerpc64 were not tested. So in order to get file compatible
with dtrace on ARM it should be compiled on ARM. Alternative solution would
be to have "signatures" for every platform and ctfmerge should convert host's
reperesentation of CTF structure to target's one using "signature" as template.
This patch checks byte order of ELF files used for generating CTF record
and makes sure that byte order of data written to resulting files is the same
as target's byte order.
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.
Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.
TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel
developers
MFC after: 10 days
add support for "-t <datatype>" argument to zfs get
References:
https://www.illumos.org/issues/1936
Update zfs(8) manpage in respect of [1].
Fix typo in zfs(8) manpage.
Obtained from: illumos (issue #1936)
MFC after: 1 week
happen to have a .exe extension.
While here fix the shebang of a shell script that
was looking for /bin/bash.
Reviewed by: gnn
Approved by: jhb (mentor)
MFC after: 2 weeks
names and wants to sort them by name, ie. when executes:
# zfs list -t snapshot -o name -s name
Because only name is needed we don't have to read all snapshot properties.
Below you can find how long does it take to list 34509 snapshots from a single
disk pool before and after this change with cold and warm cache:
before:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 525s
warm cache: 218s
after:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 1.7s
warm cache: 1.1s
MFC after: 1 week
uint64_t values are snprintf'd using %llx. On amd64, uint64_t is
typedef'd as unsigned long, so cast the values to u_longlong_t, as is
done similarly in the rest of the file.
MFC after: 1 week
uint64_t values are snprintf'd using %llx. On amd64, uint64_t is
typedef'd as unsigned long, so cast the values to u_longlong_t, as is
done similarly in the rest of the file.
MFC after: 1 week
dt_popc() function assumes that either _ILP32 or _LP64 is defined,
otherwise it has no suitable implementation.
However, the _ILP32 and _LP64 macros come from isa_defs.h, which is not
included in this file. Add the include now, to get the macros defined.
MFC after: 1 week
This code only runs on i386 and amd64, so there should be no problems if
buf + sec->dofs_offset is not aligned (which is unlikely anyway).
MFC after: 1 week
bsd.prog.mk -- we need to compile PIC, which requires a library build.
With this change, USDT (userspace DTrace probes) work from within
shared libraries.
PR: kern/159046
Submitted by: Alex Samorukov <samm at os2.kiev.ua>
Comments by: Scott Lystig Fritchie <slfritchie at snookles.com>
MFC after: 3 days
The zfs(8) and zpool(8) manual pages now match the state of the ZFS module
and have been customized for FreeBSD.
The new texts of the "Deduplication" subsection in zfs(8), the zpool "split"
command, the zfs "dedup" property and several other missing parts have been
added from illumos or OpenSolaris snv_134 (CDDL-licensed).
The mdoc(7) reimplementation of whole manual pages, the descriptions of the
zpool "readonly" property, "zfs diff" command and descriptions of several
other missing command flags and/or options were authored by myself.
MFC after: 1 week
Manual pages from OpenSolaris svn_134 are still properly CDDL licensed
but I have been informed that the parts from s11ex are uncertain even
if they contain a CDDL header.
Improved alignment for a maximum width of 80 characters.
Mark unsupported parts as such.
Reported to vendor: Illumos issue #1801
References:
https://www.illumos.org/issues/1801
Obtained from: OpenSolaris CDDL manual pages (snv_134, s11express) [1]
MFC after: 4 days
- synchronized to match new vendor code [1]
- removed ATTRIBUTES sections
- updated SEE ALSO sections
- properly updated copyright information (required by CDDL)
- remove empty lines via MANFILTER
Obtained from: Illumos [1]
MFC after: 5 days
- synchronized to match new vendor code (Illumos rev. 13513) [1]
- removed references to sun commands (replaced with FreeBSD commands)
- removed ATTRIBUTES sections
- updated SEE ALSO sections
- properly updated copyright information (required by CDDL)
- remove empty lines via MANFILTER
zfs(8) only:
- replaced "Zones" section with new "Jails" section
- removed misleading "ZFS Volumes as Swap or Dump Devices" section
- updated shareiscsi and sharesmb option information (not supported on FreeBSD)
- replace zoned property with jailed property
zpool(8) only:
- updated device names in examples
Obtained from: Illumos (as of rev. 13513:f84d4672fdbd) [1]
MFC after: 1 week
952 separate intent logs should be obvious in 'zpool iostat' output
1337 `zpool status -D' should tell if there are no DDT entries
References:
https://www.illumos.org/issues/952https://www.illumos.org/issues/1337
Obtained from: Illumos (issues 952, 1337; changesets 13384, 13432)
MFC after: 1 week
are linked with libraries they don't use:
- zinject doesn't use libavl
- ztest doesn't use libz
- zdb uses neither libavl nor libz
- zfs uses neither libbsdxml nor libm, nor libsbuf
- zpool uses neither libbsdxml nor libm, nor libsbuf
In addition, libzfs needs libm because it uses pow(), however it isn't
linked with -lm. This went unnoticed because all its users had -lm before.
Reviewed by: pjd, mm
Approved by: kib (mentor)
MFC after: 1 week
non-legacy mountpoints. It is better to be able to rename such file systems and
let them be mounted in old places until next reboot than using live CD, etc. to
rename with remount.
This is implemented by adding -u option to 'zfs rename'. If file system's
mountpoint property is set to 'legacy' or 'none', there is no need to specify -u.
Update zfs(8) manual page to reflect this addition.
MFC after: 2 weeks
It is possible for file systems with 'mountpoint' preperty set to 'legacy'
or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even mounted back.
This introduces layering violation, as we need to update 'f_mntfromname'
field in statfs structure related to mountpoint (for the dataset we are
renaming and all its children).
In my opinion it is worth it, as it allow to update FreeBSD in even cleaner
way - in ZFS-only configuration root file system is ZFS file system with
'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs,
we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs),
update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs
and rename it back to system/rootfs while it is mounted as /. Before it was
not possible, because unmounting / was not possible.
MFC after: 2 weeks
or mountpoint=legacy that have children datasets. This also fixes dataset
rename when receiving incremental snapshots as reported on freebsd-fs@
This assertion was made triggerable by opensolaris change #10196.
PR: bin/160400
Reviewed by: pjd
MFC after: 1 week
in the case of a held dataset during remount.
Detailed description is available at:
https://www.illumos.org/issues/883
illumos-gate revision: 13380:161b964a0e10
Reviewed by: pjd
Approved by: re (kib)
Obtained from: Illumos (Bug #883)
MFC after: 3 days
devices are imbalanced zfs will lots of CPU searching for space on devices
which tend to be pretty full. It should instead fail quickly on the full
devices and move onto devices which have more availability.
New loader tunable: vfs.zfs.mg_alloc_failures (min = 8)
Illumos-gate changeset: 13379:4df42cc92254
Obtained from: Illumos (Bug #1051)
MFC after: 2 weeks
cddl/contrib/opensolaris/cmd/zpool/zpool.8:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c:
Add the "zpool labelclear" command. This command can be
used to wipe the label data from a drive that is not
active in a pool. The optional "-f" argument can be
used to treat an exported or foreign vdev as "inactive"
thus allowing its label information to be cleared.
perform pool actions is always displayed.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
The "zpool status" command reports the "last seen at"
device node path when the vdev name is being reported
by GUID. Augment this code to assume a GUID is reported
when a device goes missing after initial boot in addition
to the previous behavior of doing this for devices that
aren't seen at boot.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
In zpool_vdev_name(), report recently missing devices
by GUID. There is no guarantee they will return at
their previous location.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
o Add zpool_pool_state_to_name() API to libzfs which converts a
pool_state_t into a user consumable string.
o While here, correct constness of make zpool_state_to_name()
and zpool_label_disk().
MFD after: 1 week
For snapshots, this is the same as COMPRESSRATIO, but for
filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie,
includes blocks in children, but not blocks shared with the origin).
This is needed to figure out how much space a filesystem would use if it
were not compressed (ignoring snapshots).
Illumos-gate revision: 13387
Obtained from: Illumos (Feature #1092)
MFC after: 2 weeks
but just have a comment that this is broken.
This is just a bandaid until somebody can fix this correctly. The code
is just a broken as it was before r223262 - now buildworld just doesn't
fail.
Tested by: i386 + amd64 buildworld
With hat: benl co-mentor
OpenSolaris and ZFS header files. These changes are sufficient
to allow a C++ program to use the libzfs library.
Note: The majority of these files already included 'extern "C"'
declarations, so the intention of providing C++ compatibility
already existed even if it wasn't provided.
cddl/compat/opensolaris/include/assert.h:
Wrap our compatibility assert implementation in
'extern "C"'. Since this is a compatibility header
I matched the Solaris style of doing this explicitly
rather than rely on FreeBSD's __BEGIN/END_DECLS macro.
sys/cddl/compat/opensolaris/sys/kstat.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:
Rename parameters in function declarations that conflict
with C++ keywords. This was the solution preferred by
members of the Illumos community.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h:
In C, nested structures are visible in the global namespace,
but in C++, they take on the namespace of the structure in
which they are contained. Flatten nested structure
definitions within struct zfs_cmd so these structures are
visible in the global namespace when compiled in both
languages.
Sponsored by: Spectra Logic Corporation
Few new things available from now on:
- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier
transaction group.
- Possibility to import pool in read-only mode.
MFC after: 1 month
have an executable stack, due to linking in hand-assembled .S or .s
files, that have no .GNU-stack sections:
RWX --- --- /lib/libcrypto.so.6
RWX --- --- /lib/libmd.so.5
RWX --- --- /lib/libz.so.6
RWX --- --- /lib/libzpool.so.2
RWX --- --- /usr/lib/liblzma.so.5
These were found using scanelf, from the sysutils/pax-utils port.
Reviewed by: kib
if creating a mirror by attaching a new vdev to a root pool.
Reported by: James R. Van Artsdalen (on freebsd-fs@freebsd.org)
Approved by: delphij (mentor)
MFC after: 3 days
Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted
OpenSolaris revision and Bug IDs:
9725:0bf7402e8022
6843014 ZFS B_FAILFAST handling is broken
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6843014)
MFC after: 3 weeks
OpenSolaris revision and Bug IDs:
9701:cc5b64682e64
6803605 should be able to offline log devices
6726045 vdev_deflate_ratio is not set when offlining a log device
6599442 zpool import has faults in the display
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442)
MFC after: 3 weeks
linking process. This is needed because we change the source object
files and the second this dtrace -G is run, no probes will be found.
This hack allows us to build postgres with DTrace probes enabled. I'll
try to find a way to fix this without needing this hack.
Sponsored by: The FreeBSD Foundation
cannot access dataset system/usr/home: Operation not supported
by including libzfs_impl.h. What libzfs_impl.h does is to redefine ioctl() to
be compatible with OpenSolaris. More specifically OpenSolaris returns ENOMEM
when buffer is too small and sets field zc_nvlist_dst_size to the size that
will be big enough for the data. In FreeBSD case ioctl() doesn't copy data
structure back in case of a failure. We work-around it in kernel and libzfs by
returning 0 from ioctl() and always checking if zc_nvlist_dst_size hasn't
changed. For this work-around to work in pyzfs we need this compatible ioctl()
which is implemented in libzfs_impl.h.
MFC after: 2 weeks
This provides a noticeable write speedup, especially on pools with
less than 30% of free space.
Detailed information (OpenSolaris onnv changesets and Bug IDs):
11146:7e58f40bcb1c
6826241 Sync write IOPS drops dramatically during TXG sync
6869229 zfs should switch to shiny new metaslabs more frequently
11728:59fdb3b856f6
6918420 zdb -m has issues printing metaslab statistics
12047:7c1fcc8419ca
6917066 zfs block picking can be improved
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066)
MFC after: 2 weeks
drti.o depends on libelf and this avoids linking every other drti.o
program (namely programs with USDT probes) with libelf.
Sponsored by: The FreeBSD Foundation
Summary of changes:
* Implement a compatibility shim between Solaris libproc and our
libproc and remove several ifdefs because of this.
* Port the drti to FreeBSD.
* Implement the missing DOODAD sections
* Link with libproc and librtld_db
* Support for ustack, jstack and uregs (by sson@)
* Misc bugfixing
When writing the SUWN_dof section, we had to resort to building the ELF
file layout by "hand". This is the job of libelf, but our libelf doesn't
support this yet. When libelf is fixed, we can remove the code under
#ifdef BROKEN_LIBELF.
Sponsored by: The FreeBSD Foundation