2014-06-09 23:55:31 +02:00
|
|
|
src = @abs_top_srcdir@/module/zfs
|
|
|
|
obj = @abs_builddir@
|
2018-02-08 09:16:23 -07:00
|
|
|
target_cpu = @target_cpu@
|
2014-06-09 23:55:31 +02:00
|
|
|
|
2010-08-26 11:22:58 -07:00
|
|
|
MODULE := zfs
|
|
|
|
|
2012-07-09 11:23:00 +02:00
|
|
|
obj-$(CONFIG_ZFS) := $(MODULE).o
|
2010-08-26 11:22:58 -07:00
|
|
|
|
2018-01-10 10:49:27 -08:00
|
|
|
ccflags-y := $(ZFS_MODULE_CFLAGS) $(ZFS_MODULE_CPPFLAGS)
|
|
|
|
|
2018-02-08 09:16:23 -07:00
|
|
|
# Suppress unused-value warnings in sparc64 architecture headers
|
|
|
|
ifeq ($(target_cpu),sparc64)
|
|
|
|
ccflags-y += -Wno-unused-value
|
|
|
|
endif
|
|
|
|
|
2018-01-10 10:49:27 -08:00
|
|
|
# Suppress unused but set variable warnings often due to ASSERTs
|
|
|
|
ccflags-y += $(NO_UNUSED_BUT_SET_VARIABLE)
|
|
|
|
|
2016-07-22 11:52:49 -04:00
|
|
|
$(MODULE)-objs += abd.o
|
2017-05-25 11:32:40 -07:00
|
|
|
$(MODULE)-objs += aggsum.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += arc.o
|
|
|
|
$(MODULE)-objs += blkptr.o
|
|
|
|
$(MODULE)-objs += bplist.o
|
|
|
|
$(MODULE)-objs += bpobj.o
|
2017-05-25 11:32:40 -07:00
|
|
|
$(MODULE)-objs += cityhash.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += dbuf.o
|
|
|
|
$(MODULE)-objs += dbuf_stats.o
|
|
|
|
$(MODULE)-objs += bptree.o
|
2015-12-22 02:31:57 +01:00
|
|
|
$(MODULE)-objs += bqueue.o
|
2018-08-20 09:52:37 -07:00
|
|
|
$(MODULE)-objs += dataset_kstats.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += ddt.o
|
|
|
|
$(MODULE)-objs += ddt_zap.o
|
|
|
|
$(MODULE)-objs += dmu.o
|
|
|
|
$(MODULE)-objs += dmu_diff.o
|
|
|
|
$(MODULE)-objs += dmu_object.o
|
|
|
|
$(MODULE)-objs += dmu_objset.o
|
2018-10-09 14:05:13 -07:00
|
|
|
$(MODULE)-objs += dmu_recv.o
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 09:48:13 -07:00
|
|
|
$(MODULE)-objs += dmu_redact.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += dmu_send.o
|
|
|
|
$(MODULE)-objs += dmu_traverse.o
|
|
|
|
$(MODULE)-objs += dmu_tx.o
|
|
|
|
$(MODULE)-objs += dmu_zfetch.o
|
|
|
|
$(MODULE)-objs += dnode.o
|
|
|
|
$(MODULE)-objs += dnode_sync.o
|
|
|
|
$(MODULE)-objs += dsl_dataset.o
|
|
|
|
$(MODULE)-objs += dsl_deadlist.o
|
|
|
|
$(MODULE)-objs += dsl_deleg.o
|
|
|
|
$(MODULE)-objs += dsl_bookmark.o
|
|
|
|
$(MODULE)-objs += dsl_dir.o
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 13:36:48 -04:00
|
|
|
$(MODULE)-objs += dsl_crypt.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += dsl_pool.o
|
|
|
|
$(MODULE)-objs += dsl_prop.o
|
|
|
|
$(MODULE)-objs += dsl_scan.o
|
|
|
|
$(MODULE)-objs += dsl_synctask.o
|
2016-06-15 15:47:05 -07:00
|
|
|
$(MODULE)-objs += edonr_zfs.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += fm.o
|
|
|
|
$(MODULE)-objs += gzip.o
|
2017-09-12 16:15:11 -04:00
|
|
|
$(MODULE)-objs += hkdf.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += lzjb.o
|
|
|
|
$(MODULE)-objs += lz4.o
|
|
|
|
$(MODULE)-objs += metaslab.o
|
Multi-modifier protection (MMP)
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg timestamp mmp_delay vdev_guid vdev_label vdev_path
20468 261337 250274925 68396651780 3 /dev/sda
20468 261339 252023374 6267402363293 1 /dev/sdc
20468 261340 252000858 6698080955233 1 /dev/sdx
20468 261341 251980635 783892869810 2 /dev/sdy
20468 261342 253385953 8923255792467 3 /dev/sdd
20468 261344 253336622 042125143176 0 /dev/sdab
20468 261345 253310522 1200778101278 2 /dev/sde
20468 261346 253286429 0950576198362 2 /dev/sdt
20468 261347 253261545 96209817917 3 /dev/sds
20468 261349 253238188 8555725937673 3 /dev/sdb
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279
2017-07-07 20:20:35 -07:00
|
|
|
$(MODULE)-objs += mmp.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += multilist.o
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 09:48:13 -07:00
|
|
|
$(MODULE)-objs += objlist.o
|
2016-04-13 08:55:35 -07:00
|
|
|
$(MODULE)-objs += pathname.o
|
2016-06-07 09:16:52 -07:00
|
|
|
$(MODULE)-objs += policy.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += range_tree.o
|
|
|
|
$(MODULE)-objs += refcount.o
|
|
|
|
$(MODULE)-objs += rrwlock.o
|
|
|
|
$(MODULE)-objs += sa.o
|
|
|
|
$(MODULE)-objs += sha256.o
|
2016-06-15 15:47:05 -07:00
|
|
|
$(MODULE)-objs += skein_zfs.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += spa.o
|
|
|
|
$(MODULE)-objs += spa_boot.o
|
2016-12-16 14:11:29 -08:00
|
|
|
$(MODULE)-objs += spa_checkpoint.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += spa_config.o
|
|
|
|
$(MODULE)-objs += spa_errlog.o
|
|
|
|
$(MODULE)-objs += spa_history.o
|
|
|
|
$(MODULE)-objs += spa_misc.o
|
|
|
|
$(MODULE)-objs += spa_stats.o
|
|
|
|
$(MODULE)-objs += space_map.o
|
|
|
|
$(MODULE)-objs += space_reftree.o
|
|
|
|
$(MODULE)-objs += txg.o
|
|
|
|
$(MODULE)-objs += trace.o
|
|
|
|
$(MODULE)-objs += uberblock.o
|
|
|
|
$(MODULE)-objs += unique.o
|
|
|
|
$(MODULE)-objs += vdev.o
|
|
|
|
$(MODULE)-objs += vdev_cache.o
|
|
|
|
$(MODULE)-objs += vdev_disk.o
|
|
|
|
$(MODULE)-objs += vdev_file.o
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 09:30:13 -07:00
|
|
|
$(MODULE)-objs += vdev_indirect.o
|
|
|
|
$(MODULE)-objs += vdev_indirect_births.o
|
|
|
|
$(MODULE)-objs += vdev_indirect_mapping.o
|
OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========
The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting a new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.
SOLUTION
=========
This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.
When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
- new subcommand:zpool initialize [-cs] <pool> [<vdev> ...]
- start, suspend, or cancel initialization
- Creates new open-context thread for each vdev
- Thread iterates through all metaslabs in this vdev
- Each metaslab:
- select a metaslab
- load the metaslab
- mark the metaslab as being zeroed
- walk all free ranges within that metaslab and translate
them to ranges on the leaf vdev
- issue a "zeroing" I/O on the leaf vdev that corresponds to
a free range on the metaslab we're working on
- continue until all free ranges for this metaslab have been
"zeroed"
- reset/unmark the metaslab being zeroed
- if more metaslabs exist, then repeat above tasks.
- if no more metaslabs, then we're done.
- progress for the initialization is stored on-disk in the vdev’s
leaf zap object. The following information is stored:
- the last offset that has been initialized
- the state of the initialization process (i.e. active,
suspended, or canceled)
- the start time for the initialization
- progress is reported via the zpool status command and shows
information for each of the vdevs that are initializing
Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
2018-12-19 07:54:59 -07:00
|
|
|
$(MODULE)-objs += vdev_initialize.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += vdev_label.o
|
|
|
|
$(MODULE)-objs += vdev_mirror.o
|
|
|
|
$(MODULE)-objs += vdev_missing.o
|
|
|
|
$(MODULE)-objs += vdev_queue.o
|
|
|
|
$(MODULE)-objs += vdev_raidz.o
|
SIMD implementation of vdev_raidz generate and reconstruct routines
This is a new implementation of RAIDZ1/2/3 routines using x86_64
scalar, SSE, and AVX2 instruction sets. Included are 3 parity
generation routines (P, PQ, and PQR) and 7 reconstruction routines,
for all RAIDZ level. On module load, a quick benchmark of supported
routines will select the fastest for each operation and they will
be used at runtime. Original implementation is still present and
can be selected via module parameter.
Patch contains:
- specialized gen/rec routines for all RAIDZ levels,
- new scalar raidz implementation (unrolled),
- two x86_64 SIMD implementations (SSE and AVX2 instructions sets),
- fastest routines selected on module load (benchmark).
- cmd/raidz_test - verify and benchmark all implementations
- added raidz_test to the ZFS Test Suite
New zfs module parameters:
- zfs_vdev_raidz_impl (str): selects the implementation to use. On
module load, the parameter will only accept first 3 options, and
the other implementations can be set once module is finished
loading. Possible values for this option are:
"fastest" - use the fastest math available
"original" - use the original raidz code
"scalar" - new scalar impl
"sse" - new SSE impl if available
"avx2" - new AVX2 impl if available
See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4328
2016-04-25 10:04:31 +02:00
|
|
|
$(MODULE)-objs += vdev_raidz_math.o
|
|
|
|
$(MODULE)-objs += vdev_raidz_math_scalar.o
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 09:30:13 -07:00
|
|
|
$(MODULE)-objs += vdev_removal.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += vdev_root.o
|
2019-03-29 09:13:20 -07:00
|
|
|
$(MODULE)-objs += vdev_trim.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zap.o
|
|
|
|
$(MODULE)-objs += zap_leaf.o
|
|
|
|
$(MODULE)-objs += zap_micro.o
|
2018-02-08 09:16:23 -07:00
|
|
|
$(MODULE)-objs += zcp.o
|
|
|
|
$(MODULE)-objs += zcp_get.o
|
|
|
|
$(MODULE)-objs += zcp_global.o
|
|
|
|
$(MODULE)-objs += zcp_iter.o
|
|
|
|
$(MODULE)-objs += zcp_synctask.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zfeature.o
|
|
|
|
$(MODULE)-objs += zfs_acl.o
|
|
|
|
$(MODULE)-objs += zfs_byteswap.o
|
|
|
|
$(MODULE)-objs += zfs_ctldir.o
|
|
|
|
$(MODULE)-objs += zfs_debug.o
|
|
|
|
$(MODULE)-objs += zfs_dir.o
|
|
|
|
$(MODULE)-objs += zfs_fm.o
|
|
|
|
$(MODULE)-objs += zfs_fuid.o
|
|
|
|
$(MODULE)-objs += zfs_ioctl.o
|
|
|
|
$(MODULE)-objs += zfs_log.o
|
|
|
|
$(MODULE)-objs += zfs_onexit.o
|
2017-08-09 15:31:08 -07:00
|
|
|
$(MODULE)-objs += zfs_ratelimit.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zfs_replay.o
|
|
|
|
$(MODULE)-objs += zfs_rlock.o
|
|
|
|
$(MODULE)-objs += zfs_sa.o
|
2018-09-02 15:09:53 -04:00
|
|
|
$(MODULE)-objs += zfs_sysfs.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zfs_vfsops.o
|
|
|
|
$(MODULE)-objs += zfs_vnops.o
|
|
|
|
$(MODULE)-objs += zfs_znode.o
|
|
|
|
$(MODULE)-objs += zil.o
|
|
|
|
$(MODULE)-objs += zio.o
|
|
|
|
$(MODULE)-objs += zio_checksum.o
|
|
|
|
$(MODULE)-objs += zio_compress.o
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 13:36:48 -04:00
|
|
|
$(MODULE)-objs += zio_crypt.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zio_inject.o
|
|
|
|
$(MODULE)-objs += zle.o
|
|
|
|
$(MODULE)-objs += zpl_ctldir.o
|
|
|
|
$(MODULE)-objs += zpl_export.o
|
|
|
|
$(MODULE)-objs += zpl_file.o
|
|
|
|
$(MODULE)-objs += zpl_inode.o
|
|
|
|
$(MODULE)-objs += zpl_super.o
|
|
|
|
$(MODULE)-objs += zpl_xattr.o
|
|
|
|
$(MODULE)-objs += zrlock.o
|
OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs
The timeline of the race condition is the following:
[1] Thread A is about to finish condesing the first vdev in
spa_condense_indirect_thread(), so it calls the
spa_condense_indirect_complete_sync() sync task which sets
the spa_condensing_indirect field to NULL. Waiting for the
sync task to finish, thread A sleeps until the txg is done.
When this happens, thread A will acquire spa_async_lock and
set spa_condense_thread to NULL.
[2] While thread A waits for the txg to finish, thread B which is
running spa_sync() checks whether it should condense the
second vdev in vdev_indirect_should_condense() by checking the
spa_condensing_indirect field which was set to NULL by
spa_condense_indirect_thread() from thread A. So it goes on
and tries to spawn a new condensing thread in
spa_condense_indirect_start_sync() and the aforementioned
assertions fails because thread A has not set spa_condense_thread
to NULL (which is basically the last thing it does before returning).
The main issue here is that we rely on both spa_condensing_indirect
and spa_condense_thread to signify whether a condensing thread is
running. Ideally we would only use one throughout the codebase. In
addition, for managing spa_condense_thread we currently use
spa_async_lock which basically tights condensing to scrubing when
it comes to pausing and resuming those actions during spa export.
This commit introduces the ZTHR infrastructure, which is basically
threads created during spa_load()/spa_create() and exist until we
export or destroy the pool. ZTHRs sleep the majority of the time,
until they are notified to wake up and do some predefined type of work.
In the context of the current bug, a zthr to does the condensing of
indirect mappings replacing the older code that used bare kthreads.
When a pool is created, the condensing zthr is spawned but sleeps
right away, until it is awaken by a signal from spa_sync(). If an
existing pool is loaded, the condensing zthr looks if there is
anything to condense before going to sleep, in case we were condensing
mappings in the pool before it got exported.
The benefits of this solution are the following:
- The current bug is fixed
- spa_condensing_indirect is the sole indicator of whether we are
currently condensing or not
- condensing is more decoupled from the spa_async_thread related
functionality.
As a final note, this commit also sets up the path on upstreaming
other features that use the ZTHR code like zpool checkpoint and
fast clone deletion.
Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://illumos.org/issues/9079
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3dc606ee
Closes #6900
2017-03-15 16:41:52 -07:00
|
|
|
$(MODULE)-objs += zthr.o
|
2014-06-09 23:55:31 +02:00
|
|
|
$(MODULE)-objs += zvol.o
|
|
|
|
$(MODULE)-objs += dsl_destroy.o
|
|
|
|
$(MODULE)-objs += dsl_userhold.o
|
2018-03-09 16:37:15 -05:00
|
|
|
$(MODULE)-objs += qat.o
|
2017-03-23 08:58:47 +08:00
|
|
|
$(MODULE)-objs += qat_compress.o
|
2018-03-09 16:37:15 -05:00
|
|
|
$(MODULE)-objs += qat_crypt.o
|
SIMD implementation of vdev_raidz generate and reconstruct routines
This is a new implementation of RAIDZ1/2/3 routines using x86_64
scalar, SSE, and AVX2 instruction sets. Included are 3 parity
generation routines (P, PQ, and PQR) and 7 reconstruction routines,
for all RAIDZ level. On module load, a quick benchmark of supported
routines will select the fastest for each operation and they will
be used at runtime. Original implementation is still present and
can be selected via module parameter.
Patch contains:
- specialized gen/rec routines for all RAIDZ levels,
- new scalar raidz implementation (unrolled),
- two x86_64 SIMD implementations (SSE and AVX2 instructions sets),
- fastest routines selected on module load (benchmark).
- cmd/raidz_test - verify and benchmark all implementations
- added raidz_test to the ZFS Test Suite
New zfs module parameters:
- zfs_vdev_raidz_impl (str): selects the implementation to use. On
module load, the parameter will only accept first 3 options, and
the other implementations can be set once module is finished
loading. Possible values for this option are:
"fastest" - use the fastest math available
"original" - use the original raidz code
"scalar" - new scalar impl
"sse" - new SSE impl if available
"avx2" - new AVX2 impl if available
See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4328
2016-04-25 10:04:31 +02:00
|
|
|
|
2017-12-07 10:28:50 -08:00
|
|
|
# Suppress incorrect warnings from versions of objtool which are not
|
|
|
|
# aware of x86 EVEX prefix instructions used for AVX512.
|
|
|
|
OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512bw.o := y
|
|
|
|
OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512f.o := y
|
|
|
|
|
2016-06-28 19:49:53 +02:00
|
|
|
$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_sse2.o
|
|
|
|
$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_ssse3.o
|
SIMD implementation of vdev_raidz generate and reconstruct routines
This is a new implementation of RAIDZ1/2/3 routines using x86_64
scalar, SSE, and AVX2 instruction sets. Included are 3 parity
generation routines (P, PQ, and PQR) and 7 reconstruction routines,
for all RAIDZ level. On module load, a quick benchmark of supported
routines will select the fastest for each operation and they will
be used at runtime. Original implementation is still present and
can be selected via module parameter.
Patch contains:
- specialized gen/rec routines for all RAIDZ levels,
- new scalar raidz implementation (unrolled),
- two x86_64 SIMD implementations (SSE and AVX2 instructions sets),
- fastest routines selected on module load (benchmark).
- cmd/raidz_test - verify and benchmark all implementations
- added raidz_test to the ZFS Test Suite
New zfs module parameters:
- zfs_vdev_raidz_impl (str): selects the implementation to use. On
module load, the parameter will only accept first 3 options, and
the other implementations can be set once module is finished
loading. Possible values for this option are:
"fastest" - use the fastest math available
"original" - use the original raidz code
"scalar" - new scalar impl
"sse" - new SSE impl if available
"avx2" - new AVX2 impl if available
See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4328
2016-04-25 10:04:31 +02:00
|
|
|
$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx2.o
|
2016-11-02 20:40:23 +01:00
|
|
|
$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx512f.o
|
|
|
|
$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx512bw.o
|
Add parity generation/rebuild using 128-bits NEON for Aarch64
This re-use the framework established for SSE2, SSSE3 and
AVX2. However, GCC is using FP registers on Aarch64, so
unlike SSE/AVX2 we can't rely on the registers being left alone
between ASM statements. So instead, the NEON code uses
C variables and GCC extended ASM syntax. Note that since
the kernel explicitly disable vector registers, they
have to be locally re-enabled explicitly.
As we use the variable's number to define the symbolic
name, and GCC won't allow duplicate symbolic names,
numbers have to be unique. Even when the code is not
going to be used (e.g. the case for 4 registers when
using the macro with only 2). Only the actually used
variables should be declared, otherwise the build
will fails in debug mode.
This requires the replacement of the XOR(X,X) syntax
by a new ZERO(X) macro, which does the same thing but
without repeating the argument. And perhaps someday
there will be a machine where there is a more efficient
way to zero a register than XOR with itself. This affects
scalar, SSE2, SSSE3 and AVX2 as they need the new macro.
It's possible to write faster implementations (different
scheduling, different unrolling, interleaving NEON and
scalar, ...) for various cores, but this one has the
advantage of fitting in the current state of the code,
and thus is likely easier to review/check/merge.
The only difference between aarch64-neon and aarch64-neonx2
is that aarch64-neonx2 unroll some functions some more.
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #4801
2016-10-03 18:44:00 +02:00
|
|
|
|
|
|
|
$(MODULE)-$(CONFIG_ARM64) += vdev_raidz_math_aarch64_neon.o
|
|
|
|
$(MODULE)-$(CONFIG_ARM64) += vdev_raidz_math_aarch64_neonx2.o
|