8012d6910c
A change to a property on a dataset must be propagated to its descendants in case that property is inherited. For datasets whose information is not currently loaded into memory (e.g. a snapshot that isn't currently mounted), there is nothing to do; the property change will take effect the next time that dataset is loaded. To handle updates to datasets that are in-core, ZFS registers a callback entry for each property of each loaded dataset with the dsl directory that holds that dataset. There is a dsl directory associated with each live dataset that references both the live dataset and any snapshots of the live dataset. A property change is effected by doing a traversal of the tree of dsl directories for a pool, starting at the directory sourcing the change, and invoking these callbacks. The current implementation both registers and de-registers properties individually for each loaded dataset. While registration for a property is O(1) (insert into a list), de-registration is O(n) (search list and then remove). The 'n' for de-registration, however, is not limited to the size (number of snapshots + 1) of the dsl directory. The eviction portion of the life cycle for the in core state of datasets is asynchronous, which allows multiple copies of the dataset information to be in-core at once. Only one of these copies is active at any time with the rest going through tear down processing, but all copies contribute to the cost of performing a dsl_prop_unregister(). One way to create multiple, in-flight copies of dataset information is by performing "zfs list" operations from multiple threads concurrently. In-core dataset information is loaded on demand and then evicted when reference counts drops to zero. For datasets that are not mounted, there is no persistent reference count to keep them resident. So, a list operation will load them, compute the information required to do the list operation, and then evict them. When performing this operation from multiple threads it is possible that some of the in-core dataset information will be reused, but also possible to lose the race and load the dataset again, even while the same information is being torn down. Compounding the performance issue further is a change made for illumos issue 5056 which made dataset eviction single threaded. In environments using automation to manage ZFS datasets, it is now possible to create enough of a backlog of dataset evictions to consume excessive amounts of kernel memory and to bog down the system. The fix employed here is to make property de-registration O(1). With this change in place, it is hoped that a single thread is more than sufficient to handle eviction processing. If it isn't, the problem can be solved by increasing the number of threads devoted to the eviction taskq. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_prop.h: Associate dsl property callback records with both the dsl directory and the dsl dataset that is registering the callback. Both connections are protected by the dsl directory's "dd_lock". When linking callbacks into a dsl directory, group them by the property type. This helps reduce the space penalty for the double association (the property name pointer is stored once per dsl_dir instead of in each record) and reduces the number of strcmp() calls required to do callback processing when updating a single property. Property types are stored in a linked list since currently ZFS registers a maximum of 10 property types for each dataset. Note that the property buckets/records associated with a dsl directory are created on demand, but only freed when the dsl directory is freed. Given the static nature of property types and their small number, there is no benefit to freeing the few bytes of memory used to represent the property record earlier. When a property record becomes empty, the dsl directory is either going to become unreferenced a little later in this thread of execution, or there is a high chance that another dataset is going to be loaded that would recreate the bucket anyway. Replace dsl_prop_unregister() with dsl_prop_unregister_all(). All callers of dsl_prop_unregister() are trying to remove all property registrations for a given dsl dataset anyway. By changing the API, we can avoid doing any lookups of callbacks by property type and just traverse the list of all callbacks for the dataset and free each one. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: Replace use of dsl_prop_unregister() with the new dsl_prop_unregister_all() API. illumos/illumos-gate@03bad06fbb Author: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Illumos issue: 6171 dsl_prop_unregister() slows down dataset eviction https://www.illumos.org/issues/6171 MFC after: 2 weeks |
||
---|---|---|
.. | ||
boot/zfs | ||
compat/opensolaris | ||
contrib/opensolaris | ||
dev |