Commit Graph

2 Commits

Author SHA1 Message Date
Olaf Faaland
448d7aaabc Identify locks flagged by lockdep
When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible
recursive locking in some cases and possible circular locking dependency
in others, within the SPL and ZFS modules.

This patch uses a mutex type defined in SPL, MUTEX_NOLOCKDEP, to mark
such mutexes when they are initialized.  This mutex type causes
attempts to take or release those locks to be wrapped in lockdep_off()
and lockdep_on() calls to silence the dependency checker and allow the
use of lock_stats to examine contention.

For RW locks, it uses an analogous lock type, RW_NOLOCKDEP.

The goal is that these locks are ultimately changed back to type
MUTEX_DEFAULT or RW_DEFAULT, after the locks are annotated to reflect
their relationship (e.g. z_name_lock below) or any real problem with the
lock dependencies are fixed.

Some of the affected locks are:

tc_open_lock:
=============
This is an array of locks, all with same name, which txg_quiesce must
take all of in order to move txg to next state.  All default to the same
lockdep class, and so to lockdep appears recursive.

zp->z_name_lock:
================
In zfs_rmdir,
        dzp = znode for the directory (input to zfs_dirent_lock)
        zp  = znode for the entry being removed (output of zfs_dirent_lock)

zfs_rmdir()->zfs_dirent_lock() takes z_name_lock in dzp
zfs_rmdir() takes z_name_lock in zp

Since both dzp and zp are type znode_t, the locks have the same default
class, and lockdep considers it a possible recursive lock attempt.

l->l_rwlock:
============
zap_expand_leaf() sometimes creates two new zap leaf structures, via
these call paths:

zap_deref_leaf()->zap_get_leaf_byblk()->zap_leaf_open()
zap_expand_leaf()->zap_create_leaf()->zap_expand_leaf()->zap_create_leaf()

Because both zap_leaf_open() and zap_create_leaf() initialize
l->l_rwlock in their (separate) leaf structures, the lockdep class is
the same, and the linux kernel believes these might both be the same
lock, and emits a possible recursive lock warning.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3895
2015-12-22 10:21:33 -08:00
Prakash Surya
ca0bf58d65 Illumos 5497 - lock contention on arcs_mtx
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

Porting notes and other significant code changes:

The illumos 5368 patch (ARC should cache more metadata), which
was never picked up by ZoL, is mostly reverted by this patch.

Since ZoL relies on the kernel asynchronously calling the shrinker to
actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every
time it runs.

The arc_adapt_thread() function no longer calls arc_do_user_evicts()
since the newly-added arc_user_evicts_thread() calls it periodically.

Notable conflicting ZoL commits which conflicted with this patch or
whose effects are either duplicated or un-done by this patch:

    302f753 - Integrate ARC more tightly with Linux
    39e055c - Adjust arc_p based on "bytes" in arc_shrink
    f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c"
    77765b5 - Remove "arc_meta_used" from arc_adjust calculation
    94520ca - Prune metadata from ghost lists in arc_adjust_meta

Trace support for multilist_insert() and multilist_remove() has been
added and produces the following output:

    fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 }
    fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 }

The following arcstats have been removed:

    recycle_miss - Used by arcstat.py and arc_summary.py, both of which
    have been updated appropriately.

    l2_writes_hdr_miss

The following arcstats have been added:

    evict_not_enough - Number of times arc_evict_state() was unable to
    evict enough buffers to reach its target amount.

    evict_l2_skip - Number of times arc_evict_hdr() skipped eviction
    because it was being written to the l2arc.

    l2_writes_lock_retry - Replaces l2_writes_hdr_miss.  Number of times
    l2arc_write_done() failed to acquire hash_lock (and re-tries).

    arc_meta_min - Shows the value of the zfs_arc_meta_min module
    parameter (see below).

The "index" column of the "dbuf" kstat has been removed since it doesn't
have a direct analog in the new multilist scheme.  Additional multilist-
related stats could be added in the future but would likely require
extensions to the mulilist API.

The following module parameters have been added:

    zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list
    before moving on to the next sub-list.

    zfs_arc_meta_min - Enforce a floor on the amount of metadata in
    the ARC.

    zfs_arc_num_sublists_per_state - Number of multilist sub-lists per
    ARC state.

    zfs_arc_overflow_shift - Controls amount by which the ARC must exceed
    the target size to be considered "overflowing".

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov
2015-06-11 10:27:25 -07:00