662 Commits

Author SHA1 Message Date
Alexander Motin
843b26d023 8961 SPA load/import should tell us why it failed
illumos/illumos-gate@3ee8c80c74

When we fail to open or import a storage pool, we typically don't get any
additional diagnostic information, just "no pool found" or "can not import".

While there may be no additional user-consumable information, we should at
least make this situation easier to debug/diagnose for developers and support.
For example, we could start by using `zfs_dbgmsg()` to log each thing that we
try when importing, and which things failed. E.g. "tried uberblock of txg X
from label Y of device Z". Also, we could log each of the stages that we go
through in `spa_load_impl()`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-21 23:42:30 +00:00
Alexander Motin
223661a5b7 7638 Refactor spa_load_impl into several functions
illumos/illumos-gate@1fd3785ff6

spa_load_impl has grown out of proportion.  It is currently over 700
lines long, which makes the import process very hard to follow or debug,
even for experienced ZFS developers.  The objective is to split it up
into a series of well-commented functions.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-21 23:25:11 +00:00
Alexander Motin
ffaf1cfabc 9018 Replace kmem_cache_reap_now() with kmem_cache_reap_soon()
illumos/illumos-gate@36a64e6284

To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>
2018-02-21 22:14:19 +00:00
Alexander Motin
81ef5e369c 8809 libzpool should leverage work done in libfakekernel
illumos/illumos-gate@f06dce2c1f

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
2018-02-21 21:04:46 +00:00
Alexander Motin
ed2ac05a27 8969 Cannot boot from RAIDZ with parity > 1
illumos/illumos-gate@0fb055e81f

At present it is possible to boot from a root pool that is on RAIDZ but not
one that is on RAIDZ2 or RAIDZ3.  This is because, at the time the pool
version is checked to ensure support for dual/triple parity, the uberblock
has not yet been loaded into the SPA and therefore the code determines that
the pool version is too old and returns ENOTSUP.

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
2018-02-21 18:09:07 +00:00
Andriy Gapon
86b66fca8b 8520 7198 lzc_rollback_to should support rolling back to origin
illumos/illumos-gate@95643f75d2

https://www.illumos.org/issues/8520
  lzc_rollback_to() should support rolling back to a clone's origin.
  The current checks in zfs_ioc_rollback() would not allow that because the
  origin snapshot belongs to a different filesystem.
  The overly restrictive check was introduced in 7600, but it was not a
  regression as none of the existing tools provided a way to rollback to the
  origin.

https://www.illumos.org/issues/7198
  EINVAL is returned when a dataset does not have any snapshots, so there is
  nothing to roll back to.
  Although the code in zfs_do_rollback checks for that condition in advance, it's
  still possible that the snapshot(s) gets removed after the check and before the
  rollback sync task is executed.
  At the moment the zfs command would crash when that happens.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-02-21 15:10:33 +00:00
Andriy Gapon
02a28ad842 8997 ztest assertion failure in zil_lwb_write_issue
illumos/illumos-gate@f864f99efe

https://www.illumos.org/issues/8997
  When dmu_tx_assign is called from zil_lwb_write_issue, it's possible
  for either ERESTART or EIO to be returned.
  If ERESTART is returned, this will cause an assertion to fail directly
  in zil_lwb_write_issue, where the code assumes the return value is
  EIO if dmu_tx_assign returns a non-zero value. This can occur if the
  SPA is suspended when dmu_tx_assign is called, and most often occurs
  when running zloop.
  If EIO is returned, this can cause assertions to fail elsewhere in the
  ZIL code. For example, zil_commit_waiter_timeout contains the
  following logic:
    lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb);
    ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);
  In this case, if dmu_tx_assign returned EIO from within
  zil_lwb_write_issue, the lwb variable passed in will not be issued
  to disk. Thus, its lwb_state field will remain LWB_STATE_OPENED and
  this assertion will fail. zil_commit_waiter_timeout assumes that after
  it calls zil_lwb_write_issue, the lwb will be issued to disk, and
  doesn't handle the case where this is not true; i.e. it doesn't handle
  the case where dmu_tx_assign returns EIO.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-02-21 14:33:00 +00:00
Andriy Gapon
d03529fcb8 8731 ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocks
illumos/illumos-gate@a6c1eb3c08

https://www.illumos.org/issues/8731
  annotate_ecksum() asserts that nui64s, calculated as nui64s = size / sizeof
  (uint64_t), is not greater than UINT16_MAX.
  This restriction is needed because histograms of incorrectly set and cleared
  bits have 16 bit counters and if the buffer consists of too many 64-bit words,
  then a counter can potentially overflow producing an incorrect result.
  When the largest buffer size was 128KB the greatest value of nui64s was 16K,
  well within the limit.
  But now we have support for large buffers and for buffer sizes of 512KB and
  above the restriction is violated.
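
  The arithmetic above can be checked with a small stand-alone sketch. This
  is an illustration only, not the code in annotate_ecksum(); both helper
  names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ZFS): the number of 64-bit words in a
 * buffer of the given size, computed the same way as nui64s above.
 */
static size_t
nui64s_for_size(size_t size)
{
	return (size / sizeof (uint64_t));
}

/* Returns 1 if a buffer of this size satisfies nui64s <= UINT16_MAX. */
static int
fits_16bit_counters(size_t size)
{
	return (nui64s_for_size(size) <= UINT16_MAX);
}
```

  A 128KB buffer yields 16384 words and passes the check; a 512KB buffer
  yields 65536 words, one more than UINT16_MAX, and fails it.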

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-02-21 14:30:34 +00:00
Andriy Gapon
771a769243 8966 Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable
illumos/illumos-gate@82693e09cc

https://www.illumos.org/issues/8966

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: WHR <msl0000023508@gmail.com>
2018-02-21 14:12:29 +00:00
Alexander Motin
79a23a6944 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.
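
The remapping step described above can be pictured as a lookup in a sorted
table. The following is a minimal sketch under stated assumptions: the
`remap_entry_t` layout and the `remap_lookup()` function are hypothetical and
are not the data structures used by the actual change; entries are assumed to
be sorted by source offset and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one copied region of the removed vdev. */
typedef struct remap_entry {
	uint64_t re_src_offset;	/* start offset on the removed vdev */
	uint64_t re_size;	/* length of the region in bytes */
	uint64_t re_dst_vdev;	/* vdev that now holds the data */
	uint64_t re_dst_offset;	/* start offset on that vdev */
} remap_entry_t;

/*
 * Binary-search a table sorted by re_src_offset for the entry covering
 * offset. Returns 0 and fills in the new location, or -1 if unmapped.
 */
static int
remap_lookup(const remap_entry_t *tbl, size_t n, uint64_t offset,
    uint64_t *dst_vdev, uint64_t *dst_offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const remap_entry_t *e = &tbl[mid];

		if (offset < e->re_src_offset) {
			hi = mid;
		} else if (offset >= e->re_src_offset + e->re_size) {
			lo = mid + 1;
		} else {
			*dst_vdev = e->re_dst_vdev;
			*dst_offset = e->re_dst_offset +
			    (offset - e->re_src_offset);
			return (0);
		}
	}
	return (-1);
}
```

Keeping the table in memory makes each remapped read or free a logarithmic
lookup, which is consistent with the "minimal performance overhead" claim.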

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-18 01:21:52 +00:00
Andriy Gapon
6fd4145b31 8857 zio_remove_child() panic due to already destroyed parent zio
illumos/illumos-gate@d6e1c446d7

https://www.illumos.org/issues/8857
  I had an OS panic on one of our servers:

  ffffff01809128c0 vpanic()
  ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
  ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
  ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0, ffffff3373370908)
  ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
  ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
  ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
  ffffff0180912b40 thread_start+8()

  It panicked here:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c#430

  pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio"
  (parent zio of "cio") has already been destroyed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
2018-02-15 14:34:18 +00:00
Alexander Motin
d0fd40e7c9 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.
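
The difference between the two output modes can be sketched as follows. This
is a hypothetical formatter, not the code in the zfs command; the function
name, its parameters, and the column widths are assumptions for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical formatter, not the code in the zfs command: writes one row
 * of "zfs holds"-style output into buf and returns the snprintf() result.
 * In scripted mode the fields are separated by single tabs with no padding;
 * otherwise each column is left-justified to the given width and the
 * columns are separated by two spaces.
 */
static int
format_hold_row(char *buf, size_t buflen, int scripted,
    const char *name, const char *tag, const char *when,
    int namewidth, int tagwidth)
{
	if (scripted) {
		return (snprintf(buf, buflen, "%s\t%s\t%s\n",
		    name, tag, when));
	}
	return (snprintf(buf, buflen, "%-*s  %-*s  %s\n",
	    namewidth, name, tagwidth, tag, when));
}
```

The bug described above corresponds to applying the column widths in the
scripted branch as well, which leaves trailing spaces inside tab-separated
fields.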

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 05:59:48 +00:00
Alexander Motin
2d224f6116 8835 Speculative prefetch in ZFS not working for misaligned reads
illumos/illumos-gate@5cb8d943bc

https://www.illumos.org/issues/8835:
Sequential reads not aligned to block size are not detected by ZFS
prefetcher as sequential, killing prefetch and severely hurting
performance.  It is caused by dmu_zfetch() in case of misaligned
sequential accesses being called with overlap of one block.
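
The one-block overlap can be reproduced with plain block arithmetic. The
sketch below is a simplified model and not dmu_zfetch() itself: the helper
names are hypothetical, and `strictly_sequential()` stands in for a detector
that only accepts a read starting on the block after the previous one.

```c
#include <stdint.h>

/* First block touched by a read at byte offset off. */
static uint64_t
first_blkid(uint64_t off, uint64_t blksz)
{
	return (off / blksz);
}

/* Last block touched by a read of len bytes at off (len must be nonzero). */
static uint64_t
last_blkid(uint64_t off, uint64_t len, uint64_t blksz)
{
	return ((off + len - 1) / blksz);
}

/*
 * Simplified stand-in for a sequential-stream check: a read counts as
 * sequential only if it starts on the block right after the previous one.
 */
static int
strictly_sequential(uint64_t prev_last, uint64_t next_first)
{
	return (next_first == prev_last + 1);
}
```

With 128KB blocks, two back-to-back 100KB reads cover blocks 0..0 and 0..1:
the second read starts on the block the first one ended in, so a strict
check does not recognize the stream.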

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alexander Motin <mav@FreeBSD.org>
2018-01-22 05:55:43 +00:00
Alexander Motin
1e2bad5ea0 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:48:14 +00:00
Alexander Motin
434e06c6f9 8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs
illumos/illumos-gate@2ba5f978a4

https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:

sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c1t5000CCA000094115d0
sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo
  pool: foo
 state: ONLINE
  scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:

        NAME                         STATE     READ WRITE CKSUM
        foo                          ONLINE       0     0     0
          raidz1-0                   ONLINE       0     0     0
            c1t5000CCA000081D61d0    ONLINE       0     0     0
            spare-1                  ONLINE       0     0     0
              c1t5000CCA000186235d0  ONLINE       0     0     0
              c1t5000CCA000094115d0  ONLINE       0     0     0
        spares
          c1t5000CCA000094115d0      INUSE     currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool

Even though there was nothing to clear, those commands shouldn't have
reported an error. By contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0
2018-01-22 04:35:17 +00:00
Alexander Motin
ee700ae0c6 8959 Add notifications when a scrub is paused or resumed
illumos/illumos-gate@301fd1d6f2

Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Sean Eric Fagan <sef@ixsystems.com>
2018-01-22 04:27:05 +00:00
Alexander Motin
1536455b31 8856 arc_cksum_is_equal() doesn't take into account ABD-logic
illumos/illumos-gate@01a059ee0c

https://www.illumos.org/issues/8856:
arc_cksum_is_equal() calls zio_push_transform() that requires abd_t*
(second arg), but a void* is passed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Roman Strashkin <roman.strashkin@nexenta.com>
2018-01-22 04:21:55 +00:00
Alexander Motin
4c30fea809 8898 creating fs with checksum=skein on the boot pools fails ungracefully
illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222199

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:57:41 +00:00
Alexander Motin
adbaf37829 8897 zpool online -e fails assertion when run on non-leaf vdevs
illumos/illumos-gate@9a551dd645

https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)

Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:52:37 +00:00
Alexander Motin
dbee84cc27 8930 zfs_zinactive: do not remove the node if the filesystem is readonly
illumos/illumos-gate@93c618e0f4

https://www.illumos.org/issues/8930:
We normally remove an unlinked node when its last user goes away and the
node becomes inactive. However, we should not do that if the filesystem
is mounted read-only including the case where it has its readonly
property set. The node will remain on the unlinked queue, so it will
not be leaked.

One particular scenario is when we receive an incremental stream into a
mounted read-only filesystem and that stream contains an unlinked file
(still on the unlinked queue). If that file is opened before the
receive and some time later after the receive it becomes inactive we
would remove it and, thus, modify the read-only filesystem. As a
result, the filesystem would diverge from its source and further
incremental receives would not be possible (without forcing a rollback).

Another related scenario, that may or may not be possible depending on an
OS / VFS policy, is when an open file is unlinked, then the filesystem is
remounted read-only, and then the file is closed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-01-21 23:42:45 +00:00
Alexander Motin
ebd7699264 8909 8585 can cause a use-after-free kernel panic
illumos/illumos-gate@94ddd0900a

https://www.illumos.org/issues/8909:
There's a race condition that exists if `zil_free_lwb` races with either
`zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`.

Here's an example panic due to this bug:

> ::status
    debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
    operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
    image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference
    dump content: kernel pages only

> $c
    zio_shrink+0x12()
    zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
    zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit+0x80(ffffff03dcd15cc0, 9a9)
    zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    write+0x250(42, fffffd7ff4832000, 2000)
    sys_syscall+0x177()

If there's an outstanding lwb that's in `zil_commit_waiter_timeout`
waiting to time out, waiting on its waiter's CV, we must be sure not to
call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB
may be freed and can result in a use-after-free situation where the
stale lwb pointer stored in the `zil_commit_waiter_t` structure of the
thread waiting on the waiter's CV is used.

A similar situation can occur if an lwb is issued to disk, and thus in
the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the
disk is servicing that lwb. In this situation, the lwb will be freed by
`zil_free_lwb`, which will result in a use-after-free situation when the
lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called.

This race condition is prevented in `zil_close` by calling `zil_commit`
before `zil_free_lwb` is called, which will ensure all outstanding (i.e.
all lwb's in the `LWB_STATE_OPENED` and/or `LWB_STATE_ISSUED` states)
reach the `LWB_STATE_DONE` state before the lwb's are freed
(`zil_commit` will not return until all the lwb's are
`LWB_STATE_DONE`).

Further, this race condition is prevented in `zil_sync` by only calling
`zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set.
All lwb's not in the `LWB_STATE_DONE` state will have a non-null value
for this pointer; the pointer is only cleared in
`zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
changed to `LWB_STATE_DONE`.

This race is present in `zil_suspend`, leading to this bug.

At first glance, it would appear as though this would not be true
because `zil_suspend` will call `zil_commit`, just like `zil_close`, but
the problem is that `zil_suspend` will set the zilog's `zl_suspend`
field prior to calling `zil_commit`. Further, in `zil_commit`, if
`zl_suspend` is set, `zil_commit` will take a special branch of logic
and use `txg_wait_synced` instead of performing the normal `zil_commit`
logic.

This call to `txg_wait_synced` might be good enough for the data to
reach disk safely before it returns, but it does not ensure that all
outstanding lwb's reach the `LWB_STATE_DONE` state before it returns.
This is because, if there's an lwb "stuck" in
`zil_commit_waiter_timeout`, waiting for its lwb to time out, it will
maintain a non-null value for its `lwb_buf` field and thus `zil_sync`
will not free that lwb. Thus, even though the lwb's data is already on
disk, the lwb will be left lingering, waiting on the CV, and will
eventually time out and be issued to disk even though the write is
unnecessary.

So, after `zil_commit` is called from `zil_suspend`, we incorrectly
assume that there are no outstanding lwb's, and proceed to free all
lwb's found on the zilog's lwb list. As a result, we free the lwb that
will later be used by `zil_commit_waiter_timeout`.

Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:12:38 +00:00
Alexander Motin
d216de62d1 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock"
illumos/illumos-gate@cf07d3da99

https://www.illumos.org/issues/8603:
  To help make the ZIL's code more understandable, it was suggested that
  the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock".
2018-01-21 23:04:25 +00:00
Alexander Motin
619fd3c317 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of syncing context.
  This would greatly improve the performance of channel programs that just
  gather information, as we won't have to wait for syncing context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 19:26:38 +00:00
Mark Johnston
6ff83134aa 8880 improve DTrace error checking
illumos/illumos-gate@2cf374268f

https://www.illumos.org/issues/8880

Reviewed by: Tim Kordas <tim.kordas@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
2017-12-12 00:51:39 +00:00
Andriy Gapon
0a98af26af 7531 Assign correct flags to prefetched buffers
illumos/illumos-gate@2729521654

https://www.illumos.org/issues/7531
  I found that some buffers that could be L2ARC eligible are not flagged
  as such, leading to some performance impact.  As a test I ran the same IO
  workload 10 times in a row.  It is a metadata-only workload (file
  listing).  l2arc_noprefetch=0.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: benrubson <ben.rubson@gmail.com>

MFC after:	8 days
2017-11-09 18:21:17 +00:00
Andriy Gapon
e36f861978 8607 zfs: variable set but not used
illumos/illumos-gate@b852c2f543

https://www.illumos.org/issues/8607

Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>
2017-11-09 18:13:26 +00:00
Andriy Gapon
98b9b8cb4b 8713 Buffer overflow in dsl_dataset_name()
illumos/illumos-gate@f37ae9a714

https://www.illumos.org/issues/8713
  If we're creating a pool with version >= SPA_VERSION_DSL_SCRUB (v11) we need to
  account for additional space needed by the origin dataset which will also be
  snapshotted: "poolname"+"/"+"$ORIGIN"+"@"+"$ORIGIN".
  Enforce this limit in pool_namecheck().
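
  The length accounting can be sketched as below. This is a hypothetical
  check, not the actual pool_namecheck(); the macro names and the 256-byte
  limit are assumptions made for illustration.

```c
#include <string.h>

/* Assumed values for illustration; the real constants live in ZFS headers. */
#define	SKETCH_MAX_DATASET_NAME_LEN	256
#define	SKETCH_ORIGIN_DIR_NAME		"$ORIGIN"

/*
 * Hypothetical check (not the actual pool_namecheck()): returns 1 if
 * "<pool>/$ORIGIN@$ORIGIN" plus its terminating NUL fits in a buffer of
 * SKETCH_MAX_DATASET_NAME_LEN bytes, and 0 otherwise.
 */
static int
pool_name_leaves_room_for_origin(const char *pool)
{
	size_t needed = strlen(pool) + 1 +		/* "/" */
	    strlen(SKETCH_ORIGIN_DIR_NAME) + 1 +	/* "@" */
	    strlen(SKETCH_ORIGIN_DIR_NAME) + 1;		/* NUL */

	return (needed <= SKETCH_MAX_DATASET_NAME_LEN);
}
```

  Under these assumed values the longest acceptable pool name is 239
  characters, since the origin snapshot name adds 16 characters plus the
  terminator.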

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2017-11-09 18:06:18 +00:00
Andriy Gapon
ae99a88d67 follow up to r325013, add libcmdutils.h to the vendor area
2017-10-27 11:26:36 +00:00
Andriy Gapon
fcf62a2809 640 number_to_scaled_string is duplicated in several commands
illumos/illumos-gate@0a0551200e

https://www.illumos.org/issues/640
  du(1), df(1m), ls(1), and swap(1m) all include a copy (it appears literally
  copied) of the 'number_to_scaled_string' function in their source. This should
  be moved to a shared library and all 4 commands should use this instead.

Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Jason King <jason.brian.king@gmail.com>
2017-10-26 16:20:47 +00:00
Andriy Gapon
e0cb71a82f 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -Werror, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  clean up all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2017-10-02 11:55:11 +00:00
Andriy Gapon
23cfd1c3cf 8648 Fix range locking in ZIL commit codepath
illumos/illumos-gate@42b1411172

https://www.illumos.org/issues/8648
  I'm opening this bug to track integration of the following ZFS on Linux
  commit into illumos:

  commit f763c3d1df
  Author: LOLi <loli10K@users.noreply.github.com>
  Date:   Mon Aug 21 17:59:48 2017 +0200

      Fix range locking in ZIL commit codepath

      Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
      we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
      offset and length to the offset and length of the BIO from
      zvol_write()->zvol_log_write(): these offset and length are later used
      to take a range lock in zilog->zl_get_data function: zvol_get_data().

      Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
      offset 0: we will only be range-locking 0-4096. This means the
      ASSERTion we make in dbuf_unoverride() is no longer valid because now
      dmu_sync() is called from zilog's get_data functions holding a partial
      lock on the dbuf.

      Fix this by taking a range lock on the whole block in zvol_get_data().
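
      Widening a byte range to whole blocks can be sketched as follows. The
      helper is hypothetical and is not the code in zvol_get_data(); it
      assumes the block size is a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the code in zvol_get_data(): widen the byte range
 * [*offp, *offp + *lenp) so that it starts and ends on block boundaries.
 * blksz must be a power of two.
 */
static void
widen_to_whole_blocks(uint64_t *offp, uint64_t *lenp, uint64_t blksz)
{
	/* Round the start down and the end up to a multiple of blksz. */
	uint64_t start = *offp & ~(blksz - 1);
	uint64_t end = (*offp + *lenp + blksz - 1) & ~(blksz - 1);

	*offp = start;
	*lenp = end - start;
}
```

      For the example above (blocksize=8K, 4K write at offset 0), the range
      0-4096 becomes 0-8192, so the lock covers the whole dbuf.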

      Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: LOLi <loli10K@users.noreply.github.com>
2017-09-22 08:23:24 +00:00
Andriy Gapon
1d08b8472b 8661 remove "zil-cw2" dtrace probe
illumos/illumos-gate@bd9d3f9046

https://www.illumos.org/issues/8661
  The "zil-cw1" dtrace probe was previously removed in 8558, and the "zil-cw2"
  probe should have been removed in that patch as well. Unfortunately, "zil-cw2"
  was not removed in 8558, so this bug is to track its removal.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-22 08:18:49 +00:00
Andriy Gapon
a373b4201a 8600 ZFS channel programs - snapshot
illumos/illumos-gate@2840dce1a0

https://www.illumos.org/issues/8600
  ZFS channel programs should be able to create snapshots.
  In addition to the base snapshot functionality, this will likely entail adding
  extra logic to handle edge cases which were formerly not possible, such as
  creating then destroying a snapshot in the same transaction sync.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-22 08:18:05 +00:00
Andriy Gapon
52aa70fa59 8592 ZFS channel programs - rollback
illumos/illumos-gate@000cce6b6f

https://www.illumos.org/issues/8592
  ZFS channel programs should be able to perform a rollback. This logic will
  probably look pretty similar to zfs.sync.destroy().

Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>
2017-09-22 08:15:35 +00:00
Andriy Gapon
bd021dbd51 8502 illumos#7955 broke delegated datasets when libshare is not present
illumos/illumos-gate@1c18e8fbd8

https://www.illumos.org/issues/8502
  The code in lib/libzfs/common/libzfs_mount.c already basically handles
  the case when libshare is not installed. We just need to not fail in
  zfs_init_libshare_impl.  I tested this in lx and things work as
  expected. I also tested there trying to set sharenfs and sharesmb on
  the delegated dataset. Neither is allowed from within a zone.  The
  spew of msgs from a native zone is not ZFS specific. I see the same
  spew simply running the share command.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
2017-09-22 08:13:09 +00:00
Andriy Gapon
67effa3d26 8604 Avoid unnecessary work search in VFS when unmounting snapshots
illumos/illumos-gate@ed992b0aac

https://www.illumos.org/issues/8604
  Every time we want to unmount a snapshot (happens during snapshot deletion or
  renaming) we unnecessarily iterate through all the mountpoints in the VFS layer
  (see zfs_get_vfs).
  Ideally we would just put a hold on the snapshot and access its respective VFS
  resource directly.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2017-09-20 07:28:18 +00:00
Andriy Gapon
de90fd2168 8605 zfs channel programs: zfs.exists undocumented and non-working
illumos/illumos-gate@5f39f884e2
5f39f884e2

https://www.illumos.org/issues/8605
  zfs.exists() in channel programs doesn't return any result, and should have a
  man page entry.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-20 07:27:45 +00:00
Andriy Gapon
7257b152dc 8602 remove unused "dp_early_sync_tasks" field from "dsl_pool" structure
illumos/illumos-gate@2bcb545854
2bcb545854

https://www.illumos.org/issues/8602
  When I landed the fix for 8558, I incorrectly added the "dp_early_sync_tasks"
  field to the "dsl_pool" structure. This field is used in DelphixOS, but not in
  illumos. It was incorrectly pulled into illumos, so this bug is to remove it
  from the structure.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-20 07:24:57 +00:00
Andriy Gapon
035e679e27 8567 Inconsistent return value in zpool_read_label
illumos/illumos-gate@c861bfbd77
c861bfbd77

https://www.illumos.org/issues/8567
  If fstat64 fails, pread64 fails, or the label is unintelligible,
  zpool_read_label will return 0. But if malloc fails, it will return -1. For
  consistency, it should always return -1 on failure or 0 on success.
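
  The convention described above (always -1 on failure, 0 on success) can be
  sketched as follows. This is a minimal illustration, not the actual
  zpool_read_label code: the function name read_label_sketch, the
  SKETCH_LABEL_SIZE constant, and the output parameter are all hypothetical.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical size for illustration only; real vdev labels are larger. */
#define	SKETCH_LABEL_SIZE	16

/*
 * Hypothetical label reader. Every failure path (fstat, malloc, pread)
 * returns -1; only a complete read returns 0. On success *out holds a
 * malloc'd buffer that the caller must free.
 */
int
read_label_sketch(int fd, char **out)
{
	struct stat st;
	char *buf;

	*out = NULL;

	if (fstat(fd, &st) != 0)
		return (-1);

	if ((buf = malloc(SKETCH_LABEL_SIZE)) == NULL)
		return (-1);

	if (pread(fd, buf, SKETCH_LABEL_SIZE, 0) != SKETCH_LABEL_SIZE) {
		free(buf);
		return (-1);
	}

	*out = buf;
	return (0);
}
```

  With a single failure value, a caller only needs to test for a nonzero
  return instead of distinguishing between failure modes.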

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
2017-09-20 07:18:09 +00:00
Andriy Gapon
c014f2f95b 8473 scrub does not detect errors on active spares
illumos/illumos-gate@554675eee7
554675eee7

https://www.illumos.org/issues/8473
  Scrubbing is supposed to detect and repair all errors in the pool. However, it
  wrongly ignores active spare devices. The problem can easily be reproduced in
  OpenZFS at git rev 0ef125d with these commands:

  truncate -s 64m /tmp/a /tmp/b /tmp/c
  sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
  sudo zpool replace testpool /tmp/a /tmp/c
  /bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
  sync
  sudo zpool scrub testpool
  zpool status testpool # Will show 0 errors, which is wrong
  sudo zpool offline testpool /tmp/a
  sudo zpool scrub testpool
  zpool status testpool # Will show errors on /tmp/c, which should've already been fixed

  FreeBSD head is partially affected: the first scrub will detect some errors,
  but the second scrub will detect more.

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2017-09-20 06:34:48 +00:00
Andriy Gapon
f3dbcb8c81 8585 improve batching done in zil_commit()
illumos/illumos-gate@1271e4b10d
1271e4b10d

https://www.illumos.org/issues/8585
  The current implementation of zil_commit() can introduce significant
  latency, beyond what is inherent due to the latency of the underlying
  storage. The additional latency comes from two main problems:
  1. When there are outstanding ZIL blocks being written (i.e. there's
      already a "writer thread" in progress), any new calls to
      zil_commit() will block waiting for the currently outstanding ZIL
      blocks to complete. The blocks written for each "writer thread" are
      called a "batch", and there can only ever be a single "batch" being
      written at a time. When a batch is being written, any new ZIL
      transactions will have to wait for the next batch to be written,
      which won't occur until the current batch finishes.
      As a result, the underlying storage may not be used as efficiently
      as possible. While "new" threads enter zil_commit() and are blocked
      waiting for the next batch, it's possible that the underlying
      storage isn't fully utilized by the current batch of ZIL blocks. In
      that case, it'd be better to allow these new threads to generate
      (and issue) a new ZIL block, such that it could be serviced by the
      underlying storage concurrently with the other ZIL blocks that are
      being serviced.
  2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
      to complete, prior to zil_commit() returning. The size of any given
      batch is proportional to the number of ZIL transactions in the queue
      at the time that the batch starts processing the queue, which
      doesn't occur until the previous batch completes. Thus, if there are
      a lot of transactions in the queue, the batch could be composed of
      many ZIL blocks, and each call to zil_commit() will have to wait for
      all of these writes to complete (even if the thread calling
      zil_commit() only cared about one of the transactions in the batch).

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-13 10:59:36 +00:00
Andriy Gapon
f8453396dc 8590 memory leak in dsl_destroy_snapshots_nvl()
illumos/illumos-gate@e6ab4525d1
e6ab4525d1

https://www.illumos.org/issues/8590
  In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is
  added to "arg".
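
  The ownership pattern behind this kind of leak can be sketched as below,
  assuming the "add" operation stores a copy of its argument so the caller
  still owns the original. All names here (sketch_arg_t, sketch_arg_add,
  sketch_destroy, the allocation counter) are hypothetical and stand in for
  the real nvlist-based code.

```c
#include <stdlib.h>
#include <string.h>

/* Counts live allocations so a leak is observable in this sketch. */
static int sketch_live_allocs = 0;

static char *
sketch_dup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p != NULL) {
		memcpy(p, s, n);
		sketch_live_allocs++;
	}
	return (p);
}

static void
sketch_free(char *p)
{
	if (p != NULL) {
		free(p);
		sketch_live_allocs--;
	}
}

/* Hypothetical argument holder; stands in for the "arg" container. */
typedef struct sketch_arg {
	char *snaps;
} sketch_arg_t;

/* Assumption: adding stores a private copy, so the caller keeps ownership. */
static int
sketch_arg_add(sketch_arg_t *arg, const char *snaps)
{
	arg->snaps = sketch_dup(snaps);
	return (arg->snaps != NULL ? 0 : -1);
}

int
sketch_destroy(const char *input)
{
	sketch_arg_t arg = { NULL };
	char *normalized = sketch_dup(input);
	int err;

	if (normalized == NULL)
		return (-1);

	err = sketch_arg_add(&arg, normalized);
	sketch_free(normalized);	/* the free that a leaky version omits */

	/* ... the real work would use arg here ... */

	sketch_free(arg.snaps);
	return (err);
}

int
sketch_live(void)
{
	return (sketch_live_allocs);
}
```

  Dropping the sketch_free(normalized) call leaves one allocation live per
  call, which is the shape of leak the issue describes.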

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-09-13 10:57:52 +00:00
Andriy Gapon
46077ff497 8552 ZFS LUA code uses floating point math
illumos/illumos-gate@916c8d8811
916c8d8811

https://www.illumos.org/issues/8552
  In the LUA interpreter used by "zfs program", the lua format() function
  accidentally includes support for '%f' and friends, which can cause compilation
  problems when building on platforms that don't support floating-point math in
  the kernel (e.g. sparc). Support for '%f' and friends (%f %e %E %g %G) should be
  removed, since there's no way to supply a floating-point value anyway (all
  numbers in ZFS LUA are int64_t's).

Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-09-13 10:56:19 +00:00
Andriy Gapon
dd976ed5e4 8521 nvlist memory leak in get_clones_stat() and spa_load_best()
illumos/illumos-gate@7d3000f774
7d3000f774

https://www.illumos.org/issues/8521
  Yuri reported this to the mailing list:
  doing a `reboot -d` on current illumos-gate HEAD gives the following
  "::findleaks -dv" output:
  findleaks: maximum buffers => 301061
  findleaks: actual buffers => 297587
  findleaks:
  findleaks: potential pointers => 29289774
  findleaks: dismissals => 26242305 (89.5%)
  findleaks: misses => 331153 ( 1.1%)
  findleaks: dups => 2419681 ( 8.2%)
  findleaks: follows => 296635 ( 1.0%)
  findleaks:
  findleaks: peak memory usage => 7353 kB
  findleaks: elapsed CPU time => 1.5 seconds
  findleaks: elapsed wall time => 2.0 seconds
  findleaks:
  CACHE            LEAKED BUFCTL           CALLER
  ffffff03d222b008    120 ffffff03ef7ceb78 nv_alloc_sys+0x1f
  ffffff03d222a448    123 ffffff03f4150cc8 nv_alloc_sys+0x1f
  ffffff03d222b448      5 ffffff03f28bd598 nv_alloc_sys+0x1f
  ffffff03d222b888     87 ffffff03f28c10f0 nv_alloc_sys+0x1f
  ffffff03d222c008     21 ffffff03f4139310 nv_alloc_sys+0x1f
  ffffff03d222b888     43 ffffff040ef3f3e8 nv_alloc_sys+0x1f
  ffffff03d222c008    120 ffffff03f4591e58 nv_alloc_sys+0x1f
  ffffff03d222b008    121 ffffff03f352c068 nv_alloc_sys+0x1f
  ffffff03d222a448    112 ffffff03f414e5f8 nv_alloc_sys+0x1f
  ffffff03d222b008    119 ffffff03ee92fdc0 nv_alloc_sys+0x1f
  ffffff03d222b888     46 ffffff03f28c1378 nv_alloc_sys+0x1f
  ffffff03d222b448      4 ffffff03f28c7708 nv_alloc_sys+0x1f
  ffffff03d222c008     20 ffffff03f2a6e7e8 nv_alloc_sys+0x1f

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2017-09-13 10:54:56 +00:00
Andriy Gapon
bdc44f62ff 7431 ZFS Channel Programs
illumos/illumos-gate@dfc115332c
dfc115332c

https://www.illumos.org/issues/7431
  ZFS channel programs (ZCP) adds support for performing compound ZFS
  administrative actions via Lua scripts in a sandboxed environment (with time
  and memory limits).
  This initial commit includes both base support for running ZCP scripts, and a
  small initial library of API calls which support getting properties and
  listing, destroying, and promoting datasets.
  Testing: in addition to the included unit tests, channel programs have been in
  use at Delphix for several months for batch destroying filesystems. The
  dsl_destroy_snaps_nvl() call has also been replaced with

  For reference, the new zfs-program manpage is included below.
  ZFS-PROGRAM(1M)                       1M                       ZFS-PROGRAM(1M)

  NAME
       zfs program – executes ZFS channel programs

  SYNOPSIS
       zfs program [-t timeout] [-m memory-limit] pool script

  DESCRIPTION
       The ZFS channel program interface allows ZFS administrative operations to
       be run programmatically as a Lua script. The entire script is executed
       atomically, with no other administrative operations taking effect
       concurrently. A library of ZFS calls is made available to channel program
       scripts. Channel programs may only be run with root privileges.

       A modified version of the Lua 5.2 interpreter is used to run channel
       program scripts. The Lua 5.2 manual can be found at:

             http://www.lua.org/manual/5.2/
  ...

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
2017-09-13 10:45:49 +00:00
Andriy Gapon
9cbb4e8136 5115 Want Intel 40GbE NIC driver for illumos
illumos/illumos-gate@9d26e4fc02
9d26e4fc02

https://www.illumos.org/issues/5115
  Intel's NICs based on the XL710 chipset exist. There exist drivers for Linux
  and FreeBSD.
  It does not appear to be derived from the ixgbe driver source, so it would
  probably require porting the i40e source from FBSD to Illumos, unless a driver
  exists for a GLDv3-like platform under CDDL or similar license (none are known
  to currently be available or being worked on).

Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Robert Mustacchi <rm@joyent.com>

Note: this change has nothing to do with FreeBSD.
2017-09-13 10:41:47 +00:00
Andriy Gapon
af2da9fb2e 5815 libzpool's panic function doesn't set global panicstr, ::status not as useful
illumos/illumos-gate@fae6347731
fae6347731

https://www.illumos.org/issues/5815
  When panic() is called from within ztest, the mdb ::status command isn't as
  useful as it could be since the global panicstr variable isn't updated. We
  should modify the function to make sure panicstr is set, so ::status can
  present the error message just like it does on a failed assertion.
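
  A minimal sketch of the idea: record the formatted message in a global
  before aborting, so a debugger can find it afterwards. The names
  sketch_panicstr, sketch_record_panic, and sketch_panic are hypothetical;
  this is not the libzpool implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical global that a debugger could inspect after a crash. */
const char *sketch_panicstr = NULL;
static char sketch_panicbuf[256];

/* Format the message into a static buffer and publish it globally. */
void
sketch_record_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	(void) vsnprintf(sketch_panicbuf, sizeof (sketch_panicbuf), fmt, ap);
	va_end(ap);

	sketch_panicstr = sketch_panicbuf;
}

/* Record the message first, then print it and abort. */
void
sketch_panic(const char *msg)
{
	sketch_record_panic("%s", msg);
	(void) fprintf(stderr, "panic: %s\n", sketch_panicstr);
	abort();
}
```

  Setting the global before calling abort() is the important ordering: the
  resulting core file then contains the message.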

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2017-09-13 10:33:09 +00:00
Andriy Gapon
9c9dc52f25 8376 cached v_path should be kept fresh
illumos/illumos-gate@e2fc3408ef
e2fc3408ef

https://www.illumos.org/issues/8376
  The logic for generating and maintaining the cached v_path value on vnodes
  could stand to be improved. If vnodes were purely ephemeral, then freshly
  calculating v_path at the time of lookup() would result in correct values (at a
  performance cost). When they persist, referenced by other structures
  (such as open files, process cwd, dnlc entries, etc), the opportunity for the
  v_path to become stale arises. This is exacerbated by the current behavior:
  when v_path is found to be invalid (during a vnodetopath operation), the code
  recalculates it but does not preserve the result. The overall situation
  leads to both performance and correctness (due to lack of results) problems
  relating to v_path.
  This has been addressed in SmartOS through a series of changes. Firstly, to do
  proper invalidation of v_path when it's found to be stale:
  - OS-3891 stale v_path slows vfs lookups
  OS-3891 revealed that some logic made assumptions about v_path never
  transitioning from non-NULL to NULL. It was addressed here:
  - OS-4317 v_path accesses can race
  While the pathological stale v_path behavior had been addressed, there are
  still cases where the absence of valid v_path information was causing problems.
  The largest patch in this series addressed it by performing v_path checking and
  updates during vnode lookups/updates, when it is most convenient:
  - OS-5167 cached v_path should be kept fresh
  Two smaller updates are included too, to prevent erroneous behavior introduced
  by the prior changes:
  - OS-5846 procfs should follow VFS rules
  - OS-6134 vn_reinit balks on zeroed vnodes

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Patrick Mooney <pmooney@pfmooney.com>
2017-09-13 10:25:44 +00:00
Andriy Gapon
b804e37054 8331 zfs_unshare returns wrong error code for smb unshare failure
illumos/illumos-gate@4f4378cc54
4f4378cc54

https://www.illumos.org/issues/8331
  zfs_unshare returns EZFS_UNSHARENFSFAILED on error for all share types.
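
  One way to picture the fix is to choose the error code from the share
  protocol instead of hard-coding the NFS one. The enums and the function
  below are hypothetical stand-ins, not the libzfs definitions.

```c
/* Hypothetical share protocols for this sketch. */
typedef enum sketch_proto {
	SKETCH_PROTO_NFS,
	SKETCH_PROTO_SMB
} sketch_proto_t;

/* Hypothetical error codes, one per share protocol. */
typedef enum sketch_err {
	SKETCH_ERR_UNSHARENFSFAILED = 1,
	SKETCH_ERR_UNSHARESMBFAILED
} sketch_err_t;

/* Return the unshare error that matches the protocol that failed. */
sketch_err_t
sketch_unshare_error(sketch_proto_t proto)
{
	switch (proto) {
	case SKETCH_PROTO_SMB:
		return (SKETCH_ERR_UNSHARESMBFAILED);
	case SKETCH_PROTO_NFS:
	default:
		return (SKETCH_ERR_UNSHARENFSFAILED);
	}
}
```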

Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
2017-09-13 10:17:14 +00:00
Andriy Gapon
03ad5aa4c1 8569 problem with inline functions in abd.h
illumos/illumos-gate@37e84ab74e
37e84ab74e

https://www.illumos.org/issues/8569
  C [C99] has peculiar rules for inline functions that are different from the
  C++ rules.  Unlike C++ where inline is "fire and forget", in C a programmer
  must pay attention to the function's storage class / visibility.  The main
  problem is with the case where a compiler decides to not inline a call to the
  function declared as inline.
  Some relevant links:
  - http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15831.html
  - http://www.drdobbs.com/the-new-c-inline-functions/184401540
  The summary is that either the inline functions should be declared 'static
  inline' or one of the compilation units (.c files) must provide a callable
  externally visible function definition.  In the former case, the compiler would
  automatically create a local non-inlined function instance in every compilation
  unit where it's needed.  In the latter case the single external definition is
  used to satisfy any non-inlined calls in all compilation units.  As things
  stand right now, we can get an undefined reference error under certain
  combinations of compilers and compiler options.  For example, this is what I
  get on FreeBSD when compiling with clang 4.0.0 and -O1:
    In function `abd_free': /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:385:
    undefined reference to `abd_is_linear'
  So, there are two alternatives: either qualify each inline function in
  abd.h with the static storage class, or add declarations like the following to
  abd.c: extern inline boolean_t abd_is_linear(abd_t *abd); Both work. I am not
  sure which one would be more consistent with the illumos development rules.
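
  The two alternatives can be shown side by side in a single file, assuming
  C99 inline semantics. The type and function names are hypothetical
  miniatures, not the real abd.h contents; in a real tree the inline
  definitions would live in the header and the extern declaration in exactly
  one .c file.

```c
#include <stdbool.h>

/* Hypothetical miniature of an abd-like type; not the real abd_t. */
typedef struct sketch_abd {
	unsigned flags;
} sketch_abd_t;

#define	SKETCH_FLAG_LINEAR	0x1u

/*
 * Alternative 1: static inline. Each compilation unit gets its own local
 * copy if the compiler decides not to inline a call.
 */
static inline bool
sketch_is_linear_static(const sketch_abd_t *abd)
{
	return ((abd->flags & SKETCH_FLAG_LINEAR) != 0);
}

/*
 * Alternative 2: a plain inline definition (as it would appear in the
 * header). On its own this is only an "inline definition" in C99 and
 * provides no externally callable symbol.
 */
inline bool
sketch_is_linear_extern(const sketch_abd_t *abd)
{
	return ((abd->flags & SKETCH_FLAG_LINEAR) != 0);
}

/*
 * In exactly one .c file, this declaration turns the inline definition
 * into an external definition, satisfying any non-inlined calls.
 */
extern inline bool sketch_is_linear_extern(const sketch_abd_t *abd);

/* Uses both variants; links even when no call is inlined (e.g. -O0). */
bool
sketch_check(const sketch_abd_t *abd)
{
	return (sketch_is_linear_static(abd) && sketch_is_linear_extern(abd));
}
```

  Removing the extern inline declaration and building without optimization
  is a quick way to reproduce an undefined reference of the kind quoted
  above.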

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2017-09-01 18:02:53 +00:00