Commit Graph

715 Commits

Author SHA1 Message Date
Li-Wen Hsu
070a148127 Add an auxiliary subroutine to generate read(2) event while testing.
Reviewed by:	gnn, ngie
Differential Revision:	https://reviews.freebsd.org/D11673
2017-07-25 13:01:10 +00:00
Li-Wen Hsu
d83c70758a Add a simple script which calls open(2) and others to generate events for
testing.

This test times-out on a quiet system because there is nobody triggers
syscall::open:entry or syscall::: probe while test execution.

Reviewed by:	gnn, markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D11671
2017-07-25 12:58:03 +00:00
Li-Wen Hsu
b9de3393dd Add a simple program which calls sigtimedwait(2) to generate events for testing
This test timeout on a quiet system because there is nobody triggers
'syscall::*wait*:entry' probe while test execution.

Reviewed by:	gnn, markj, ngie
Differential Revision:	https://reviews.freebsd.org/D11668
2017-07-25 12:52:32 +00:00
Enji Cooper
31ed01a2de Fix whitespace on a line in fix(..) accidentally missed in r321424
MFC after:	1 month
MFC with:	r321424
2017-07-24 17:29:56 +00:00
Enji Cooper
f3305cae02 Style cleanup: delete spurious trailing whitespace
MFC after:	1 month
2017-07-24 17:27:21 +00:00
Enji Cooper
aa52ad5489 Don't use incorrect hardcoded path to ksh -- use /usr/bin/env
to find ksh instead

MFC after:	1 month
2017-07-23 17:57:00 +00:00
Andriy Gapon
f9cdbaba8d MFV r318946: 8021 ARC buf data scatter-ization
illumos/illumos-gate@770499e185
770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
  mapped into the KVA unless explicitly requested, especially on
  platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
  linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
  in the original ABD

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>

MFC after:	3 weeks
2017-06-20 17:39:24 +00:00
Andriy Gapon
b8d341fe26 MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core
illumos/illumos-gate@a4b8c9aa65
a4b8c9aa65

https://www.illumos.org/issues/8264
  Oddly there is a lzc_clone function, but no lzc_promote function.

Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after:	1 week
2017-06-14 16:31:36 +00:00
Andriy Gapon
9260925dcd MFV r319740: 8168 NULL pointer dereference in zfs_create()
illumos/illumos-gate@690031d326
690031d326

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
  which we never check the return value) we end up dereferencing a NULL pointer
  in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  Illumos running in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after:	2 weeks
2017-06-09 15:30:41 +00:00
Andriy Gapon
ad2b1a296f MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
Sponsored by:	Panzura
2017-06-09 15:16:39 +00:00
Allan Jude
39b0b876dc New sentences start on new lines, fix two violations
Reviewed by:	bcr
Sponsored by:	BSDCan Dev Summit
2017-06-08 01:39:17 +00:00
Allan Jude
dc379eca14 SHA-512 and Skein have been supported by the boot loader for some time.
Submitted by:	lifanov
Reviewed by:	bcr
Sponsored by:	BSDCan Dev Summit
2017-06-08 01:29:24 +00:00
Andriy Gapon
457711b02b MFV r316922: 5380 receive of a send -p stream doesn't need to try renaming snapshots
illumos/illumos-gate@471a88e499
471a88e499

https://www.illumos.org/issues/5380
  A stream created with zfs send -p -I contains properties of all snapshots of a
  given dataset as opposed to only properties of snapshots in a given range.
  Not only this is suboptimal but the receive code also does not filter
  properties by the range. So, properties of earlier snapshots would be updated
  even though the snapshots themselves are not in the stream (just their
  properties).
  Given that modifying the snapshot properties requires a TXG sync and that the
  snapshots are updated one by one the described behavior may lead to a sever
  performance penalty.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	3 weeks
2017-05-24 22:30:21 +00:00
Andriy Gapon
b6be31c7ca MFC r316908: 7541 zpool import/tryimport ioctl returns ENOMEM because provided buffer is too small for config
illumos/illumos-gate@8b65a70b76
8b65a70b76

https://www.illumos.org/issues/7541
  When calling zpool import, zpool does a few ioctls to ZFS.
  zpool allocates a buffer in userland and passes it to the kernel so that ZFS
  can copy info into it. ZFS will use it to put the nvlist that describes the
  pool configuration.
  If the allocated buffer is too small, ZFS will return ENOMEM and the call will
  have to be redone. This wastes CPU time and slows down the import process. This
  happens very often for the ZFS_IOC_POOL_TRYIMPORT call.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
MFC after:	2 weeks
2017-05-24 21:32:35 +00:00
Andriy Gapon
5644318970 MFC r316904: 7729 libzfs_core`lzc_rollback() leaks result nvl
illumos/illumos-gate@ac428481f9
ac428481f9

https://www.illumos.org/issues/7729
  libzfs_core`lzc_rollback() doesn't free the result nvl after lzc_ioctl() call.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	2 weeks
2017-05-24 20:53:01 +00:00
Andriy Gapon
c65389d367 MFV r316860: 7545 zdb should disable reference tracking
illumos/illumos-gate@4dd77f9e38
4dd77f9e38

https://www.illumos.org/issues/7545
  When evicting from the ARC, we manipulate some refcount_t's, e.g. arcs_size.
  When using zdb to examine a large amount of data (e.g. zdb -bb on a large pool
  with small blocks), the ARC may have a large number of entries. If reference
  tracking is enabled, there will be ~1 reference for each block in the ARC. When
  evicting, we decrement the refcount and have to search all the references to
  find the one that we are removing, which is very slow.
  Since zdb is typically used to find problems with the on-disk format, and not
  with the code it is running, we should disable reference tracking in zdb.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-05-24 20:41:26 +00:00
Mark Johnston
b4a3f67bd6 Add a little helper program for tst.exitcore.ksh.
sleep(1) is capsicumized, which means that we cannot rely on it to dump
core as required by the test.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-05-22 20:34:51 +00:00
Alexander Motin
c9bcbf3800 Fix time handling in cv_timedwait_hires().
pthread_cond_timedwait() receives absolute time, not relative.  Passing
wrong time there caused two threads of zdb to spin in a tight loop.

MFC after:	1 week
2017-05-19 05:12:58 +00:00
Josh Paetzel
c78abb8b50 MFV 316894
7252 7628 compressed zfs send / receive

illumos/illumos-gate@5602294fda
5602294fda

https://www.illumos.org/issues/7252
  This feature includes code to allow a system with compressed ARC enabled to
  send data in its compressed form straight out of the ARC, and receive data in
  its compressed form directly into the ARC.

https://www.illumos.org/issues/7628
  We should have longer, more readable versions of the ZFS send / recv options.

7628 create long versions of ZFS send / receive options

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
2017-04-25 17:57:43 +00:00
Josh Paetzel
ef18459108 MFV 316891
7386 zfs get does not work properly with bookmarks

illumos/illumos-gate@edb901aab9
edb901aab9

https://www.illumos.org/issues/7386
  The zfs get command does not work with the bookmark parameter while it works
  properly with both filesystem and snapshot:
  # zfs get -t all -r creation rpool/test
  NAME               PROPERTY  VALUE                  SOURCE
  rpool/test         creation  Fri Sep 16 15:00 2016  -
  rpool/test@snap    creation  Fri Sep 16 15:00 2016  -
  rpool/test#bkmark  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test@snap
  NAME             PROPERTY  VALUE                  SOURCE
  rpool/test@snap  creation  Fri Sep 16 15:00 2016  -
  # zfs get -t all -r creation rpool/test#bkmark
  cannot open 'rpool/test#bkmark': invalid dataset name
  #
  The zfs get command should be modified to work properly with bookmarks too.

Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2017-04-21 19:53:52 +00:00
Alan Somers
07bb15b440 MFV 316855
7900 zdb shouldn't print the path of a znode at verbosity < 5

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alan Somers <asomers@freebsd.org>

illumos/illumos-gate@e548d2fa41
https://www.illumos.org/issues/7900

MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2017-04-14 16:30:37 +00:00
Andriy Gapon
5ad79d9b20 dtrace: fix normalization of stddev aggregation
To be upstreamed.

Discussed with:	Bryan Cantrill <bryancantrill@gmail.com>
MFC after:	2 weeks
Sponsored by:	Panzura
2017-04-14 15:31:04 +00:00
Pedro F. Giffuni
9e98194646 MFV r316693:
8046 Let calloc() do the multiplication in libzfs_fru_refresh

5697e03e6e

https://www.illumos.org/issues/8046

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Pedro Giffuni <pfg@freebsd.org>

MFC after:	3 days
2017-04-10 22:56:38 +00:00
Alexander Motin
3aef5b286a MFV r315290, r315291: 7303 dynamic metaslab selection
illumos/illumos-gate@8363e80ae7
https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18

https://www.illumos.org/issues/7303

  This change introduces a new weighting algorithm to improve metaslab selection.
  The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
  the metaslab weight now encodes the type of weighting algorithm used
  (size-based vs segment-based).

  This also introduce a new allocation tracing facility and two new dcmds to help
  debug allocation problems. Each zio now contains a zio_alloc_list_t structure
  that is populated as the zio goes through the allocations stage. Here's an
  example of how to use the tracing facility:

> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0    NOT_ALLOCATABLE           ztest.0a
     -      0      400           0             ENOSPC           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0    NOT_ALLOCATABLE           ztest.0a
     -      0      200           0             ENOSPC           ztest.0a
     1      0      400      1 x 8M            17b1a00           ztest.0a

> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
  MSID    DVA    ASIZE      WEIGHT             RESULT               VDEV
     -      0      200           0    NOT_ALLOCATABLE           mirror-2
     -      0      200           0    NOT_ALLOCATABLE           mirror-0
     1      0      200      1 x 4M            112ae00           mirror-1
     -      1      200           0    NOT_ALLOCATABLE           mirror-2
     -      1      200           0    NOT_ALLOCATABLE           mirror-0
     1      1      200      1 x 4M            112b000           mirror-1
     -      2      200           0    NOT_ALLOCATABLE           mirror-2

  If the metaslab is using segment-based weighting then the WEIGHT column will
  display the number of segments available in the bucket where the allocation
  attempt was made.

Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2017-03-24 09:37:00 +00:00
Mark Johnston
d935f34b8f Fix memory leaks in error cases in libdtrace.
Submitted by:	Tom Rix <trix@juniper.net>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9705
2017-02-23 17:56:24 +00:00
Mark Johnston
74d9553e43 Fix a memory leak in an error case in libctf.
Submitted by:	Tom Rix <trix@juniper.net>
MFC after:	1 week
2017-02-23 17:54:17 +00:00
Mark Johnston
281e4f2dd4 When patching USDT probes, use non-unique names for aliases of weak symbols.
Aliases are normally given names that include a key that's unique for each
input object file. This, for example, ensures that aliases for identically
named local symbols in different object files don't conflict. However, in
some cases the static linker will leave an undefined alias after merging
identical weak symbols, resulting in a link error. A non-unique name allows
the aliases to be merged as well.

PR:		216871
X-MFC With:	r313262
2017-02-10 02:01:32 +00:00
Mark Johnston
35bf9feb41 Search for _DTRACE_VERSION in sys/sdt.h rather than unistd.h.
MFC after:	1 week
2017-02-05 02:45:35 +00:00
Mark Johnston
55c2fd519f Avoid using Sun compiler-specific flags.
MFC after:	1 week
2017-02-05 02:44:48 +00:00
Mark Johnston
273efb05a2 Fix a double free of libelf data buffers in the USDT link code.
libdtrace needs to append to the input object files' string and symbol
tables. Currently it does so by allocating a larger buffer, copying the
existing sections into them, and swapping pointers in the libelf data
descriptors. However, it also frees those buffers when its processing is
complete, which leads to a double free since the elftoolchain libelf
owns them and also frees them in elf_end(3). Instead, free the buffers
originally allocated by libelf.

MFC after:	2 weeks
2017-02-05 02:44:08 +00:00
Mark Johnston
e801af6fba Use PC-relative relocations for USDT probe sites on i386 and amd64.
When recording probe site addresses in the output DOF file, dtrace -G
needs to emit relocations for the .SUNW_dof section in order to obtain
the addresses of functions containing probe sites. DTrace expects the
addresses to be relative to the base address of the final ELF file,
and the amd64 USDT implementation was relying on some unspecified and
incorrect behaviour in the base system GNU ld to achieve this.

This change reimplements the probe site relocation handling to allow
USDT to be used with lld and newer GNU binutils. Specifically, it
makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the
probe site address relative to the DOF file address, and adds and uses a
new DOF relocation type which computes the final probe site address using
these relative offsets.

Reported by and discussed with:	Rafael Espíndola
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9374
2017-02-05 02:39:12 +00:00
Mark Johnston
3a3e3279e7 Avoid modifying the object string table when patching USDT probes.
dtrace converts pairs of consecutive underscores in a probe name to dashes.
When dtrace -G processes relocations corresponding to USDT probe sites, it
performs this conversion on the corresponding symbol names prior to looking
up the resulting probe names in the USDT provider definition. However, in
so doing it would actually modify the input object's string table, which
breaks the string suffix merging done by recent binutils. Because we don't
care about the symbol name once the probe site is recorded, just perform the
probe lookup using a temporary copy.

Reported by:	hrs, swills
MFC after:	3 weeks
2016-12-20 18:25:41 +00:00
Mark Johnston
a46000e8ff Consistently print D variable indices in decimal when disassembling.
MFC after:	1 week
2016-12-20 05:45:52 +00:00
Mark Johnston
3c606f671e Use the correct path to date(1).
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2016-12-07 23:38:18 +00:00
Mark Johnston
058f5a9a47 Use the native data model instead of forcing ILP32 in tst.provregex3.ksh.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2016-12-07 23:37:51 +00:00
Mark Johnston
e0d70fc1dc libdtrace: Don't use a read-only handle for enumerating pid probes.
Enumeration of return probes involves disassembling subroutines in the
target process, and ptrace(2) is currently used to read from the target
process. libproc could read from the backing file instead to avoid this
problem, but in the common case libdtrace will have a writeable handle
on the process anyway. In particular, a writeable handle is needed to list
USDT probes, and libdtrace will cache such a handle for processes that it
controls via dtrace -c and -p.
2016-12-06 04:28:56 +00:00
Pedro F. Giffuni
69718b786d Revert r253678, r253661:
Fix a segfault in ctfmerge(1) due to a bug in GCC.

The change was correct and the bug real, but upstream didn't adopt it
and we want to remain in sync. When/if upstream does something about it
we can bring their version.

The bug in question was fixed in GCC 4.9 which is now the default in
FreeBSD's ports. Our native gcc-4.2, which is still in use in some Tier-2
platforms also has a workaround so no end-user should be harmed by the
revert.
2016-12-03 17:44:43 +00:00
Andriy Gapon
61158a7ce8 MFV r308989: 6428 set canmount=off on unmounted filesystem tries to
unmount children

This is a cosmetic and bookkeeping change as the actual change is
already in FreeBSD.
See r297521, r304520, r308985.
2016-11-24 10:11:09 +00:00
Andriy Gapon
9170c18bb9 revert r304520, set canmount=on is not supposed to mount the filesystem
Not sure where I got the idea that it should.

See https://github.com/openzfs/openzfs/pull/218

Reported by:	mahrens
Pointyhat to:	avg
MFC after:	5 days
2016-11-22 11:44:30 +00:00
Alexander Motin
14b5719f6a After some ZIL changes 6 years ago zil_slog_limit got partially broken
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.

Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.

Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size. In some
cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas. This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently. Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.

While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.

Sponsored by:	iXsystems, Inc.
2016-11-17 21:01:27 +00:00
Mark Johnston
375c8b20dc Remove the DTrace printt and typeref actions.
These are FreeBSD-specific and were added in r178576 to provide the ability
to pretty-print instances of compound types. However, the print action has
long since been augmented to provide this functionality with a simpler
interface.

Discussed with:	gnn
Differential Revision:	https://reviews.freebsd.org/D8478
2016-11-12 19:26:12 +00:00
Andriy Gapon
d5315b02cd MFV r308222: 6051 lzc_receive: allow the caller to read the begin record
illumos/illumos-gate@620f322510
620f322510

https://www.illumos.org/issues/6051
  Currently lzc_receive() requires that its snapname argument is a snapshot name
  (contains '@').
  zfs receive allows to specify just a dataset name and would try to deduce the
  snapshot name from the stream.
  I propose to allow lzc_receive() to do the same.
  That seems to be quite easy to implement, it requires only a small amount of
  logic, it does not require any additional system calls or any additional data
  from the stream.
  The benefit is that the new behavior would allow to keep the snapshot names the
  same between the sender and receiver at zero cost, without a need to pass the
  names out of band.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2016-11-03 09:24:27 +00:00
Andriy Gapon
97371ba2a9 zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot
(gpt)zfsboot will read one-time boot directives from a special ZFS pool
area.  The area was previously described as "Boot Block Header", but
currently it is know as Pad2, marked as reserved and is zeroed out on
pool creation.  The new code interprets data in this area, if any, using
the same format as boot.config.  The area is immediately wiped out.
Failure to parse the directives results in a reboot right after the
cleanup.  Otherwise the boot sequence proceeds as usual.

zfsbootcfg writes zfsboot arguments specified on its command line to the
Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
boot.  Please see the manual page for more.

Thanks to all who reviewed, contributed and made suggestions!  There are
many potential improvements to the feature, please see the review for
details.

Reviewed by:	wblock (docs)
Discussed with:	jhb, tsoome
MFC after:	3 weeks
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D7612
2016-10-29 14:09:32 +00:00
Mark Johnston
6a4985f61c Fix tst.args1.c on LP64 platforms.
The untyped probe arguments have a width larger than int on such platforms,
so printing their value without a cast can give unexpected results.

MFC after:	1 week
2016-10-16 19:50:10 +00:00
George V. Neville-Neil
956c2a73cd Corrected non-portable reuse of va_list in dt_printf()
Submitted by:	Graeme Jenkinson
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D8157
2016-10-11 16:12:12 +00:00
Alexander Motin
929d0128f7 MFV r304159: 7277 zdb should be able to print zfs_dbgmsg's
illumos/illumos-gate@29bdd2f916
https://github.com/illumos/illumos-gate/commit/29bdd2f916366ece37c4748bca6b3d61f
57a223b

https://www.illumos.org/issues/7277
  ztest always prints the debug messages (zfs_dbgmsg()) by calling
  zfs_dbgmsg_print(). We should add a flag to zdb to make it do this as well
  before exiting.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2016-09-03 10:07:46 +00:00
Alexander Motin
5535f02daf MFV r303081: 7163 ztest failures due to excess error injection
illumos/illumos-gate@f34284d835
https://github.com/illumos/illumos-gate/commit/f34284d835bc555f987c1310df46c034c
3101155

https://www.illumos.org/issues/7163
  Running zloop from zfs-precommit hit this assertion:
       *panicstr/s
  0xfffffd7fd7419370: assertion failed for thread 0xfffffd7fe29ed240,
  thread-id 577: parent != NULL, file ../../../uts/common/fs/zfs/dbuf.c, line
  1827
       $c
  libc.so.1`_lwp_kill+0xa()
  libc.so.1`_assfail+0x182(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
  libc.so.1`assfail+0x19(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
  libzpool.so.1`dbuf_dirty+0xc69(10e3bc10, 3601700)
  libzpool.so.1`dbuf_dirty+0x61e(10c73640, 3601700)
  libzpool.so.1`dbuf_dirty+0x61e(10e28280, 3601700)
  libzpool.so.1`dmu_buf_will_fill+0x64(10e28280, 3601700)
  libzpool.so.1`dmu_write+0x1b6(2c7e640, d, 400000002e000000, 200, 3717b40,
  3601700)
  ztest_replay_write+0x568(4950d0, 3717a80, 0)
  ztest_write+0x125(4950d0, d, 400000002e000000, 200, 413f000)
  ztest_io+0x1bb(4950d0, d, 400000002e000000)
  ztest_dmu_write_parallel+0xaa(4950d0, 6)
  ztest_execute+0x83(1, 420c98, 6)
  ztest_thread+0xf4(6)
  libc.so.1`_thrp_setup+0x8a(fffffd7fe29ed240)
  libc.so.1`_lwp_start()
  This is another manifestation of ECKSUM in ztest:
  The lowest level ancestor that’s in memory is the L8 (topmost). The L7
  ancestor is blkid 0x10:
       ::dbufs -O 0x2c7e640 -o d -l 7 |::dbuf
  addr object lvl blkid holds os
  600be50 d 7 4 1 ztest/ds_6
  719d880 d 7 0 4 ztest/ds_6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-09-03 08:48:51 +00:00
Alexander Motin
1f0bf00253 MFV r303080: 6451 ztest fails due to checksum errors
illumos/illumos-gate@f9eb9fdf19
https://github.com/illumos/illumos-gate/commit/f9eb9fdf196b6ed476e4ffc69cecd8b0d
a3cb7e7

https://www.illumos.org/issues/6451
  Sometimes ztest fails because zdb detects checksum errors. e.g.:
  Traversing all blocks to verify checksums and verify nothing leaked ...
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000:
  180000> [L0 other uint64[]] sha256 uncompressed LE contiguou
  s unique single size=100000L/100000P birth=271L/271P fill=1
  cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf --
  skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800:
  180000> [L0 other uint64[]] sha256 uncompressed LE contigu
  ous unique single size=100000L/100000P birth=840L/840P fill=1
  cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75
  -- skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400:
  180000> [L0 other uint64[]] sha256 uncompressed LE conti
  guous unique single size=100000L/100000P birth=904L/904P fill=1
  cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174d
  ba -- skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400:
  180000> [L0 other uint64[]] sha256 uncompressed LE cont
  iguous dedup single size=100000L/100000P birth=549L/549P fill=1
  cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb
  58 -- skipping
  zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000>
  [L0 other uint64[]] fletcher4 uncompressed LE contiguou
  s unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50:
  29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin
  g
  The problem is that ztest_fault_inject() can inject multiple faults into the
  same block. It is designed such that it can inject errors on all leafs of a
  RAID-Z or mirror, but for a given range of offsets, it will only inject errors

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-09-03 08:47:46 +00:00
Alexander Motin
cdd7c5d9b0 MFV r303079:
7147 ztest: ztest_ddt_repair fails with ztest_pattern_match assertion

illumos/illumos-gate@aab8072633
https://github.com/illumos/illumos-gate/commit/aab80726335c76a7cae32c7300890248d
73a51e3

https://www.illumos.org/issues/7147
  Here's the dbuf we're currently reading:
       966f200::dbuf
  addr object lvl blkid holds os
  966f200 4 0 0 1 ztest/ds_3
       966f200::print dmu_buf_t db_data
  db_data = 0x9ae0400
       0x9ae0400/10J
  0x9ae0400: c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
  c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
  c1c7ced932020d c1c7ced932020d
  The pattern we're expecting is actually this: a34ae10b5f2db2. If we attempt to
  read the block on disk we find that it has matches what ztest_ddt_repair()
  would have written:
       ~c1c7ced932020d=J
  ff3e383126cdfdf2
       966f200::print dmu_buf_impl_t db_blkptr | ::blkptr
  DVA0=<0:71d3c00:800>
  [L0 UINT64_OTHER] SHA256 OFF LE contiguous dedup single
  size=400L/400P birth=55L/55P fill=1
  cksum=18486450d3ce8c6d:75a72f4bbf117b0f:2d3a226314eb5650:2eb0fd68648b1af0
     1. zdb -U /rpool/tmp/zpool.cache -R ztest 0:71d3c00:800 | head
        Found vdev type: mirror
  0:71d3c00:800
  0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
  000000: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
  000010: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
  000020: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
  000030: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
  000040: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
  000050: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
2016-09-03 08:46:53 +00:00
Alexander Motin
efa0867fb0 MFV r302991: 6950 ARC should cache compressed data
illumos/illumos-gate@dcbf3bd6a1
dcbf3bd6a1

https://www.illumos.org/issues/6950
  When reading compressed data from disk, the ARC should keep the compressed
  block cached and only decompress it when consumers access the block. The
  uncompressed data should be short-lived allowing the ARC to cache a much larger
  amount of data. The DMU would also maintain a smaller cache of uncompressed
  blocks to minimize the impact of decompressing frequently accessed blocks.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
2016-09-03 08:30:51 +00:00