Commit Graph

229419 Commits

Author SHA1 Message Date
Poul-Henning Kamp
fc62b7e5de Add a skeleton Clock Manager for RPi2/3, and use that from pwm
instead of frobbing the registers directly.

As a hack the bcm2835_pwm kmod presently ignores the 'status="disabled"'
in the RPI3 DTB, assuming that if you load the kld you probably
want the PWM to work.
2018-01-22 07:10:30 +00:00
Alexander Motin
0d6993dfd3 MFV r328255: 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 06:00:45 +00:00
Alexander Motin
d0fd40e7c9 8972 zfs holds: In scripted mode, do not pad columns with spaces
illumos/illumos-gate@e9b7d6e7f7

https://www.illumos.org/issues/8972:
'zfs holds -H' does not properly output content in scripted mode. It uses a
tab instead of two spaces, but it still pads column widths with spaces when
it should not.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Allan Jude <allanjude@freebsd.org>
2018-01-22 05:59:48 +00:00
Alexander Motin
1dbc0fc352 MFV r328253: 8835 Speculative prefetch in ZFS not working for misaligned reads
illumos/illumos-gate@5cb8d943bc

https://www.illumos.org/issues/8835:
Sequential reads not aligned to block size are not detected by ZFS
prefetcher as sequential, killing prefetch and severely hurting
performance.  It is caused by dmu_zfetch() in case of misaligned
sequential accesses being called with overlap of one block.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alexander Motin <mav@FreeBSD.org>
2018-01-22 05:57:14 +00:00
Alexander Motin
2d224f6116 8835 Speculative prefetch in ZFS not working for misaligned reads
illumos/illumos-gate@5cb8d943bc

https://www.illumos.org/issues/8835:
Sequential reads not aligned to block size are not detected by ZFS
prefetcher as sequential, killing prefetch and severely hurting
performance.  It is caused by dmu_zfetch() in case of misaligned
sequential accesses being called with overlap of one block.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alexander Motin <mav@FreeBSD.org>
2018-01-22 05:55:43 +00:00
Alexander Motin
4eb2803697 MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 05:52:39 +00:00
Alexander Motin
1e2bad5ea0 8652 Tautological comparisons with ZPROP_INVAL
illumos/illumos-gate@4ae5f5f06c

https://www.illumos.org/issues/8652:
Clang and GCC prefer to use unsigned ints to store enums. With Clang, that
causes tautological comparison warnings when comparing a zfs_prop_t or
zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error
handling code is being silently removed as a result.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:48:14 +00:00
Alexander Motin
cf52647c41 MFV r328249:
8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs

illumos/illumos-gate@2ba5f978a4

https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:

sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c
1t5000CCA000094115d0
sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo pool: foo
state: ONLINE
scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:

NAME                         STATE     READ WRITE CKSUM
        foo                          ONLINE       0     0     0
          raidz1-0                   ONLINE       0     0     0
            c1t5000CCA000081D61d0    ONLINE       0     0     0
            spare-1                  ONLINE       0     0     0
              c1t5000CCA000186235d0  ONLINE       0     0     0
              c1t5000CCA000094115d0  ONLINE       0     0     0
        spares
          c1t5000CCA000094115d0      INUSE     currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool

Even though there was nothing to clear, those commands shouldn't have
reported an error. by contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Alan Somers <asomers@gmail.com>
2018-01-22 04:37:04 +00:00
Alexander Motin
434e06c6f9 8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs
illumos/illumos-gate@2ba5f978a4

https://www.illumos.org/issues/8641:
"zpool clear" and "zinject -d" can both operate on specific vdevs, either
leaf or interior. However, due to an oversight, neither works on a "spare"
or "replacing" vdev. For example:

sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c1t5000CCA000094115d0
sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0
$ zpool status foo pool: foo
state: ONLINE
scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017
config:

NAME                         STATE     READ WRITE CKSUM
        foo                          ONLINE       0     0     0
          raidz1-0                   ONLINE       0     0     0
            c1t5000CCA000081D61d0    ONLINE       0     0     0
            spare-1                  ONLINE       0     0     0
              c1t5000CCA000186235d0  ONLINE       0     0     0
              c1t5000CCA000094115d0  ONLINE       0     0     0
        spares
          c1t5000CCA000094115d0      INUSE     currently in use
$ sudo zinject -d spare-1 -A degrade foo
cannot find device 'spare-1' in pool 'foo'
$ sudo zpool clear foo spare-1
cannot clear errors for spare-1: no such device in pool

Even though there was nothing to clear, those commands shouldn't have
reported an error. by contrast, trying to clear "raidz1-0" works just fine:
$ sudo zpool clear foo raidz1-0
2018-01-22 04:35:17 +00:00
Alexander Motin
b5eb78f824 MFV r328247: 8959 Add notifications when a scrub is paused or resumed
illumos/illumos-gate@301fd1d6f2

Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Sean Eric Fagan <sef@ixsystems.com>
2018-01-22 04:31:48 +00:00
Alexander Motin
ee700ae0c6 8959 Add notifications when a scrub is paused or resumed
illumos/illumos-gate@301fd1d6f2

Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Sean Eric Fagan <sef@ixsystems.com>
2018-01-22 04:27:05 +00:00
Alexander Motin
abe7ff88e1 MFV r328245: 8856 arc_cksum_is_equal() doesn't take into account ABD-logic
illumos/illumos-gate@01a059ee0c

https://www.illumos.org/issues/8856:
arc_cksum_is_equal() calls zio_push_transform() that requires abd_t*
(second arg), but a void* is passed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Roman Strashkin <roman.strashkin@nexenta.com>
2018-01-22 04:23:48 +00:00
Alexander Motin
1536455b31 8856 arc_cksum_is_equal() doesn't take into account ABD-logic
illumos/illumos-gate@01a059ee0c

https://www.illumos.org/issues/8856:
arc_cksum_is_equal() calls zio_push_transform() that requires abd_t*
(second arg), but a void* is passed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Roman Strashkin <roman.strashkin@nexenta.com>
2018-01-22 04:21:55 +00:00
Kyle Evans
d5b15ddbfc usr.sbin/service: Fix -j to not be order dependant
The introduced -j option is highly dependant on the ordering of arguments,
and it exhibited broken behavior in some other circumstances. Fix these
issues, and simplify the feature by removing the unneessary double parsing
of options.

Reviewed by:	jilles
Differential Revision:	https://reviews.freebsd.org/D13952
2018-01-22 03:38:10 +00:00
Kyle Evans
df1043c201 libregex: Drop WARNS to 2 to match libc
It's become clear that my armv7 builds didn't catch all of the warnings that
other builds are picking up, drop WARNS to 2 to match libc until they're all
caught.
2018-01-22 03:12:26 +00:00
Kyle Evans
fe5bf674e6 Add missing patch from r328240
regcomp uses some libc internal collation bits that are not available in the
libregex context. It's easy enough to bring in the needed parts that can
work in a libregex world, so do so.

Pointy hat to:	me
2018-01-22 02:58:33 +00:00
Kyle Evans
b37f6c9805 Add libregex, connect it to the build
libregex is a regex(3) implementation intended to feature GNU extensions and
any other non-POSIX compliant extensions that are deemed worthy.

These extensions are separated out into a separate library for the sake of
not cluttering up libc further with them as well as not deteriorating the
speed (or lack thereof) of the libc implementation.

libregex is implemented as a build of the libc implementation with LIBREGEX
defined to distinguish this from a libc build. The reasons for
implementation like this are two-fold:

1.) Maintenance- This reduces the overhead induced by adding yet another
regex implementation to base.

2.) Ease of use- Flipping on GNU extensions will be as simple as linking
against libregex, and POSIX-compliant compilations can be guaranteed with a
REG_POSIX cflag that should be ignored by libc/regex and disables extensions
in libregex. It is also easier to keep REG_POSIX sane and POSIX pure when
implemented in this fashion.

Tests are added for future functionality, but left disconnected for the time
being while other testing is done.

Reviewed by:	cem (previous version)
Differential Revision:	https://reviews.freebsd.org/D12934
2018-01-22 02:44:41 +00:00
Pedro F. Giffuni
44c514b142 Forgot to sort here in r328238. 2018-01-22 02:26:10 +00:00
Pedro F. Giffuni
d821d36419 Unsign some values related to allocation.
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after:	3 weeks
2018-01-22 02:08:10 +00:00
Pedro F. Giffuni
b8d1747e75 Use the __alloc_size2 attribute where relevant.
This follows the documented use in GCC. It is basically only relevant for
calloc(3), reallocarray(3) and  mallocarray(9).

Suggested by:	Mark Millard

Reference:
https://docs.freebsd.org/cgi/mid.cgi?9DE674C6-EAA3-4E8A-906F-446E74D82FC4
2018-01-22 01:50:10 +00:00
Alexander Motin
d26c97e2fb MFV r328233:
8898 creating fs with checksum=skein on the boot pools fails ungracefully

illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

PR:		222199
2018-01-22 00:01:36 +00:00
Alexander Motin
4c30fea809 8898 creating fs with checksum=skein on the boot pools fails ungracefully
illumos/illumos-gate@9fa2266d9a

https://www.illumos.org/issues/8898:
# zfs create -o checksum=skein rpool/test
internal error: Result too large
Abort (core dumped)

Not a big deal per se, but should be handled correctly.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222199

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:57:41 +00:00
Alexander Motin
9b347baa3a MFV r328231: 8897 zpool online -e fails assertion when run on non-leaf vdevs
illumos/illumos-gate@9a551dd645

https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)

Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:53:56 +00:00
Alexander Motin
adbaf37829 8897 zpool online -e fails assertion when run on non-leaf vdevs
illumos/illumos-gate@9a551dd645

https://www.illumos.org/issues/8897:
# zpool online -e test mirror-1
Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online
Abort (core dumped)

Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'.

Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-01-21 23:52:37 +00:00
Alexander Motin
226cd471b4 MFV r328229:
8930 zfs_zinactive: do not remove the node if the filesystem is readonly

illumos/illumos-gate@93c618e0f4

https://www.illumos.org/issues/8930:
We normally remove an unlinked node when its last user goes away and the
node becomes inactive. However, we should not do that if the filesystem
is mounted read-only including the case where it has its readonly
property set. The node will remain on the unlinked queue, so it will
not be leaked.

One particular scenario is when we receive an incremental stream into a
mounted read-only filesystem and that stream contains an unlinked file
(still on the unlinked queue). If that file is opened before the
receive and some time later after the receive it becomes inactive we
would remove it and, thus, modify the read-only filesystem. As a
result, the filesystem would diverge from its source and further
incremental receives would not be possible (without forcing a rollback).

Another related scenario, that may or may not be possible depending on an
OS / VFS policy, is when an open file is unlinked, then the filesystem is
remounted read-only, and then the file is closed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-01-21 23:49:17 +00:00
Alexander Motin
dbee84cc27 8930 zfs_zinactive: do not remove the node if the filesystem is readonly
illumos/illumos-gate@93c618e0f4

https://www.illumos.org/issues/8930:
We normally remove an unlinked node when its last user goes away and the
node becomes inactive. However, we should not do that if the filesystem
is mounted read-only including the case where it has its readonly
property set. The node will remain on the unlinked queue, so it will
not be leaked.

One particular scenario is when we receive an incremental stream into a
mounted read-only filesystem and that stream contains an unlinked file
(still on the unlinked queue). If that file is opened before the
receive and some time later after the receive it becomes inactive we
would remove it and, thus, modify the read-only filesystem. As a
result, the filesystem would diverge from its source and further
incremental receives would not be possible (without forcing a rollback).

Another related scenario, that may or may not be possible depending on an
OS / VFS policy, is when an open file is unlinked, then the filesystem is
remounted read-only, and then the file is closed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andriy Gapon <avg@FreeBSD.org>
2018-01-21 23:42:45 +00:00
Alexander Motin
272f833cde MFV r328227: 8909 8585 can cause a use-after-free kernel panic
illumos/illumos-gate@94ddd0900a

https://www.illumos.org/issues/8909:
There's a race condition that exists if `zil_free_lwb` races with either
`zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`.

Here's an example panic due to this bug:

> ::status
    debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
    operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
    image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in mo
dule "zfs" due to a NULL pointer dereference
    dump content: kernel pages only

> $c
    zio_shrink+0x12()
    zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
    zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit+0x80(ffffff03dcd15cc0, 9a9)
    zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    write+0x250(42, fffffd7ff4832000, 2000)
    sys_syscall+0x177()

If there's an outstanding lwb that's in `zil_commit_waiter_timeout`
waiting to timeout, waiting on it's waiter's CV, we must be sure not to
call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB
may be freed and can result in a use-after-free situation where the
stale lwb pointer stored in the `zil_commit_waiter_t` structure of the
thread waiting on the waiter's CV is used.

A similar situation can occur if an lwb is issued to disk, and thus in
the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the
disk is servicing that lwb. In this situation, the lwb will be freed by
`zil_free_lwb`, which will result in a use-after-free situation when the
lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called.

This race condition is prevented in `zil_close` by calling `zil_commit`
before `zil_free_lwb` is called, which will ensure all outstanding (i.e.
all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states)
reach the `LWB_STATE_DONE` state before the lwb's are freed
(`zil_commit` will not return untill all the lwb's are
`LWB_STATE_DONE`).

Further, this race condition is prevented in `zil_sync` by only calling
`zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set.
All lwb's not in the `LWB_STATE_DONE` state will have a non-null value
for this pointer; the pointer is only cleared in
`zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
changed to `LWB_STATE_DONE`.

This race is present in `zil_suspend`, leading to this bug.

At first glance, it would appear as though this would not be true
because `zil_suspend` will call `zil_commit`, just like `zil_close`, but
the problem is that `zil_suspend` will set the zilog's `zl_suspend`
field prior to calling `zil_commit`. Further, in `zil_commit`, if
`zl_suspend` is set, `zil_commit` will take a special branch of logic
and use `txg_wait_synced` instead of performing the normal `zil_commit`
logic.

This call to `txg_wait_synced` might be good enough for the data to
reach disk safely before it returns, but it does not ensure that all
outstanding lwb's reach the `LWB_STATE_DONE` state before it returns.
This is because, if there's an lwb "stuck" in
`zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will
maintain a non-null value for it's `lwb_buf` field and thus `zil_sync`
will not free that lwb. Thus, even though the lwb's data is already on
disk, the lwb will be left lingering, waiting on the CV, and will
eventually timeout and be issued to disk even though the write is
unnesseary.

So, after `zil_commit` is called from `zil_suspend`, we incorrectly
assume that there are not outstanding lwb's, and proceed to free all
lwb's found on the zilog's lwb list. As a result, we free the lwb that
will later be used `zil_commit_waiter_timeout`.

Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:18:42 +00:00
Alexander Motin
ebd7699264 8909 8585 can cause a use-after-free kernel panic
illumos/illumos-gate@94ddd0900a

https://www.illumos.org/issues/8909:
There's a race condition that exists if `zil_free_lwb` races with either
`zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`.

Here's an example panic due to this bug:

> ::status
    debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
    operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
    image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference
    dump content: kernel pages only

> $c
    zio_shrink+0x12()
    zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
    zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit+0x80(ffffff03dcd15cc0, 9a9)
    zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    write+0x250(42, fffffd7ff4832000, 2000)
    sys_syscall+0x177()

If there's an outstanding lwb that's in `zil_commit_waiter_timeout`
waiting to timeout, waiting on it's waiter's CV, we must be sure not to
call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB
may be freed and can result in a use-after-free situation where the
stale lwb pointer stored in the `zil_commit_waiter_t` structure of the
thread waiting on the waiter's CV is used.

A similar situation can occur if an lwb is issued to disk, and thus in
the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the
disk is servicing that lwb. In this situation, the lwb will be freed by
`zil_free_lwb`, which will result in a use-after-free situation when the
lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called.

This race condition is prevented in `zil_close` by calling `zil_commit`
before `zil_free_lwb` is called, which will ensure all outstanding (i.e.
all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states)
reach the `LWB_STATE_DONE` state before the lwb's are freed
(`zil_commit` will not return untill all the lwb's are
`LWB_STATE_DONE`).

Further, this race condition is prevented in `zil_sync` by only calling
`zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set.
All lwb's not in the `LWB_STATE_DONE` state will have a non-null value
for this pointer; the pointer is only cleared in
`zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
changed to `LWB_STATE_DONE`.

This race is present in `zil_suspend`, leading to this bug.

At first glance, it would appear as though this would not be true
because `zil_suspend` will call `zil_commit`, just like `zil_close`, but
the problem is that `zil_suspend` will set the zilog's `zl_suspend`
field prior to calling `zil_commit`. Further, in `zil_commit`, if
`zl_suspend` is set, `zil_commit` will take a special branch of logic
and use `txg_wait_synced` instead of performing the normal `zil_commit`
logic.

This call to `txg_wait_synced` might be good enough for the data to
reach disk safely before it returns, but it does not ensure that all
outstanding lwb's reach the `LWB_STATE_DONE` state before it returns.
This is because, if there's an lwb "stuck" in
`zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will
maintain a non-null value for it's `lwb_buf` field and thus `zil_sync`
will not free that lwb. Thus, even though the lwb's data is already on
disk, the lwb will be left lingering, waiting on the CV, and will
eventually timeout and be issued to disk even though the write is
unnesseary.

So, after `zil_commit` is called from `zil_suspend`, we incorrectly
assume that there are not outstanding lwb's, and proceed to free all
lwb's found on the zilog's lwb list. As a result, we free the lwb that
will later be used `zil_commit_waiter_timeout`.

Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:12:38 +00:00
Alexander Motin
d252c97fc0 MFV r328225: 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock"
illumos/illumos-gate@cf07d3da99

https://www.illumos.org/issues/8603:
  To help make the ZIL's code more understandable, it was suggested that
  the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock".

Reviewed by: C Fraire <cfraire@me.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:11:20 +00:00
Alexander Motin
d216de62d1 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock"
illumos/illumos-gate@cf07d3da99

https://www.illumos.org/issues/8603:
  To help make the ZIL's code more understandable, it was suggested that
  the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock".
2018-01-21 23:04:25 +00:00
Alexander Motin
442814b7e7 MFV r328220: 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 23:02:05 +00:00
Poul-Henning Kamp
137a344c63 Rename rpi_pwm to bcm283x_pwm, and build it on armv[67] and arm64.
Truncate ratio if period is lowered.

Tested on Rpi2 and Rpi3.

Rpi3 requires DTB->DTS->edit->DTB hack
2018-01-21 21:27:41 +00:00
Eitan Adler
bdf16dd67c iconv: adding missing break
break is probably intended and correct,
but has no correctness implications due to is94 => is96

Reviewed by:	cem, jilles
Reported by:	swildner@DragonFlyBSD.org
MFC After:	1 week
2018-01-21 21:09:08 +00:00
Pedro F. Giffuni
3669918823 Define a new __alloc_size2 attribute to complement the exiting support.
At least on GCC7 calling __alloc_size(x) twice is not equivalent to
calling using the attribute once with two arguments. The later is the
documented use in GCC documentation so add a new alloc_size(n, x)
alternative to cover for the few places where it is used: basically:
calloc(3), reallocarray(3) and  mallocarray(9).

Submitted by:	Mark Millard
MFC after:	3 days
Reference:
http://docs.freebsd.org/cgi/mid.cgi?F227842D-6BE2-4680-82E7-07906AF61CD7
2018-01-21 20:27:47 +00:00
Alexander Motin
619fd3c317 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 19:26:38 +00:00
Edward Tomasz Napierala
1ee5bed72a Add missing manufacturer/serial number string descriptors.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-21 17:31:31 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Andriy Gapon
e09304d8f3 zfs: no need to check that size of zfs_cmd_t is not greater than IOCPARM_MAX
Nowadays we do not pass zfs_cmd_t directly through the ioctl interface.
Instead a small zfs_iocparm_t object is passed and the command is
explicitly copied in and out.  So, the check has become irrelevant.

MFC after:	3 weeks
Sponsored by:	Panzura
2018-01-21 11:19:18 +00:00
Eitan Adler
7275057a5e limits(1): fix always true condition
Reviewed by:	imp
MFC After:	1 week
2018-01-21 08:48:26 +00:00
Kyle Evans
4f8f1c798e regex(3): Resolve issues with higher WARNS levels
libc is set for WARNS=2, but the incoming libregex will use WARNS=6.
Sprinkle some casts and (void)bc's to alleviate the warnings that come along
with the higher WARNS level.

These 'bc' parameters could be outright removed, but as of right now they
will be used in some parts of libregex land. Silence the warnings instead
rather than flip-flopping.
2018-01-21 04:57:29 +00:00
Eitan Adler
7e2c40ffa9 termcap: add xterm-termite
Obtained from:	DragonFly
MFC After:	1 week
2018-01-20 22:24:45 +00:00
Jean-Sébastien Pédron
90b0eb9b4a psm: Log syncmask[1], not syncmask[0] twice
MFC after:	1 week
2018-01-20 19:04:21 +00:00
Eitan Adler
5a9fdfb303 limits(1): add missing break
Reported by:	swildner@DragonFlyBSD.org
MFC After:	1 week
2018-01-20 18:27:00 +00:00
Konstantin Belousov
c398c14664 Use correct symbol name in r328202.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2018-01-20 18:05:14 +00:00
Konstantin Belousov
3a5e472e17 Use predefined symbol for the CR3.PCID mask.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2018-01-20 17:46:09 +00:00
Michal Meloun
f8759facd2 Convert extres/phy to kobj model.
Similarly as other extres pseudo-drivers, implement phy by using kobj model.
This detaches it from provider device, so single device driver can export
multiple different phys. Additionally, this  allows phy to be subclassed to
more specialized drivers, like is USB OTG phy, or PCIe phy with hot-plug
capability.

Tested by:	manu (previous version, on Allwinner board)
MFC after:	1 month
2018-01-20 17:02:17 +00:00
Li-Wen Hsu
08c80489cd Silence the gcc warning: 'op' may be used uninitialized in this function
Approved by:	kevans
2018-01-20 15:37:47 +00:00
Roger Pau Monné
50a53194f6 xen: fix IDT setup after PTI
On amd64 the IDT handler was not set correctly when using PTI.

While there also fix the selectors to SEL_KPL.

Obtained from:	kib
MFC with:	r328083
2018-01-20 14:59:37 +00:00
Emmanuel Vadot
2b541904af clk: Get new parent freq after set_freq
During set_freq a clknode might have reparent (using a better parent that
have a higher frequency for example), before refreshing the cache, re-get
the parent frequency.

Reviewed by:	mmel
2018-01-20 14:47:27 +00:00
Edward Tomasz Napierala
ad2c142415 Remove unused index.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-20 14:05:55 +00:00