Commit Graph

223437 Commits

Author SHA1 Message Date
Alan Cox
fe71561af2 In r118390, the swap pager's approach to striping swap allocation over
multiple devices was changed.  However, swapoff_one() was not fully and
correctly converted.  In particular, with r118390's introduction of a per-
device blist, the maximum swap block size, "dmmax", became irrelevant to
swapoff_one()'s operation.  Moreover, swapoff_one() was performing out-of-
range operations on the per-device blist that were silently ignored by
blist_fill().

This change corrects both of these problems with swapoff_one(), which will
allow us to potentially increase MAX_PAGEOUT_CLUSTER.  Previously,
swapoff_one() would panic inside of blist_fill() if you increased
MAX_PAGEOUT_CLUSTER.

Reviewed by:	kib, markj
MFC after:	3 days
2017-05-27 16:40:00 +00:00
Baptiste Daroussin
b5b274ce12 Catch up with the change in the user class 2017-05-27 14:07:46 +00:00
Baptiste Daroussin
5a95bf085c Use the usual FreeBSD spelling for the DHCP user class
Reported by:	lidl
2017-05-27 14:06:57 +00:00
Baptiste Daroussin
4e2a7b5c99 Capitalize DHCP
Reported by:	danfe
2017-05-27 13:55:20 +00:00
Baptiste Daroussin
aff810f1b2 Document recent changes on pxeboot 2017-05-27 13:26:18 +00:00
Baptiste Daroussin
e9ce925773 Partially revert r314948
While it sounds like a good idea to extract the RFC1048 data from PXE, in the
end it is not and it is causing lots of issues.  Our pxeloader might need
options which are incompatible with other pxe servers (for example iPXE, but
not only).

Our pxe loaders now also set their own user class, so it is useful to
issue our own pxe request at startup.

Reviewed by:	tsoome
Differential Revision:	https://reviews.freebsd.org/D10953
2017-05-27 12:46:46 +00:00
Baptiste Daroussin
4dfd16670e Always issue the pxe request
All the code now issues only a single DHCP request at startup of the
loader, meaning we can always request the PXE information from the
DHCP server.

The previous code lost that information, meaning no more option 55 (and so
not working with the kea DHCP server), no request for rootpath etc., and no
user class.

Remove the flags from the bootp function, which are not needed anymore.

Reviewed by:	tsoome
Differential Revision:	https://reviews.freebsd.org/D10952
2017-05-27 12:35:01 +00:00
Baptiste Daroussin
5fe86cd909 Always build tftpfs support along with nfs for pxeboot
This change was already done for loader.efi
2017-05-27 12:20:13 +00:00
Baptiste Daroussin
404f5b6b29 Support URI scheme for root-path in netbooting
Rather than repeating previous attempts to add tftpfs support at the same time
as NFS support, this time settle on a proper URI parser instead of hacks.

root-path can now be defined in the following ways:
For tftpfs:

tftp://ip/path
tftp:/path (this one will consider the tftp server is the same as the one where
the pxeboot file was fetched from)

For nfs:
nfs:/path
nfs://ip/path

The historical
ip:/path
/path

are kept on NFS

Reviewed by:	tsoom, rgrimes
Differential Revision:	https://reviews.freebsd.org/D10947
2017-05-27 12:06:52 +00:00
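
The root-path forms listed in 404f5b6b29 can be classified along these lines. This is only an illustrative sketch in Python, not the loader's C parser: the names RootPath and parse_root_path are hypothetical, and using None to mean "no server given" is an assumption made for the example.

```python
from typing import NamedTuple, Optional


class RootPath(NamedTuple):
    proto: str             # "tftp" or "nfs"
    server: Optional[str]  # None: no server given in the root-path string
    path: str


def parse_root_path(value: str) -> RootPath:
    """Classify a root-path string into protocol, optional server and path."""
    for proto in ("tftp", "nfs"):
        prefix = proto + ":"
        if value.startswith(prefix + "//"):
            # scheme://ip/path
            rest = value[len(prefix) + 2:]
            server, sep, path = rest.partition("/")
            if not server or not sep:
                raise ValueError(f"malformed root-path: {value!r}")
            return RootPath(proto, server, "/" + path)
        if value.startswith(prefix + "/"):
            # scheme:/path (server taken from elsewhere, e.g. the boot server)
            return RootPath(proto, None, value[len(prefix):])
    if value.startswith("/"):
        # historical "/path" form, kept on NFS
        return RootPath("nfs", None, value)
    server, sep, path = value.partition(":")
    if sep and server and path.startswith("/"):
        # historical "ip:/path" form, kept on NFS
        return RootPath("nfs", server, path)
    raise ValueError(f"unrecognized root-path: {value!r}")
```

For example, parse_root_path("tftp:/boot") yields the tftp protocol with no explicit server, which the commit describes as meaning the server pxeboot was fetched from.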
Baptiste Daroussin
b2390b67da add a comment on vendor index 19 and 20 to avoid confusion
Suggested by:	tsoome
2017-05-27 11:41:54 +00:00
Baptiste Daroussin
6180f83d95 Pass a "FREEBSD" user-class in PXE dhcp request
RFC 3004 allows passing multiple user classes in DHCP requests;
DHCP servers use this to differentiate the caller if needed.

As an example with isc dhcp server it will be possible to make options
only for the FreeBSD loaders:

if exists user-class and option user-class = "FREEBSD" {
   option root-path "tftp://192.168.42.1/FreeBSD";
}

Reviewed by:	tsoome
Differential Revision:	https://reviews.freebsd.org/D10951
2017-05-27 10:50:35 +00:00
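
For reference, RFC 3004 lays out the User Class option as option code 77, a total length byte, and then one length-prefixed entry per class. The sketch below builds those bytes in Python; encode_user_class is a hypothetical helper for illustration, and the commit message does not say whether the loader assembles the option exactly this way.

```python
USER_CLASS_OPTION = 77  # DHCP User Class option code (RFC 3004)


def encode_user_class(classes):
    """Encode user classes as: code, total length, then (length, data) pairs."""
    body = bytearray()
    for name in classes:
        data = name.encode("ascii")
        if not 1 <= len(data) <= 255:
            raise ValueError("each user class must be 1..255 bytes")
        body.append(len(data))  # per-class length prefix
        body += data
    if not 1 <= len(body) <= 255:
        raise ValueError("option payload must be 1..255 bytes")
    return bytes([USER_CLASS_OPTION, len(body)]) + bytes(body)
```

With the single class from this commit, encode_user_class(["FREEBSD"]) gives the code byte, a total length of 8, a class length of 7, and then the seven ASCII characters.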
Xin LI
335917f071 Tighten /entropy permissions.
PR:		219527
Reported by:	Lu Tung-Pin <lutungpin at openmailbox.org>
Submitted by:	jilles
MFC after:	3 days
2017-05-27 06:24:06 +00:00
Ed Maste
ef7161e774 uart: add AMT SOL PCI ID
I adjusted the description to be similar to existing AMT entries.

PR:		219384
Submitted by:	"Tooker"
MFC after:	1 week
2017-05-27 02:07:22 +00:00
Navdeep Parhar
7de5a71e79 libcxgb4: Use memcpy instead of copying WRs 8B at a time in the userspace
RDMA library for cxgbe(4).

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-05-27 02:05:21 +00:00
Alexander Motin
41cf0d54a2 Call VLAN_CAPABILITIES() when LAGG capabilities change.
This makes a VLAN on top of LAGG expose the proper capabilities if they are
changed after creation.

MFC after:	1 week
2017-05-26 22:22:48 +00:00
Conrad Meyer
95b978955c procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10689
2017-05-26 22:17:44 +00:00
Brooks Davis
fbf87d4016 Add missing usage and getopt(3) options
- Add the missing option 'n' to the getopt(3) string
- Add the missing options 'libxo' and 'N' to the usage message
- Add the missing options 'M' and 'N' to the man-page

Submitted by:	Keegan Drake H.P. <kdrakehp@zoho.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10915
2017-05-26 21:10:01 +00:00
John Baldwin
d68990a14c Fail large requests with EFBIG.
The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size.  Sending crypto requests larger than this size results in
hangs or incorrect output, so reject them with EFBIG.  For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).

Sponsored by:	Chelsio Communications
2017-05-26 20:20:40 +00:00
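
The size check described in d68990a14c amounts to something like the following Python sketch. It is not the driver code: check_request_size is a hypothetical name, and AUTHENC_HEADROOM encodes one reading of "around 512 bytes" (512 bytes of headroom for chained AES plus HMAC requests), which is an assumption.

```python
import errno

MAX_PDU = 64 * 1024 - 1   # largest PDU the firmware accepts, per the commit message
AUTHENC_HEADROOM = 512    # assumption: chained requests must be ~512 bytes smaller


def check_request_size(length, chained=False):
    """Return 0 if the request may be sent, or errno.EFBIG if it is too large."""
    limit = MAX_PDU - AUTHENC_HEADROOM if chained else MAX_PDU
    return errno.EFBIG if length > limit else 0
```

Rejecting up front with an error keeps oversized requests from reaching the adapter, where the commit reports hangs or incorrect output.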
Alexander Motin
8403ab7919 Improve applying unified capabilities to the lagg ports.
Some NICs have interdependent capabilities, so that disabling one requires
disabling another (TXCSUM/RXCSUM on em).  This code tries to reach
consensus more insistently.

PR:		219453
MFC after:	1 week
2017-05-26 20:15:33 +00:00
Andriy Gapon
b5617df55b Allow PROBE_SPINUP to fail in CAM ATA transport
The motivation for this is two-fold.

1. Some old WD SATA disks may appear as if they need to be spun up
when they are already spinning.  Those disks would respond with
an error to the spin-up request.

2. Even if we really fail to spin up the disk, we still can try to
proceed to the subsequent phases.  If we fail later on, then no
difference.  Otherwise we get a chance to communicate with the
disk which is better than completely ignoring it, because a user
can try to recover the disk.

Reviewed by:	mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D10896
2017-05-26 17:44:47 +00:00
David Bright
b326fec486 Add newsyslog capability to write RFC5424 compliant rotation message.
This modification adds the capability to newsyslog to write the
rotation message in a format that is compliant with RFC5424. This
capability is enabled on a per-log file basis through a new value
("T") in the flags field in newsyslog.conf. This is useful on systems
that use the RFC5424 format for log files so that the rotation message
format matches that of the other log messages. There has been recent
mention of adding an RFC5424 compliant mode to syslogd, and at least
one alternative system log daemon (rsyslogd) already has the
capability to use that format.

Reviewed by:	vangyzen, ngie
Approved by:	vangyzen (mentor)
MFC after:	2 months
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10253
2017-05-26 16:36:30 +00:00
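
An RFC 5424 message starts with a header of the form "<PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA", followed by the message text. The Python sketch below formats such a line; rfc5424_message is a hypothetical helper, the default facility and severity are assumptions, and the exact text newsyslog writes is not given in the commit message.

```python
from datetime import datetime, timezone


def rfc5424_message(when, hostname, app, pid, msg, facility=1, severity=6):
    """Format one RFC 5424 line; MSGID and STRUCTURED-DATA are the nil value '-'."""
    pri = facility * 8 + severity  # PRI combines facility and severity
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    return f"<{pri}>1 {ts} {hostname} {app} {pid} - - {msg}"
```

Called with a UTC timestamp, a host name, "newsyslog" as the application name and a rotation notice as the message, this produces a line in the same shape as the other entries of an RFC 5424 formatted log file.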
Dimitry Andric
b47efe07c4 Define a new __INO64 macro in <sys/_types.h>, to indicate the system
uses 64-bit inode numbers.  Programs can use this to avoid including
<sys/param.h>, with its associated namespace pollution.

Reviewed by:	kib
2017-05-26 16:29:55 +00:00
Michael Tuexen
5d08768a2b Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners.
While there, use a macro for checking the listen state to allow for
easier changes if required.

This is done to help glebius@ with his listen changes.
2017-05-26 16:29:00 +00:00
Ed Maste
55b87d4621 rm stale ptrace dependencies after r305012
This is similar to r318912, except that ptrace.[sS] was previously a
file in the source tree, not a generated assembly wrapper.

Check for the existence of ptrace.[sS] in the .depend file to determine
if we have to clean it up.  This is a bit hackish and will not be left
in place indefinitely, but provides a useful example case when
investigating a better solution in bmake.

Reviewed by:	bdrewery
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10930
2017-05-26 16:03:28 +00:00
Eric van Gyzen
afba14e2b8 libthr: increase WARNS to the default (6)
...and silence cast-align warnings from gcc.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10935
2017-05-26 15:57:54 +00:00
Eric van Gyzen
718fb5ba5b libthr: fix warnings at WARNS=6
Fix more warnings about redundant declarations.

Reviewed by:	kib emaste
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10932
2017-05-26 15:56:28 +00:00
Eric van Gyzen
7fb37371e8 rtld: fix warnings about redundant declarations
Fix warnings about redundant declarations in rtld
when libthr is increased to WARNS=6.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10934
2017-05-26 15:55:03 +00:00
Eric van Gyzen
01618b339f libthr: fix style in previous commit
I intended to add this to the previous commit.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
2017-05-26 15:53:27 +00:00
Eric van Gyzen
d25183e0a7 libthr: prevent setcontext() from masking SIGTHR
__thr_setcontext() mistakenly tested for the presence of SIGCANCEL
in its local ucontext_t instead of the parameter. Therefore,
if a thread calls setcontext() with a context whose signal mask
contains SIGTHR (a.k.a. SIGCANCEL), that signal will be blocked,
preventing the thread from being cancelled or suspended.

Reported by:	gcc 6.1 via RISC-V tinderbox
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10933
2017-05-26 15:51:51 +00:00
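
The bug fixed in d25183e0a7 can be modeled roughly as follows. This is a Python illustration only, not libthr's C code: signal masks are modeled as sets, SIGUSR1 stands in for the FreeBSD-specific SIGTHR/SIGCANCEL, and the uninitialized local context is modeled as an empty mask (in C its contents would be indeterminate). All function names are hypothetical.

```python
import signal

# Stand-in for SIGTHR (a.k.a. SIGCANCEL); an assumption made for portability.
SIGCANCEL = signal.SIGUSR1


def mask_before_fix(requested):
    """Model of the buggy check: the local copy is tested, not the argument."""
    local = set()                   # models the not-yet-filled local ucontext_t
    if SIGCANCEL not in local:      # bug: says nothing about the caller's mask
        return set(requested)       # caller's mask is applied unchanged
    local = set(requested)
    local.discard(SIGCANCEL)
    return local


def mask_after_fix(requested):
    """Model of the fixed check: the caller's mask is tested."""
    if SIGCANCEL not in requested:
        return set(requested)
    local = set(requested)
    local.discard(SIGCANCEL)        # never let the thread block SIGCANCEL
    return local
```

In the first version a mask containing the cancellation signal passes through untouched, which corresponds to the thread that can no longer be cancelled or suspended.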
Ed Maste
b79f050a88 makefs: add -O (offset) option
NetBSD revs:
ffs.c		1.60
makefs.8	1.44
makefs.c	1.48
makefs.h	1.33
ffs/buf.c	1.20
ffs/mkfs.c	1.27

Obtained from:	NetBSD
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10780
2017-05-26 15:49:20 +00:00
Andriy Gapon
32ecf81aff MFV r318944: 8265 Reserve send stream flag for large dnode feature
illumos/illumos-gate@bc83969fdb
bc83969fdb

https://www.illumos.org/issues/8265
  Reserve bit 23 in the zfs send stream flags for the large
  dnode feature which has been implemented for Linux.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>

MFC after:	1 week
2017-05-26 12:08:38 +00:00
Andriy Gapon
a8aa933b61 8265 Reserve send stream flag for large dnode feature
illumos/illumos-gate@bc83969fdb
bc83969fdb

https://www.illumos.org/issues/8265
  Reserve bit 23 in the zfs send stream flags for the large
  dnode feature which has been implemented for Linux.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>
2017-05-26 12:07:47 +00:00
Andriy Gapon
a51eb0a964 MFV r318942: 8166 zpool scrub thinks it repaired offline device
illumos/illumos-gate@2d2f193a21
2d2f193a21

https://www.illumos.org/issues/8166
  If we do a scrub while a leaf device is offline (via "zpool offline"),
  we will inadvertently clear the DTL (dirty time log) of the offline
  device, even though it is still damaged. When the device comes back
  online, we will incompletely resilver it, thinking that the scrub
  repaired blocks written before the scrub was started. The incomplete
  resilver can lead to data loss if there is a subsequent failure of a
  different leaf device.
  The fix is to never clear the DTL of offline devices. Note that if a
  device is onlined while a scrub is in progress, the scrub will be
  restarted.
  The problem can be worked around by running "zpool scrub" after
  "zpool online".
  See also https://github.com/zfsonlinux/zfs/issues/5806

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 12:04:21 +00:00
Andriy Gapon
f45d37d04e 8166 zpool scrub thinks it repaired offline device
illumos/illumos-gate@2d2f193a21
2d2f193a21

https://www.illumos.org/issues/8166
  If we do a scrub while a leaf device is offline (via "zpool offline"),
  we will inadvertently clear the DTL (dirty time log) of the offline
  device, even though it is still damaged. When the device comes back
  online, we will incompletely resilver it, thinking that the scrub
  repaired blocks written before the scrub was started. The incomplete
  resilver can lead to data loss if there is a subsequent failure of a
  different leaf device.
  The fix is to never clear the DTL of offline devices. Note that if a
  device is onlined while a scrub is in progress, the scrub will be
  restarted.
  The problem can be worked around by running "zpool scrub" after
  "zpool online".
  See also https://github.com/zfsonlinux/zfs/issues/5806

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 12:02:51 +00:00
Andriy Gapon
9df2d6f729 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for the boot pool, we will also need
  whole-disk support with UEFI boot, and for this, zpool create should create an
  EFI system partition.
  I have borrowed the idea from Oracle Solaris and am introducing a zpool create
  -B switch to provide a way to specify that the boot partition should be created.
  However, there is still the question of how big the system partition should be.
  For the time being, I have set the default size to 256MB (that's the minimum
  size for FAT32 with 4k blocks). To support a custom size, the set-on-creation
  "bootsize" property is created, so the custom size can be set as:
  zpool create -B -o bootsize=34MB rpool c0t0d0
  After the pool is created, the "bootsize" property is read only. When the -B
  switch is not used, bootsize defaults to 0 and is shown in zpool get output
  with value ''. Older zfs/zpool implementations ignore this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
2017-05-26 12:02:14 +00:00
Andriy Gapon
ab57ddbb43 7956 "minimum" is misspelled in zpool manpage
illumos/illumos-gate@eba8726136
eba8726136

https://www.illumos.org/issues/7956

Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Brad Lewis <blewis@delphix.com>
2017-05-26 12:00:31 +00:00
Andriy Gapon
5fbf54a543 6418 zpool should have a label clearing command
illumos/illumos-gate@6401734d54
6401734d54

https://www.illumos.org/issues/6418
  An easy, direct means of sanitizing pool vdevs can be helpful for management
  purposes.
  FreeBSD has had a 'zpool labelclear' for some time, see:
  https://svnweb.freebsd.org/base?view=revision&revision=224171
  SpectraBSD has a slightly updated version, which I propose for inclusion.

Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Will Andrews <will@firepipe.net>

Note: the bulk of the change has been already imported, this is a follow
up that imports zpool.1m changes.
2017-05-26 11:59:20 +00:00
Andriy Gapon
23d34c4278 6781 zpool man page needs updated to remove duplicate entry of "cannot be" where it discusses cache devices
illumos/illumos-gate@e4cb59f791
e4cb59f791

https://www.illumos.org/issues/6781
    cache
    A device used to cache storage pool data. A cache device cannot
    be cannot be configured as a mirror or raidz group. For more
    information, see the "Cache Devices" section.
  needs changed to
    cache
    A device used to cache storage pool data. A cache device cannot
    be configured as a mirror or raidz group. For more
    information, see the "Cache Devices" section.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alexander Pyhalov <apyhalov@gmail.com>
2017-05-26 11:56:28 +00:00
Andriy Gapon
15a3493f4e 2897 "zpool split" documentation missing from manpage
illumos/illumos-gate@879bece34e
879bece34e

https://www.illumos.org/issues/2897
  Found this option in some Oracle documentation and wanted to check out the
  zpool manpage on it in OI. Unfortunately it seems to be missing from the
  manpage, so I first thought it was unsupported. However, "# zpool split" does
  print the correct usage. Testing with the "-n" switch makes me believe it is
  supported (I don't actually need to split my pool).

Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Steven Burgess <sburgess@datto.com>
2017-05-26 11:55:31 +00:00
Andriy Gapon
a1a27d6fd4 4465 zpool(1M) is able to offline cache vdevs despite what man page says
5659 in the manual page for zpool(1M), one misuse of the word 'zpool' to describe a pool

illumos/illumos-gate@c8323d4323
c8323d4323

https://www.illumos.org/issues/4465
  zpool(1M) is able to offline cache vdevs despite man page saying that it isn't:
         zpool offline [-t] pool device ...
             Takes the specified physical device offline. While the device is
             offline, no attempt is made to read or write to the device.

             This command is not applicable to spares or cache devices.
  altair:root:~# zpool create testoff c9t67d0 cache c9t71d0
  altair:root:~# zpool status testoff
    pool: testoff
   state: ONLINE
    scan: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          testoff     ONLINE       0     0     0
            c9t67d0   ONLINE       0     0     0
          cache
            c9t71d0   ONLINE       0     0     0

  errors: No known data errors
  altair:root:~# zpool offline testoff c9t71d0
  altair:root:~# zpool status testoff
    pool: testoff
   state: ONLINE
  status: One or more devices has been taken offline by the administrator.
          Sufficient replicas exist for the pool to continue functioning in a
          degraded state.
  action: Online the device using 'zpool online' or replace the device with
          'zpool replace'.
    scan: none requested

https://www.illumos.org/issues/5659
  At https://github.com/illumos/illumos-gate/blob/master/usr/src/man/man1m/zpool.1m#L931
       Do not add a disk that is currently configured as a quorum device to
       a zpool.
  – should be:
       Do not add a disk that is currently configured as a quorum device to
       a pool.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-05-26 11:54:42 +00:00
Andriy Gapon
2cd05c2473 MFV r318934: 8070 Add some ZFS comments
illumos/illumos-gate@40713f2b24
40713f2b24

https://www.illumos.org/issues/8070
  Add some ZFS comments left by various developers at different times

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>

MFC after:	1 week
2017-05-26 11:49:42 +00:00
Andriy Gapon
2053e2d0d0 8070 Add some ZFS comments
illumos/illumos-gate@40713f2b24
40713f2b24

https://www.illumos.org/issues/8070
  Add some ZFS comments left by various developers at different times

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
2017-05-26 11:48:29 +00:00
Andriy Gapon
5ce561a2f5 8064 need a static DTrace probe in VN_HOLD
illumos/illumos-gate@ade42b557a
ade42b557a

https://www.illumos.org/issues/8064
  It's currently nearly impossible to trace what process places a hold on
  a vnode, as the only way holds are placed is via the `VN_HOLD()` and
  `VN_HOLD_CALLER()` macros, which inline the bumping of `v_count`. Adding
  static DTrace probes to these macros would enable tracing of where
  specific vnode references come from.
  For completeness and symmetry, a similar static probe should be added to
  `vn_rele()` and `vn_rele_dnlc()`.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Sebastien Roy <seb@delphix.com>
2017-05-26 11:39:34 +00:00
Andriy Gapon
0a07ea0e2f MFV r318931: 8063 verify that we do not attempt to access inactive txg
illumos/illumos-gate@b7b2590dd9
b7b2590dd9

https://www.illumos.org/issues/8063
  A standard practice in ZFS is to keep track of "per-txg" state. Any of
  the 3 active TXG's (open, quiescing, syncing) can have different values
  for this state. We should assert that we do not attempt to modify other
  (inactive) TXG's.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-05-26 11:37:11 +00:00
Andriy Gapon
f9e7ac9d61 8063 verify that we do not attempt to access inactive txg
illumos/illumos-gate@b7b2590dd9
b7b2590dd9

https://www.illumos.org/issues/8063
  A standard practice in ZFS is to keep track of "per-txg" state. Any of
  the 3 active TXG's (open, quiescing, syncing) can have different values
  for this state. We should assert that we do not attempt to modify other
  (inactive) TXG's.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 11:35:34 +00:00
Andriy Gapon
28c5e43e36 MFV r318929: 7786 zfs`vdev_online() needs better notification about state changes
illumos/illumos-gate@5f368aef86
5f368aef86

https://www.illumos.org/issues/7786
  Currently, vdev_online() will only post sysevent if previous state was
  "offline". It should also post the event when the state changes from "removed"
  or "faulted" to "healthy" or "degraded".
  This will fix the following scenario:
  - pull disk from slot A
  - check that hotspare has taken its place (if available)
  - insert disk into slot B
  - check that hotspare moved back to "avail" state (if spare was used)
  The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail
  to update the vdev FRU.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Albert Lee <trisk@forkgnu.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	1 week
2017-05-26 11:33:34 +00:00
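
The notification rule described for issue 7786 can be written as a small predicate. The Python sketch below is illustrative only: VdevState and should_post_online_event are hypothetical names, and the real vdev_online() logic in ZFS is C code with more states than shown here.

```python
from enum import Enum, auto


class VdevState(Enum):
    OFFLINE = auto()
    REMOVED = auto()
    FAULTED = auto()
    DEGRADED = auto()
    HEALTHY = auto()


def should_post_online_event(old, new):
    """Decide whether onlining a vdev should post a state-change sysevent."""
    if old is VdevState.OFFLINE:
        return True  # behavior described as already present before the change
    # added by the change: removed/faulted -> healthy/degraded also notifies
    return (old in (VdevState.REMOVED, VdevState.FAULTED)
            and new in (VdevState.HEALTHY, VdevState.DEGRADED))
```

In the hot-swap scenario from the issue, a disk moving from "removed" to "healthy" now satisfies the predicate, so the event that updates the vdev FRU is posted.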
Andriy Gapon
92455d0cca 7786 zfs`vdev_online() needs better notification about state changes
illumos/illumos-gate@5f368aef86
5f368aef86

https://www.illumos.org/issues/7786
  Currently, vdev_online() will only post sysevent if previous state was
  "offline". It should also post the event when the state changes from "removed"
  or "faulted" to "healthy" or "degraded".
  This will fix the following scenario:
  - pull disk from slot A
  - check that hotspare has taken its place (if available)
  - insert disk into slot B
  - check that hotspare moved back to "avail" state (if spare was used)
  The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail
  to update the vdev FRU.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Albert Lee <trisk@forkgnu.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2017-05-26 11:32:05 +00:00
Andriy Gapon
9c2a3c861f MFV r318927: 8025 dbuf_read() creates unnecessary zio_root() for bonus buf
illumos/illumos-gate@def4fac588
def4fac588

https://www.illumos.org/issues/8025
  dbuf_read() creates a zio_root() to track and wait for all the zio's
  that may happen as part of this call. However, if the blkptr_t for
  this buffer is NULL or a hole, we will not create any more zio's, so
  this zio_root() is unnecessary. This is always the case when calling
  dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
  the containing dnode). For workloads that read a lot of bonus buffers
  (e.g. file creation and removal), creating and destroying these
  unnecessary zio's can decrease performance by around 3%.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 11:30:55 +00:00
Andriy Gapon
1c42c71f38 8025 dbuf_read() creates unnecessary zio_root() for bonus buf
illumos/illumos-gate@def4fac588
def4fac588

https://www.illumos.org/issues/8025
  dbuf_read() creates a zio_root() to track and wait for all the zio's
  that may happen as part of this call. However, if the blkptr_t for
  this buffer is NULL or a hole, we will not create any more zio's, so
  this zio_root() is unnecessary. This is always the case when calling
  dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
  the containing dnode). For workloads that read a lot of bonus buffers
  (e.g. file creation and removal), creating and destroying these
  unnecessary zio's can decrease performance by around 3%.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 11:29:31 +00:00
Andriy Gapon
ebaf416f95 MFV r316929: 6914 kernel virtual memory fragmentation leads to hang
illumos/illumos-gate@af868f46a5
af868f46a5

https://www.illumos.org/issues/6914

FreeBSD note: only a ZFS part of the change is merged, changes to the VM
subsystem are not ported (obviously).  Also, now that FreeBSD has
vmem(9) we don't have to ifdef-out the code that uses it.

MFC after:	2 weeks
2017-05-26 11:23:16 +00:00