- Always unlink $cmd at exit, via an END block.
- The tests don't function well if kern.geom.debugflags != 0. Save debugflags,
then restore them at the end of the test.
MFC after: 1 month
Sponsored by: Dell EMC Isilon
This will allow the tool to be used with arbitrary geom(4) classes, like GEOM.
Specify class=PART explicitly in the tester to keep existing behavior.
MFC after: 1 month
Sponsored by: Dell EMC Isilon
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot". It's arbitrary, but it's easier to find this way.
MFC after: 2 weeks
Description from Brett:
"The busdma tags used to create mappings for the tx and rx rings did not have
the device's tag as parents, meaning that they did not respect the device's
busdma properties. The other tags used in the driver had their parents set
appropriately.
Also, the dma maps for each buffer in ixl_txeof() were being NULLed after
being unloaded, which is an error because those maps are then reused without
being recreated (I believe this also leaked resources since the maps were not
destroyed). Simply removing the line that sets the maps to NULL gives the
desired behavior. There does not seem to be a similar problem with ixl_rxeof().
Functions to free the tx and rx rings also NULL out the dma maps for each
buffer, but this seems okay because the maps are destroyed and not reused in
this case.
With these fixes, my ixl card seems to be working with the IOMMU enabled."
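A minimal sketch of the parent-tag half of the fix (not the driver's actual
code; "dev", "ring_size", and the alignment values are illustrative): passing
bus_get_dma_tag(dev) instead of NULL as the parent makes the ring tag inherit
the device's busdma restrictions.

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

static int
ring_tag_create(device_t dev, bus_size_t ring_size, bus_dma_tag_t *tagp)
{
	return (bus_dma_tag_create(
	    bus_get_dma_tag(dev),	/* parent: was NULL before the fix */
	    PAGE_SIZE, 0,		/* alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    ring_size, 1, ring_size,	/* maxsize, nsegments, maxsegsz */
	    0, NULL, NULL,		/* flags, lockfunc, lockfuncarg */
	    tagp));
}

The map half of the fix is purely a deletion: after bus_dmamap_unload() the
map stays valid for reuse, so it must not be set to NULL unless
bus_dmamap_destroy() has also been called.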
Submitted by: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: erj
Approved by: Alan Cox <alc@rice.edu>
MFC after: 1 week
now. The option does not presently work. However, similar functions in
ipfstat (for state) and ipnat (for nat) do work and provide outputs that
can be easily parsed by shell scripts or subsequently loaded into CSV
files. The intention is to return to this option later and make it work.
I suspect the problem is in printpoolfields.c.
what this field represented was also inaccurate.)
Suggested by: kib
In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified. However, blist_create() was not modified to handle the
possibility that a malloc() call failed. Address this omission.
Increase the width of the local variable "radix" to 64 bits. (This
matches the width of the corresponding field in struct blist.)
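A minimal caller-side sketch of the corrected contract (the wrapper function
and names are illustrative): with M_NOWAIT, blist_create() may now return
NULL, and callers must check for that.

#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/errno.h>
#include <sys/blist.h>

static int
swap_blist_init(daddr_t nblocks, blist_t *blp)
{
	*blp = blist_create(nblocks, M_NOWAIT);
	if (*blp == NULL)
		return (ENOMEM);	/* malloc() inside blist_create() failed */
	return (0);
}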
Reviewed by: kib
MFC after: 6 weeks
The reboot() system call accepts a mode (RB_AUTOBOOT, RB_HALT, RB_POWEROFF,
or RB_REROOT) as well as zero or more optional flags in 'howto'.
However, RB_AUTOBOOT was only displayed if 'howto' was exactly 0.
Combinations like 'RB_AUTOBOOT | RB_DUMP' were decoded as 'RB_DUMP'.
Instead, infer that RB_AUTOBOOT was specified if none of the other "mode"
flags were specified.
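A minimal sketch of the decoding rule (the function is illustrative; the RB_*
flags come from <sys/reboot.h>, where RB_AUTOBOOT is defined as 0):

#include <stdio.h>
#include <sys/reboot.h>

static void
print_howto(int howto)
{
	/* A mode is implied even when no mode bit is set: autoboot. */
	if ((howto & (RB_HALT | RB_POWEROFF | RB_REROOT)) == 0)
		printf("RB_AUTOBOOT");
	else if (howto & RB_HALT)
		printf("RB_HALT");
	else if (howto & RB_POWEROFF)
		printf("RB_POWEROFF");
	else
		printf("RB_REROOT");
	if (howto & RB_DUMP)		/* optional flags decode additively */
		printf("|RB_DUMP");
}

With this rule, 'RB_AUTOBOOT | RB_DUMP' decodes as RB_AUTOBOOT|RB_DUMP rather
than just RB_DUMP.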
Close a potential race in reading the CPU dtrace flags, where a thread can
start on one CPU, and partway through retrieving the flags be swapped out,
while another thread traps and sets the CPU_DTRACE_NOFAULT flag. This could
cause the first thread to return without handling the fault.
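A hedged illustration of the safe pattern (the identifiers follow the DTrace
sources, but treat the exact fix as illustrative): the CPU id and the flag
word have to be read on the same CPU, e.g. by pinning with a critical
section.

	uint16_t flags;

	critical_enter();	/* curcpu cannot change underneath us */
	flags = cpu_core[curcpu].cpuc_dtrace_flags;
	critical_exit();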
Discussed with: markj@
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
calling thread to sleep.
This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.
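A small usage sketch of the interruptible-wait semantics (the wait queue and
condition are illustrative): following the Linux convention these compat
routines implement, wait_event_interruptible() returns 0 once the condition
holds and -ERESTARTSYS when a signal interrupts the sleep, which is the
return value fixed above.

#include <linux/wait.h>
#include <linux/errno.h>

static DECLARE_WAIT_QUEUE_HEAD(dev_wq);
static int dev_done;

static int
wait_for_device(void)
{
	int ret;

	ret = wait_event_interruptible(dev_wq, dev_done != 0);
	if (ret == -ERESTARTSYS)
		return (ret);	/* interrupted by a signal */
	return (0);		/* dev_done became nonzero */
}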
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10986
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range. This
revision corrects that mismatch. Currently, swaponsomething() limits
the size of a swap area to prevent arithmetic overflow in
other parts of the blist allocator. That limit has also prevented this
type mismatch from causing problems.
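A sketch of the corrected prototype (per sys/blist.h; the exact declaration
is illustrative of the change):

#include <sys/blist.h>

/* Return type widened from a 32-bit int to daddr_t, matching "count". */
daddr_t	blist_fill(blist_t bl, daddr_t blkno, daddr_t count);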
Reviewed by: kib, markj
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D11096
illumos/illumos-gate@690031d326
https://www.illumos.org/issues/8168
If we manage to export the pool on which we are creating a dataset (filesystem
or zvol) between entering libzfs`zfs_create() and the libzfs`zpool_open() call
(for which we never check the return value), we end up dereferencing a NULL
pointer in libzfs`zpool_close().
This was discovered on ZFS on Linux. The same issue can be reproduced on
Illumos by running the following in parallel:
while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
Eventually this will result in several core dumps like this one:
[root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
libnvpair.so.1 ld.so.1 ]
> ::stack
libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
_start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
>
Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096
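A minimal sketch of the kind of check the fix adds (the libzfs functions are
real; "g_zfs" and "poolname" are illustrative locals):

#include <libzfs.h>

	zpool_handle_t *zhp;

	if ((zhp = zpool_open(g_zfs, poolname)) == NULL)
		return (-1);	/* pool vanished, e.g. exported concurrently */
	/* ... use zhp ... */
	zpool_close(zhp);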
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after: 2 weeks
illumos/illumos-gate@dbfd9f9300
https://www.illumos.org/issues/8156
dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention.
It isn't necessary to hold the lock, because if we make the wrong choice
occasionally, nothing bad will happen.
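A hedged sketch of the change (the names follow the OpenZFS sources; the body
is illustrative): the inline-eviction check is now performed without taking
dbuf_evict_lock, since a stale read at worst causes one extra or one missed
inline eviction.

static void
dbuf_evict_notify(void)
{
	/* Unlocked on purpose: no mutex_enter(&dbuf_evict_lock) here. */
	if (dbuf_cache_size > dbuf_cache_max_bytes)
		dbuf_evict_one();
}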
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 1 week
illumos/illumos-gate@5b06278253
https://www.illumos.org/issues/8005
RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
is a unit of 2^ashift bytes, typically 512B or 4KB.
To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks. The contents of these pad sectors are not used, so we do not
need to read or write these sectors. However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors. If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance. The gap-filling performance improvement was introduced in
July 2009.
Writing the optional zio is done by the io aggregation code in
vdev_queue.c. The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB. For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.
The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks). It
cannot occur with the default recordsize=128KB. If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.
The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors).
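As a rough worked example under the default 128KB limit: with recordsize=1MB
on a 10-disk RAIDZ2 (8 data disks), each leaf receives 1MB / 8 = 128KB of
data for a full record, so even a single pad sector pushes the per-leaf total
over zfs_vdev_aggregation_limit and the optional gap-filling write is
skipped.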
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
illumos/illumos-gate@adaec86ad2
https://www.illumos.org/issues/8155
When writing pre-compressed buffers, arc_write() requires that the compression
algorithm used to compress the buffer matches the compression algorithm
requested by the zio_prop_t, which is set by dmu_write_policy().
This makes dmu_write_policy() and its callers a bit more complicated.
We can simplify this by making arc_write() trust the caller to supply the type
of pre-compressed buffer that it wants to write, and override the compression
setting in the zio_prop_t.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
illumos/illumos-gate@79809f9cf4
https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
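In symbols (a one-line check of the claim): scaling each value $x_i$ by the
normalization factor $n$ gives

  $\sum_i (x_i/n)^2 = \frac{1}{n^2} \sum_i x_i^2$

so the accumulated sum of squares must be divided by $n^2$, not by $n$.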
FreeBSD note: the actual change was committed in r316853; this commit
adds the test files and records the merge information.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 1 week
Sponsored by: Panzura
8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
illumos/illumos-gate@79809f9cf4
https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
illumos/illumos-gate@22c8b9583d
https://www.illumos.org/issues/8108
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@0255edcc85
https://www.illumos.org/issues/8056
The send size estimate for a zvol can be too low if the size of the record
headers (dmu_replay_record_t's) is a significant portion of the size.
This is typically the case when the data is highly compressible, especially
with embedded blocks.
The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
blocks are the size of the "recordsize" property (128KB).
However, for zvols, the blocks are the size of the "volblocksize" property
(8KB). Therefore, we estimate that there will be 16x fewer record headers than
there really will be.
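As a rough worked example: 128KB / 8KB = 16, so a zvol holding 1GB of data
emits about 1GB / 8KB = 131072 record headers, while the recordsize-based
estimate assumes only about 1GB / 128KB = 8192.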
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@dbfd9f9300
https://www.illumos.org/issues/8156
dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention.
It isn't necessary to hold the lock, because if we make the wrong choice
occasionally, nothing bad will happen.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>