Commit Graph

215612 Commits

Author SHA1 Message Date
Dimitry Andric
02d4a225db With clang 3.9.0, compiling uplcom results in the following warnings:
sys/dev/usb/serial/uplcom.c:543:29: error: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes value from 192 to -64 [-Werror,-Wconstant-conversion]
        if (uplcom_pl2303_do(udev, UT_READ_VENDOR_DEVICE, UPLCOM_SET_REQUEST, 0x8484, 0, 1)
            ~~~~~~~~~~~~~~~~       ^~~~~~~~~~~~~~~~~~~~~
sys/dev/usb/usb.h:179:53: note: expanded from macro 'UT_READ_VENDOR_DEVICE'
#define UT_READ_VENDOR_DEVICE   (UT_READ  | UT_VENDOR | UT_DEVICE)
                                 ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~

This is because UT_READ is 0x80, so the int8_t argument is wrapped to a
negative value.  Fix this by using uint8_t instead.

Reviewed by:	imp, hselasky
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D7776
2016-09-04 16:59:35 +00:00
Mateusz Guzik
591df14528 cache: defer freeing entries until after the global lock is dropped
This also defers vdrop for held vnodes.

Glanced at by:	kib
2016-09-04 16:52:14 +00:00
Bruce Evans
f776d19f07 Oops, the previous i386 version of e_fmodf.S and e_fmodl.S was
actually the amd64 version.
2016-09-04 15:08:14 +00:00
Bruce Evans
5432c3bec3 Disconnect the "optimized" asm variants of cos(), sin() and tan() from
the build on i386.  Leave them in the source tree for regression tests.

The asm functions were always much less accurate (by a factor of more
than 10**18 in the worst case).  They were faster on old CPUs.  But
with each new generation of CPUs they get relatively slower.  The
double precision C version's average advantage is about a factor of 2
on Haswell.

The asm functions were already intentionally avoided in float and long
double precision on i386 and in all precisions on amd64.  Float
precision and amd64 give larger advantages to the C version.  The long
double precision C code and compilers' understanding of long double
precision are not so good, so the i387 is still slightly faster for
long double precision, except for the unimportant subcase of huge args
where the sub-optimal C code now somehow beats the i387 by about a
factor of 2.
2016-09-04 14:12:19 +00:00
Mateusz Guzik
1c1c35c74e fd: fix up fdeget_file
It was supposed to return NULL if a fp is not installed.

Facepalm-by: mjg
2016-09-04 13:31:57 +00:00
Bruce Evans
83e449a402 Add asm versions of fmod(), fmodf() and fmodl() on amd64. Add asm
versions of fmodf() amd fmodl() on i387.

fmod is similar to remainder, and the C versions are 3 to 9 times
slower than the asm versions on x86 for both, but we had the strange
mixture of all 6 variants of remainder in asm and only 1 of 6
variants of fmod in asm.
2016-09-04 12:22:14 +00:00
Dag-Erling Smørgrav
e2d1500434 Upgrade to Unbound 1.5.9. 2016-09-04 12:17:57 +00:00
Bruce Evans
a4c138885e Fix missing fmodl() on arches with 53-bit long doubles.
PR:		199422, 211965
MFC after:	1 week
2016-09-04 12:01:32 +00:00
Mateusz Guzik
31977b420a cache: manage negative entry list with a dedicated lock
Since negative entries are managed with a LRU list, a hit requires a
modificaton.

Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.

Provide a dedicated lock for use when the cache is only shared-locked.

Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:58:35 +00:00
Mateusz Guzik
b9042ae1bf cache: put all negative entry management code into dedicated functions
Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:55:15 +00:00
Landon J. Fuller
eb83f2e1ea bhndb(4): Fix probing of bhndb-attached bhnd_nvram devices.
This fixes bhnd(4) nvram handling on devices that map SPROM CSRs via PCI
configuration space.

The probe method previously required that a bhnd(4) device be attached to the
parent bridge; now that the bhnd_nvram device is always attached first, this
unnecessary sanity check always failed.

Approved by:	adrian (mentor, implicit)
2016-09-04 01:47:21 +00:00
Landon J. Fuller
63fb0e8236 bhndb(4): Skip disabled cores when performing bridge configuration probing.
On BCM4321 chipsets, both PCI and PCIe cores are included, with one of
the cores potentially left floating.

Since the PCI core appears first in the device table, and the PCI
profiles appear first in the resource configuration tables, this resulted in
incorrectly matching and using the PCI/v1 resource configuration on PCIe
devices, rather than the correct PCIe/v1 profile.

Approved by:	adrian (mentor, implicit)
2016-09-04 01:43:54 +00:00
Landon J. Fuller
f32befd1a2 siba(4): Add missing bhnd_device/bhnd_device_quirk table terminator entries.
This resulted in an over-read on siba chipsets that failed to match the
existing entries.

Approved by:	adrian (mentor, implicit)
2016-09-04 01:25:46 +00:00
Landon J. Fuller
d9189ae727 Remove empty directories left by r299241, r302190, r304870, and r301410
Approved by:	adrian (mentor, implicit)
2016-09-04 01:17:16 +00:00
Landon J. Fuller
111d7cb2e3 Migrate bhndb(4) to the new bhnd_erom API.
Adds support for probing and initializing bhndb(4) bridge state using
the bhnd_erom API, ensuring that full bridge configuration is available
*prior* to actually attaching and enumerating the bhnd(4) child device,
allowing us to safely allocate bus-level agent/device resources during
bhnd(4) bus enumeration.

- Add a bhnd_erom_probe() method usable by bhndb(4). This is an analogue
  to the existing bhnd_erom_probe_static() method, and allows the bhndb
  bridge to discover the best available erom parser class prior to newbus
  probing of its children.
- Add support for supplying identification hints when probing erom
  devices. This is required on early EXTIF-only chipsets, where chip
  identification registers are not available.
- Migrate bhndb over to the new bhnd_erom API, using bhnd_core_info
  records rather than bridged bhnd(4) device_t references to determine
  the bridged chipsets' capability/bridge configuration.
- The bhndb parent (e.g. if_bwn) is now required to supply a hardware
  priority table to the bridge. The default table is currently sufficient
  for our supported devices.
- Drop the two-pass attach approach we used for compatibility with bhndb(4) in
  the bhnd(4) bus drivers, and instead perform bus enumeration immediately,
  and allocate bridged per-child bus-level resources during that enumeration.

Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D7768
2016-09-04 00:58:19 +00:00
Mark Johnston
3da0f3c9ae Micro-optimize sleepq_signal().
Lift a comparison out of the loop that finds the highest-priority thread
on the queue.

MFC after:	1 week
2016-09-04 00:29:48 +00:00
Mark Johnston
dd9cb6da0b Respect the caller's hints when performing swap readahead.
The pager getpages interface allows the caller to bound the number of
readahead and readbehind pages, and vm_fault_hold() makes use of this
feature. These bounds were ignored after r305056, causing the swap pager
to potentially page in more than the specified number of pages.

Reported and reviewed by:	alc
X-MFC with:	r305056
2016-09-04 00:25:49 +00:00
Landon J. Fuller
664a749708 Implement a generic bhnd(4) device enumeration table API.
This defines a new bhnd_erom_if API, providing a common interface to device
enumeration on siba(4) and bcma(4) devices, for use both in the bhndb bridge
and SoC early boot contexts, and migrates mips/broadcom over to the new API.

This also replaces the previous adhoc device enumeration support implemented
for mips/broadcom.

Migration of bhndb to the new API will be implemented in a follow-up commit.


- Defined new bhnd_erom_if interface for bhnd(4) device enumeration, along
  with bcma(4) and siba(4)-specific implementations.
- Fixed a minor bug in bhndb that logged an error when we attempted to map the
  full siba(4) bus space (18000000-17FFFFFF) in the siba EROM parser.
- Reverted use of the resource's start address as the ChipCommon enum_addr in
  bhnd_read_chipid(). When called from bhndb, this address is found within the
  host address space, resulting in an invalid bridged enum_addr.
- Added support for falling back on standard bus_activate_resource() in
  bhnd_bus_generic_activate_resource(), enabling allocation of the bhnd_erom's
  bhnd_resource directly from a nexus-attached bhnd(4) device.
- Removed BHND_BUS_GET_CORE_TABLE(); it has been replaced by the erom API.
- Added support for statically initializing bhnd_erom instances, for use prior
  to malloc availability. The statically allocated buffer size is verified both
  at runtime, and via a compile-time assertion (see BHND_EROM_STATIC_BYTES).
- bhnd_erom classes are registered within a module via a linker set, allowing
  mips/broadcom to probe available EROM parser instances without creating a
  strong reference to bcma/siba-specific symbols.
- Migrated mips/broadcom to bhnd_erom_if, replacing the previous MIPS-specific
  device enumeration implementation.

Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D7748
2016-09-03 23:57:17 +00:00
Andrey A. Chernov
cd3912b6be The bug:
$ echo x | awk '/[[:cntrl:]]/'
x

The NUL character in cntrl class truncates the pattern, and an empty
pattern matches anything. The patch skips NUL as a quick fix.

PR:     195792
Submitted by:   kdrakehp@zoho.com
Approved by:    bwk@cs.princeton.edu (the author)
MFC after:      3 days
2016-09-03 23:04:56 +00:00
Mark Johnston
d96700a6da Remove redefinitions of some kernel types from mbuf.d.
These override the kernel's definitions and do not match in some cases,
which can break scripts that use these types. With r305055, dtrace is able
to trace fields of struct mbuf's anonymous structs and unions, so there is
no need to redefine types already defined in CTF.

MFC after:	3 days
2016-09-03 20:43:59 +00:00
Mark Johnston
dbbaf04f1e Remove support for idle page zeroing.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D7714
2016-09-03 20:38:13 +00:00
Dimitry Andric
2db7b9f259 With clang 3.9.0, compiling cxgb results in the following warning:
sys/dev/cxgb/cxgb_sge.c:2873:44: error: implicit conversion from 'int'
to 'char' changes value from 128 to -128 [-Werror,-Wconstant-conversion]
                        *mtod(m, char *) = CPL_ASYNC_NOTIF;
                                         ~ ^~~~~~~~~~~~~~~

This is because CPL_ASYNC_NOTIF is 0x80, so the plain char argument is
wrapped to a negative value.  Fix this by using uint8_t instead.

Reviewed by:	np
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D7772
2016-09-03 19:01:11 +00:00
Navdeep Parhar
5a63431265 Use correct CTR<n> variant. 2016-09-03 18:54:26 +00:00
Enji Cooper
74191470e8 Update contrib/netbsd-tests with new content from NetBSD
This updates the snapshot from 09/30/2014 to 08/11/2016

This brings in a number of new testcases from upstream, most
notably:

- bin/cat
- lib/libc
- lib/msun
- lib/libthr
- usr.bin/sort

lib/libc/tests/stdio/open_memstream_test.c was moved to
lib/libc/tests/stdio/open_memstream2_test.c to accomodate
the new open_memstream test from NetBSD.

MFC after:	2 months
Tested on:	amd64 (VMware fusion VM; various bare metal platforms); i386 (VMware fusion VM); make tinderbox
Sponsored by:	EMC / Isilon Storage Division
2016-09-03 18:11:48 +00:00
Enji Cooper
3919472360 Skip testcases 9/10 if jail(8) isn't installed
These testcases require jail support

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-09-03 17:59:46 +00:00
Enji Cooper
46f4fe1eb8 Add a missing "Bail out!" if zpool create fails
This will make the exit info more meaningful if/when zpool create fails,
and establishes parity with the other 2 zfs acl testcases (01, 03).

MFC after:	3 days
Sponsored by:	EMC / Isilon Storage Division
2016-09-03 17:31:13 +00:00
Andrew Turner
6f0c70d446 Explicitly include all .rodata.* sections in the kernel .rodata. This
helps link the kernel with lld as it will then put all these into a single
.rodata section.

MFC after:	1 week
Sponsored by:	ABT Systems Ltd
2016-09-03 17:23:24 +00:00
Jared McNeill
1403e695b7 Use the root key in the Security ID EFUSE (when valid) to generate a
MAC address instead of creating a random one each boot.
2016-09-03 15:28:09 +00:00
Warner Losh
155d3e43ff Don't use -N to set the OMAGIC with data and text writeable and data
not page aligned. To do this, use the ld script gnu ld installs on my
system.

This is imperfect: LDFLAGS_BIN and LD_FLAGS_BIN describe different
things. The loader script could be better named and take into account
other architectures. And having two different mechanisms to do
basically the same thing needs study. However, it's blocking forward
progress on lld, so I'll work in parallel to sort these out.

Differential Revision: https://reviews.freebsd.org/D7409
Reviewed by: emaste
2016-09-03 15:26:28 +00:00
Jared McNeill
d69d5ab04f Add support for Allwinner A64 thermal sensors. 2016-09-03 15:26:00 +00:00
Jared McNeill
1738b325d0 Add cpu-supply xref to cpu@0 2016-09-03 15:24:30 +00:00
Jared McNeill
b18b1b0015 Add SID, THS, and CPU operating points. 2016-09-03 15:23:59 +00:00
Jared McNeill
0503b90dde Add support for reading root key on A83T/A64. 2016-09-03 15:22:50 +00:00
Dag-Erling Smørgrav
a6533d8899 import unbound 1.5.9 2016-09-03 15:08:13 +00:00
Dimitry Andric
3128fa9a5a With clang 3.9.0, compiling ppbus(4) results in the following warnings:
sys/dev/ppbus/ppb_1284.c:296:46: error: implicit conversion from 'int'
to 'char' changes value from 144 to -112 [-Werror,-Wconstant-conversion]
        if ((error = do_peripheral_wait(bus, SELECT | nBUSY, 0))) {
                     ~~~~~~~~~~~~~~~~~~      ~~~~~~~^~~~~~~
sys/dev/ppbus/ppb_1284.c:785:48: error: implicit conversion from 'int'
to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
                if (do_1284_wait(bus, nACK | SELECT | PERROR | nBUSY,
                    ~~~~~~~~~~~~      ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
sys/dev/ppbus/ppb_1284.c:786:29: error: implicit conversion from 'int'
to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
                                        nACK | SELECT | PERROR | nBUSY)) {
                                        ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~

This is because nBUSY is 0x80, so the plain char argument is wrapped to
a negative value.  Fix this in a minimal fashion, by using uint8_t in a
few places.

Reviewed by:	emaste
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D7771
2016-09-03 13:48:44 +00:00
Dimitry Andric
402e32a8af Define drmP.h's __OS_HAS_AGP and __OS_HAS_MTRR macros in a defined and
portable way.

Reviewed by:	dumbbell
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D7770
2016-09-03 13:33:28 +00:00
Ed Maste
8aa5c6cfeb remove CONSTRUCTORS from MIPS uboot linker script
The linker script CONSTRUCTORS keyword is only meaningful "when linking
object file formats which do not support arbitrary sections, such as
ECOFF and XCOFF"[1] and is ignored for other object file formats.

LLVM's lld does not yet accept (and ignore) CONSTRUCTORS, so just remove
CONSTRUCTORS from the linker script as it has no effect.

[1] https://sourceware.org/binutils/docs/ld/Output-Section-Keywords.html
2016-09-03 13:01:37 +00:00
Alexander Motin
9b9258a12a Missed FreeBSD-specific piece of r305338. 2016-09-03 11:17:33 +00:00
Alexander Motin
d7e781bda3 MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object
Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()

This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.

Sponsored by: Intel Corp.

Closes #109

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@d3e523d489
2016-09-03 11:00:29 +00:00
Alexander Motin
4ad4b70e77 MFV r305336: 7247 zfs receive of deduplicated stream fails
This resolves two 'zfs recv' issues. First, when receiving into an
existing filesystem, a snapshot created during the receive process is
not added to the guid->dataset map for the stream, resulting in failed
lookups for deduped streams when a WRITE_BYREF record refers to a
snapshot received earlier in the stream. Second, the newly created
snapshot was also not set properly, referencing the snapshot before the
new receiving dataset rather than the existing filesystem.

Closes #159

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Author: Chris Williamson <chris.williamson@delphix.com>

openzfs/openzfs@b09697c8c1
2016-09-03 10:59:05 +00:00
Alexander Motin
070da3f779 MFV r305335: 7003 zap_lockdir() should tag hold
zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which
tags the hold on the zap. This will help diagnose programming errors
which misuse the hold on the ZAP.

Sponsored by: Intel Corp.

Closes #108

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@0780b3eab5
2016-09-03 10:58:14 +00:00
Alexander Motin
72a9a6ded9 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object
Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()

This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.

Sponsored by: Intel Corp.

Closes #109

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@d3e523d489
2016-09-03 10:54:56 +00:00
Alexander Motin
6ee1596f02 7247 zfs receive of deduplicated stream fails
This resolves two 'zfs recv' issues. First, when receiving into an
existing filesystem, a snapshot created during the receive process is
not added to the guid->dataset map for the stream, resulting in failed
lookups for deduped streams when a WRITE_BYREF record refers to a
snapshot received earlier in the stream. Second, the newly created
snapshot was also not set properly, referencing the snapshot before the
new receiving dataset rather than the existing filesystem.

Closes #159

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Author: Chris Williamson <chris.williamson@delphix.com>

openzfs/openzfs@b09697c8c1
2016-09-03 10:50:43 +00:00
Alexander Motin
7b84e6dc6f 7003 zap_lockdir() should tag hold
zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which
tags the hold on the zap. This will help diagnose programming errors
which misuse the hold on the ZAP.

Sponsored by: Intel Corp.

Closes #108

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@0780b3eab5
2016-09-03 10:48:48 +00:00
Alexander Motin
d3ec2cdb4a MFV r304157:
7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records

illumos/illumos-gate@12b90ee2d3
https://github.com/illumos/illumos-gate/commit/12b90ee2d3b10689fc45f4930d2392f5f
e1d9cfa

https://www.illumos.org/issues/7230
  A test failure occurred where a send stream had only a BEGIN record. This
  should not be possible if the send returns without error. Prevented this from
  happening in the future by adding an assertion to dmu_send_impl() to verify
  that if the function returns 0 (success) both a BEGIN and END record are
  present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN o
r
  END records sent), having dump_record() set flags appropriately, adding VERIFY
  statement to dmu_send_impl().

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matt Krantz <matt.krantz@delphix.com>
2016-09-03 10:10:58 +00:00
Alexander Motin
7aafc9d4c8 MFV r304156: 7235 remove unused func dsl_dataset_set_blkptr
illumos/illumos-gate@bd56f80007
https://github.com/illumos/illumos-gate/commit/bd56f80007857b960e0981ed0797ad8ec
844a96b

https://www.illumos.org/issues/7235
  The function dsl_dataset_set_blkptr() is unused. We should remove it.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-09-03 10:09:23 +00:00
Alexander Motin
929d0128f7 MFV r304159: 7277 zdb should be able to print zfs_dbgmsg's
illumos/illumos-gate@29bdd2f916
https://github.com/illumos/illumos-gate/commit/29bdd2f916366ece37c4748bca6b3d61f
57a223b

https://www.illumos.org/issues/7277
  ztest always prints the debug messages (zfs_dbgmsg()) by calling
  zfs_dbgmsg_print(). We should add a flag to zdb to make it do this as well
  before exiting.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2016-09-03 10:07:46 +00:00
Alexander Motin
c9fa25c110 MFV r304155: 7090 zfs should improve allocation order and throttle allocations
illumos/illumos-gate@0f7643c737
https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b
10c846a

https://www.illumos.org/issues/7090
  When write I/Os are issued, they are issued in block order but the ZIO pipelin
e
  will drive them asynchronously through the allocation stage which can result i
n
  blocks being allocated out-of-order. It would be nice to preserve as much of
  the logical order as possible.
  In addition, the allocations are equally scattered across all top-level VDEVs
  but not all top-level VDEVs are created equally. The pipeline should be able t
o
  detect devices that are more capable of handling allocations and should
  allocate more blocks to those devices. This allows for dynamic allocation
  distribution when devices are imbalanced as fuller devices will tend to be
  slower than empty devices.
  The change includes a new pool-wide allocation queue which would throttle and
  order allocations in the ZIO pipeline. The queue would be ordered by issued
  time and offset and would provide an initial amount of allocation of work to
  each top-level vdev. The allocation logic utilizes a reservation system to
  reserve allocations that will be performed by the allocator. Once an allocatio
n
  is successfully completed it's scheduled on a given top-level vdev. Each top-
  level vdev maintains a maximum number of allocations that it can handle
  (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels *
  mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab
  groups and round robin across all eligible metaslab groups to distribute the
  work. As top-levels complete their work, they receive additional work from the
  pool-wide allocation queue until the allocation queue is emptied.

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
2016-09-03 10:04:37 +00:00
Alexander Motin
5535f02daf MFV r303081: 7163 ztest failures due to excess error injection
illumos/illumos-gate@f34284d835
https://github.com/illumos/illumos-gate/commit/f34284d835bc555f987c1310df46c034c
3101155

https://www.illumos.org/issues/7163
  Running zloop from zfs-precommit hit this assertion:
       *panicstr/s
  0xfffffd7fd7419370: assertion failed for thread 0xfffffd7fe29ed240,
  thread-id 577: parent != NULL, file ../../../uts/common/fs/zfs/dbuf.c, line
  1827
       $c
  libc.so.1`_lwp_kill+0xa()
  libc.so.1`_assfail+0x182(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
  libc.so.1`assfail+0x19(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
  libzpool.so.1`dbuf_dirty+0xc69(10e3bc10, 3601700)
  libzpool.so.1`dbuf_dirty+0x61e(10c73640, 3601700)
  libzpool.so.1`dbuf_dirty+0x61e(10e28280, 3601700)
  libzpool.so.1`dmu_buf_will_fill+0x64(10e28280, 3601700)
  libzpool.so.1`dmu_write+0x1b6(2c7e640, d, 400000002e000000, 200, 3717b40,
  3601700)
  ztest_replay_write+0x568(4950d0, 3717a80, 0)
  ztest_write+0x125(4950d0, d, 400000002e000000, 200, 413f000)
  ztest_io+0x1bb(4950d0, d, 400000002e000000)
  ztest_dmu_write_parallel+0xaa(4950d0, 6)
  ztest_execute+0x83(1, 420c98, 6)
  ztest_thread+0xf4(6)
  libc.so.1`_thrp_setup+0x8a(fffffd7fe29ed240)
  libc.so.1`_lwp_start()
  This is another manifestation of ECKSUM in ztest:
  The lowest level ancestor that’s in memory is the L8 (topmost). The L7
  ancestor is blkid 0x10:
       ::dbufs -O 0x2c7e640 -o d -l 7 |::dbuf
  addr object lvl blkid holds os
  600be50 d 7 4 1 ztest/ds_6
  719d880 d 7 0 4 ztest/ds_6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-09-03 08:48:51 +00:00
Alexander Motin
1f0bf00253 MFV r303080: 6451 ztest fails due to checksum errors
illumos/illumos-gate@f9eb9fdf19
https://github.com/illumos/illumos-gate/commit/f9eb9fdf196b6ed476e4ffc69cecd8b0d
a3cb7e7

https://www.illumos.org/issues/6451
  Sometimes ztest fails because zdb detects checksum errors. e.g.:
  Traversing all blocks to verify checksums and verify nothing leaked ...
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000:
  180000> [L0 other uint64[]] sha256 uncompressed LE contiguou
  s unique single size=100000L/100000P birth=271L/271P fill=1
  cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf --
  skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800:
  180000> [L0 other uint64[]] sha256 uncompressed LE contigu
  ous unique single size=100000L/100000P birth=840L/840P fill=1
  cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75
  -- skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400:
  180000> [L0 other uint64[]] sha256 uncompressed LE conti
  guous unique single size=100000L/100000P birth=904L/904P fill=1
  cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174d
  ba -- skipping
  zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400:
  180000> [L0 other uint64[]] sha256 uncompressed LE cont
  iguous dedup single size=100000L/100000P birth=549L/549P fill=1
  cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb
  58 -- skipping
  zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000>
  [L0 other uint64[]] fletcher4 uncompressed LE contiguou
  s unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50:
  29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin
  g
  The problem is that ztest_fault_inject() can inject multiple faults into the
  same block. It is designed such that it can inject errors on all leafs of a
  RAID-Z or mirror, but for a given range of offsets, it will only inject errors

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2016-09-03 08:47:46 +00:00