Commit Graph

123382 Commits

Author SHA1 Message Date
Alexander Motin
9abeb6d79e MFV r336958: 9337 zfs get all is slow due to uncached metadata
This project's goal is to make read-heavy channel programs and zfs(1m)
administrative commands faster by caching all the metadata that they will
need in the dbuf layer. This will prevent the data from being evicted, so
that any future call to i.e. zfs get all won't have to go to disk (very
much).

illumos/illumos-gate@adb52d9262

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:58:21 +00:00
Alexander Motin
d1cf4052d0 MFV r336955: 9236 nuke spa_dbgmsg
We should use zfs_dbgmsg instead of spa_dbgmsg.  Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.

illumos/illumos-gate@21f7c81cc1

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:47:27 +00:00
Alexander Motin
656d46109d MFV r336952: 9192 explicitly pass good_writes to vdev_uberblock/label_sync
Currently vdev_label_sync and vdev_uberblock_sync take a zio_t and assume
that its io_private is a pointer to the good_writes count. They should
instead accept this argument explicitly.

illumos/illumos-gate@a3b5583021

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:37:45 +00:00
Alexander Motin
194000fa21 MFV r336950: 9290 device removal reduces redundancy of mirrors
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.

illumos/illumos-gate@3a4b1be953

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:25:39 +00:00
Alexander Motin
9cd6f162c0 MFV r336948: 9112 Improve allocation performance on high-end systems
On high-end systems running async sequential write workloads, especially
NUMA systems with flash or NVMe storage, one significant performance
bottleneck is selecting a metaslab to do allocations from. This process
can be parallelized, providing significant performance increases for
these workloads.

illumos/illumos-gate@f78cdc34af

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Paul Dagnelie <pcd@delphix.com>
2018-07-31 00:02:42 +00:00
Alexander Motin
6413a6d31f MFV r336946: 9238 ZFS Spacemap Encoding V2
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).

The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.

The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.

One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.

illumos/illumos-gate@17f11284b4

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-07-30 23:47:38 +00:00
Alexander Motin
eb235f2f8e MFV r336942: 9189 Add debug to vdev_label_read_config when txg check fails
illumos/illumos-gate@b6bf6e1540

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-07-30 22:03:29 +00:00
Michael Tuexen
888973f5ae Allow implicit TCP connection setup for TCP/IPv6.
TCP/IPv4 allows an implicit connection setup using sendto(), which
is used for TTCP and TCP fast open. This patch adds support for
TCP/IPv6.
While there, improve some tests for detecting multicast addresses,
which are mapped.

Reviewed by:		bz@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16458
2018-07-30 21:27:26 +00:00
Michael Tuexen
e2662978b8 Send consistent SEG.WIN when using timewait codepath for TCP.
When sending TCP segments from the timewait code path, a stored
value of the last sent window is used. Use the same code for
computing this in the timewait code path as in the main code
path used in tcp_output() to avoiv inconsistencies.

Reviewed by:		rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16503
2018-07-30 21:13:42 +00:00
Ed Maste
22e56aea3f msdosfs: use same max filesize #define as NetBSD and move to header
For use by makefs msdosfs support.

Obtained from:	NetBSD denode.h 1.6
Sponsored by:	The FreeBSD Foundation
2018-07-30 20:36:51 +00:00
Michael Tuexen
8db239dc6b Fix some TCP fast open issues.
The following issues are fixed:
* Whenever a TCP server with TCP fast open enabled, calls accept(),
  recv(), send(), and close() before the TCP-ACK segment has been received,
  the TCP connection is just dropped and the reception of the TCP-ACK
  segment triggers the sending of a TCP-RST segment.
* Whenever a TCP server with TCP fast open enabled, calls accept(), recv(),
  send(), send(), and close() before the TCP-ACK segment has been received,
  the first byte provided in the second send call is not transferred.
* Whenever a TCP client with TCP fast open enabled calls sendto() followed
  by close() the TCP connection is just dropped.

Reviewed by:		jtl@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16485
2018-07-30 20:35:50 +00:00
Rick Macklem
743d528198 Silence newer gcc warnings.
Newer versions of gcc generate "set, but not used" warnings.
Add __unused macros to silence these warnings.
Although the variables are not being used, they are values parsed from
arguments to callback RPCs that might be needed in the future.

Requested by:	mmacy
2018-07-30 20:25:32 +00:00
Michael Tuexen
6138da62a9 Add missing send/recv dtrace probes for TCP.
These missing probe are mostly in the syncache and timewait code.

Reviewed by:		markj@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16369
2018-07-30 20:13:38 +00:00
Justin Hibbits
2a9ee5fcfe snd_hda: Make codec control path endian safe
The CORB and RIRB buffers exist in DMA memory, but the device reads them as
little-endian only.  Read and write as LE into the DMA memory block, to work on
BE platforms.
2018-07-30 20:00:56 +00:00
Justin Hibbits
0cd8715680 Add ofw_bus_if.h to the SRCS list for ipmi module on powerpc64
PR:		230194
Reported by:	sbruno
2018-07-30 18:29:20 +00:00
Kyle Evans
1ddc8a8e68 Follow up to r336919 and r336921: s/efi.rt_disabled/efi.rt.disabled/
The latter matches the rest of the tree better [0]. The UPDATING entry has
been updated to reflect this, and the new tunable is now documented in
loader(8) [1].

Reported by:	imp [0], Shawn Webb [1]
2018-07-30 18:13:20 +00:00
Mark Johnston
4765717321 Remove a redundant check.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-07-30 17:58:41 +00:00
Kyle Evans
164138e7d8 amd64/GENERIC: Enable EFIRT by default
As noted in UDPATING, the new loader tunable efi.rt_disabled may be used to
disable EFIRT at runtime. It should have no effect if you are not booted via
UEFI boot.

MFC after:	6 weeks
2018-07-30 17:54:18 +00:00
Kyle Evans
21307740e0 efirt: Add tunable to allow disabling EFI Runtime Services
Leading up to enabling EFIRT in GENERIC, allow runtime services to be
disabled with a new tunable: efi.rt_disabled. This makes it so that EFIRT
can be disabled easily in case we run into some buggy UEFI implementation
and fail to boot.

Discussed with:	imp, kib
MFC after:	1 week
2018-07-30 17:40:27 +00:00
Justin Hibbits
9b8d0a4615 powerpcspe: Unconditionally save an restore SPEFSCR on task switch
The SPEFSCR is not guarded by the SPV bit in MSR, it's just another SPR.
Protect processes from other tasks setting the SPEFSCR for their own needs.
2018-07-30 17:03:15 +00:00
Konstantin Belousov
8e36389535 Remove unneeded CLDs instructions in the SMAP-ed version of several
functions from support.S.

I believe they re-appeared due to me mis-merging my r327820 into the
topic branch.

Sponsored by:	The FreeBSD Foundation
2018-07-30 16:54:51 +00:00
Andrew Turner
9fc89b6017 Enable VIMAGE on arm64 again. A workaround for modules with static VNET
variables has been committed so these should work now.

PR:		223670
Sponsored by:	DARPA, AFRL
2018-07-30 15:57:58 +00:00
Alan Somers
6040822c4e Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
Justin Hibbits
bdafaf0aee snd_hda: Print error codes in decimal, rather than hex
It's easy to confuse the error code as naked it looks decimal (EINVAL is
reported as error 16, instead of error 22, so first reading looks like EBUSY).
2018-07-30 15:19:59 +00:00
Justin Hibbits
cf40916b63 snd_hda: Only free streams DMA maps if the streams list has been created
If hdac_attach fails prior to allocating sc->streams, cleanup in the
hdac_attach_fail label will dereference a NULL pointer, panicking.
2018-07-30 15:15:33 +00:00
Andrew Turner
b6ea4c5a2a As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in
modules. It also fails in the same way, we are unable to relocate static
variables as the compiler uses PC-relative loads with nothing for the
kernel linker to relocate.

Sponsored by:	DARPA, AFRL
2018-07-30 15:05:07 +00:00
Andrew Turner
cd2106eaea Ensure the DPCPU and VNET module spaces are aligned to hold a pointer.
Previously they may have been aligned to a char, leading to misaligned
DPCPU and VNET variables.

Sponsored by:	DARPA, AFRL
2018-07-30 14:25:17 +00:00
Andrew Turner
bc61d94997 As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE.
There is the VNET_DEFINE_STATIC macro for that.
2018-07-30 12:44:44 +00:00
Ruslan Bukin
a304bc9729 Disable VIMAGE on RISC-V.
Similar to r326179 ("Temporarily disable VIMAGE on arm64") creation of
if_lagg or epair on RISC-V results a kernel panic.

Sponsored by:	DARPA, AFRL
2018-07-30 12:22:49 +00:00
Roger Pau Monné
5477025a10 xen/grants: fix deadlocks in the free callbacks
This fixes the panic caused by deadlocking when grant-table free
callbacks are used.

The cause of the recursion is: check_free_callbacks() is always called
with the lock gnttab_list_lock held. In turn the callback function is
also called with the lock held. Then when the client uses any of the grant
reference methods which also attempt the lock the gnttab_list_lock
mutex from within the free callback a deadlock happens.

Fix this by making the gnttab_list_lock recursive.

Submitted by:		Pratyush Yadav <pratyush@freebsd.org>
Differential Revision:	https://reviews.freebsd.org/D16505
2018-07-30 11:41:51 +00:00
Roger Pau Monné
83c2fa73e6 xen-blkfront: fix memory leak in xbd_connect error path
If gnttab_grant_foreign_access() fails for any of the indirection
pages, the code breaks out of both the loops without freeing the local
variable indirectpages, causing a memory leak.

Submitted by:		Pratyush Yadav <pratyush@freebsd.org>
Differential Review:	https://reviews.freebsd.org/D16136
2018-07-30 11:27:51 +00:00
Roger Pau Monné
8b19549b0e xen-blkfront: fix length check
Length is an unsigned integer, so checking against < 0 doesn't make
sense. While there also make clear that a length of 0 always succeeds.

Submitted by:		Pratyush Yadav <pratyush@freebsd.org>
Differential Review:	https://reviews.freebsd.org/D16045
2018-07-30 11:15:20 +00:00
Andrew Turner
011dc75d9c Remove teh non-INTRNG code from the ARM GIC interrupt controller driver.
We don't build for the non-INTRNG case and it was makeing the code harder
to read.
2018-07-30 10:55:02 +00:00
Randall Stewart
4ad5b7a0ac This fixes a hole where rack could end up
sending an invalid segment into the reassembly
queue. This would happen if you enabled the
data after close option.

Sponsored by:	Netflix
Differential Revision: https://reviews.freebsd.org/D16453
2018-07-30 10:23:29 +00:00
Andrew Turner
3810edcf6b Require ARMv5 for arm. All current kernels are for ARMv5 or later, and it
will allow us to clean out old ARMv4 (and earlier) specific assembly.

Relnotes:	yes
2018-07-30 09:50:26 +00:00
David E. O'Brien
455d358977 Correct copyright dates. 2018-07-30 07:01:00 +00:00
Alan Cox
cd914c2fc2 Prepare for adding psind == 1 support to armv6's pmap_enter().
Precompute the new PTE before entering the critical section.

Eliminate duplication of the pmap and pv list unlock operations in
pmap_enter() by implementing a single return path.  Otherwise, the
duplication will only increase with the upcoming support for psind == 1.

Reviewed by:	mmel
Tested by:	mmel
Discussed with:	kib, markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D16443
2018-07-30 01:54:25 +00:00
Rick Macklem
8014c97147 Silence newer gcc warnings.
Newer versions of gcc generate "set, but not used" warnings in the NFS server.
Add __unused macros to silence these warnings.

Requested by:	mmacy
2018-07-29 21:51:17 +00:00
Konstantin Belousov
b3a7db3b06 Use SMAP on amd64.
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access.  Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-07-29 20:47:00 +00:00
Alan Somers
9dea3ac8c2 freebsd32_getrusage(2): skip freebsd32_rusage_out on error
PR:		230153
Reported by:	kib
MFC after:	2 weeks
X-MFC-With:	336871
Differential Revision:	https://reviews.freebsd.org/D16500
2018-07-29 19:20:13 +00:00
Alan Somers
5cf35a100c getrusage(2): fix return value under 32-bit emulation
According to the man page, getrusage(2) should return EFAULT if the rusage
argument lies outside of the process's address space. But due to an
oversight in r100384, that's never been the case during 32-bit emulation.
Fix it.

PR:		230153
Reported by:	tests(7)
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16500
2018-07-29 18:22:26 +00:00
Ian Lepore
6dfd050075 The device ID tables are used only within the driver, make them static so
that both of these drivers can exist in the same kernel.
2018-07-29 16:55:28 +00:00
Antoine Brodin
ccd6ac9f6e Add allow.mlock to jail parameters
It allows locking or unlocking physical pages in memory within a jail

This allows running elasticsearch with "bootstrap.memory_lock" inside a jail

Reviewed by:	jamie@
Differential Revision:	https://reviews.freebsd.org/D16342
2018-07-29 12:41:56 +00:00
Don Lewis
290d906084 Fix the long term ULE load balancer so that it actually works. The
initial call to sched_balance() during startup is meant to initialize
balance_ticks, but does not actually do that since smp_started is
still zero at that time.  Since balance_ticks does not get set,
there are no further calls to sched_balance().  Fix this by setting
balance_ticks in sched_initticks() since we know the value of
balance_interval at that time, and eliminate the useless startup
call to sched_balance().  We don't need to randomize the intial
value of balance_ticks.

Since there is now only one call to sched_balance(), we can hoist
the tests at the top of this function out to the caller and avoid
the overhead of the function call when running a SMP kernel on UP
hardware.

PR:		223914
Reviewed by:	kib
MFC after:	2 weeks
2018-07-29 00:30:06 +00:00
Rick Macklem
a3e709cd33 Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7.
I believe that a ReclaimComplete with rca_one_fs == TRUE is only
to be used after a file system has been transferred to a different
file server.  However, RFC5661 is somewhat vague w.r.t. this and
the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE
and one with ReclaimComplete with rca_one_fs == FALSE.
Therefore, just ignore the rca_one_fs == TRUE operation and return
NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP.
This allows the ESXi 6.7 NFSv4.1 client to do a mount.
After discussion on the NFSv4 IETF working group mailing list, doing this
along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE
was an appropriate way to handle this.
The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was
done may be used to disable replies of NFS4ERR_GRACE for non-reclaim
state operations in a future commit.

This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts
work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are
violations of RFC-5661 and should not be used.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com, daniel@ftml.net (earlier version)
MFC after:	2 weeks
Relnotes:	yes
2018-07-28 20:21:04 +00:00
Andrew Turner
cb5ce014d4 Use the cp15 functions to read cp15 registers rather than using assembly
functions. The former are static inline functions so will compile to a
single instruction.
2018-07-28 17:21:34 +00:00
Andrew Turner
59c9a22424 Remove an unneeded cpu_ident() prototype. 2018-07-28 16:56:46 +00:00
Marius Strobl
c98027b2bc Implement atomic_swap_{32,64,int,long,ptr}(9). 2018-07-28 15:42:57 +00:00
Andrew Turner
a0e00905f0 Remove some write only global values from the arm cpufunc code. 2018-07-28 12:53:10 +00:00
Andrew Turner
836108c21e Remove an unused function from the arm ELF trampoline. It tries to find
properties about the CPU caches, however we never use these values.
2018-07-28 12:52:03 +00:00