124124 Commits

Author SHA1 Message Date
Mateusz Guzik
bca84f54ce zfs: fix a panic after failed mount
r338927("zfs: depessimize zfs_root with rmlocks") failed to error check
the mount before caching root vnode.

Results in crashes in rrw_enter_read_impl tracing back to zfs_mount.

Reported by:	Mike Tancsa
Tested by:	allanjude
Approved by:	re (kib)
2018-10-14 16:14:01 +00:00
Eric Joyner
adf93b56dc em/igb/ix(4): Port two Tx/Rx fixes made to ixl in r339338
- Fix assert/panic on receive when Jumbo Frames are enabled.

From the commit I made to ixl:
"It turns out that *_isc_rxd_available is supposed to return how many
packets are available to be cleaned on the rx ring. This patch removes
a section of code where if the budget argument is 1, the function would return
one if there was a descriptor available, not necessarily a packet.

This is okay in regular mtu 1500 traffic since the max frame size is less
than the configured receive buffer size (2048), but this doesn't work when
received packets can span more than one  descriptor, as is the case when the
mtu is 9000 and the receive buffer size is 4096."

- Fix possible Tx hang because *_isc_txd_credits_update returns incorrect result

From the commit by Krzysztof Galazka to ixl: "Function isc_txd_update_credits
called with clear set to false should return 1 if there are TX descriptors
already handled by HW. It was always returning 0 causing troubles with UDP TX
traffic."

PR:             231659
Reported by:    lev@
Approved by:	re (gjb@)
Sponsored by:   Intel Corporation
2018-10-14 05:09:43 +00:00
Mateusz Guzik
6816c88458 amd64: partially depessimize cpu_fetch_syscall_args and cpu_set_syscall_retval
Vast majority of syscalls take 6 or less arguments. Move handling of other
cases to a fallback function. Similarly, special casing for _syscall
and __syscall
magic syscalls is moved away.

Return is almost always 0. The change replaces 3 branches with 1 in the common
case. Also the 'frame' variable convinces clang not to reload it on each access.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17542
2018-10-13 21:18:31 +00:00
Navdeep Parhar
c3a88be466 cxgbe(4): Fix a divide-by-zero that occurs when hw.cxgbe.toecaps_allowed
is set to 0 with a kernel that has both TCP_OFFLOAD and RATELIMIT.

Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
2018-10-13 00:13:24 +00:00
Mateusz Guzik
98fca94d22 capsicum: provide cap_rights_fde_inline
Reading caps is in the hot path (on each successful fd lookup), but
completely unnecessarily requires a function call.

Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
2018-10-12 23:48:10 +00:00
Bjoern A. Zeeb
4ba16a92c7 In udp_input() when walking the pcblist we can come across
an inp marked FREED after the epoch(9) changes.
Check once we hold the lock and skip the inp if it is the case.

Contrary to IPv6 the locking of the inp is outside the multicast
section and hence a single check seems to suffice.

PR:		232192
Reviewed by:	mmacy, markj
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17540
2018-10-12 22:51:45 +00:00
Eric Joyner
77c1fcec91 ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9)
Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for
FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4).

This commit also re-adds the VF driver to GENERIC since it now compiles and
functions.

The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is
now intended to be used with future products, not just with Fortville/Fort Park
VFs.

A man page update that documents these drivers is forthcoming in a separate
commit.

Reviewed by:    sbruno@, kbowling@
Tested by:      jeffrey.e.pieper@intel.com
Approved by:	re (gjb@)
Relnotes:       yes
Sponsored by:   Intel Corporation
Differential Revision: https://reviews.freebsd.org/D16429
2018-10-12 22:40:54 +00:00
Mateusz Guzik
3cf1291d2e amd64: employ MEMMOVE in copyin/copyout
See r339205 for justification.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17526
2018-10-12 21:59:09 +00:00
Alexander Motin
178777f516 Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children.  Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.

This is a part of ZoL port of device removal patch:

commit a1d477c24c7badc89c60955995fd84d311938486
Author: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Tim Chase <tim@chase2k.com>

Approved by:	re (kib)
MFC after:	1 week
2018-10-12 16:55:28 +00:00
Konstantin Belousov
555225c062 Call initializecpucache() before ifuncs are resolved.
The function tweaks CPU capabilities based on the VM platform and
tunables, which affected selection of the cache flush method before
ifuncs were used, and should affect the cache flush in the same way
after ifunc.

PR:	232081
Reported by:	phk
Analyzed by:	avg
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-10-12 16:00:21 +00:00
Ruslan Bukin
05efeb8430 Initialize interrupt priority to 0 on all sources.
Without this hardware raises an interrupt regardless of any
pending bits set.

This fixes operation on RocketChip and derivatives (e.g. lowRISC).

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-12 15:51:41 +00:00
Konstantin Belousov
78a3652794 bhyve: emulate CLFLUSH and CLFLUSHOPT.
Apparently CLFLUSH on mmio can cause VM exit, as reported in the PR.
I do not see that anything useful can be done except emulating page
faults on invalid addresses.

Due to the instruction encoding pecularity, also emulate SFENCE.

PR:	232081
Reported by:	phk
Reviewed by:	araujo, avg, jhb (all: previous version)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17482
2018-10-12 15:30:15 +00:00
Ruslan Bukin
053ec0508e Add support for the UART device found in lowRISC system-on-a-chip.
The only source of documentation for this device is verilog,
so driver is minimalistic.

Reviewed by:	Dr Jonathan Kimmitt <jrrk2@cam.ac.uk>
Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-12 15:19:41 +00:00
Alexander Motin
770ce5c3bf Add ZIO_TYPE_FREE support for indirect vdevs.
Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
requests to removed (indirect) vdevs, while on FreeBSD there is also
ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
buffers, so don't need the pointer adjustment.

PR:		228750, 229007
Reviewed by:	allanjude, sef
Approved by:	re (kib)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17523
2018-10-12 15:14:22 +00:00
Bjoern A. Zeeb
3afdfcaf33 r217592 moved the check for imo in udp_input() into the conditional block
but leaving the variable assignment outside the block, where it is no longer
used. Move both the variable and the assignment one block further in.

This should result in no functional changes. It will however make upcoming
changes slightly easier to apply.

Reviewed by:		markj, jtl, tuexen
Approved by:		re (kib)
Differential Revision:	https://reviews.freebsd.org/D17525
2018-10-12 11:30:46 +00:00
Mateusz Guzik
c9964045a0 Add a file missed in r339321
Approved by:	re (implicit)
Sad face: mjg
2018-10-12 00:32:45 +00:00
Mateusz Guzik
3c9a1d0493 amd64: make memmove and memcpy less slow with mov
The reasoning is the same as with the memset change, see r339205

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17441
2018-10-11 23:37:57 +00:00
Mateusz Guzik
3f102f5881 Provide string functions for use before ifuncs get resolved.
The change is a no-op for architectures which don't ifunc memset,
memcpy nor memmove.

Convert places which need them. Xen bits by royger.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17487
2018-10-11 23:28:04 +00:00
Yuri Pankov
937b0f2559 Fix PNP entries for if_ix and if_ixv properly using the IFLIB_PNP_INFO()
macro.

Reviewed by:	imp, sbruno
Approved by:	re (gjb), kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17473
2018-10-11 22:27:12 +00:00
John Baldwin
b843f9be5e Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.
The VT-x VMCS only stores the base address of the GDTR and IDTR.  As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot.  Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector.  VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

PR:		230773
Reported by:	John Levon <levon@movementarian.org>
Reviewed by:	kib
Tested by:	araujo
Approved by:	re (rgrimes)
MFC after:	1 week
2018-10-11 18:27:19 +00:00
Allan Jude
c79b58ccc5 Pull in a follow-on commit to resolve a deadlock in ZFS sequential
resilver (r334844)

MFV/ZoL: Fix deadlock in IO pipeline

commit a76f3d0437e5e974f0f748f8735af3539443b388
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date:   Fri Mar 16 16:46:06 2018 -0700

    Fix deadlock in IO pipeline

    In vdev_queue_aggregate() the zio_execute() bypass should not be
    called under the vdev queue lock.  This can result in a deadlock
    as shown in the stack traces below.

    Drop the vdev queue lock then walk the parents of the aggregate IO
    to determine the list of component IOs to be bypassed.  This can
    be done safely without holding the io_lock since the new aggregate
    IO has not yet been returned and its parents cannot change.

    ---  THREAD 1 ---
    arc_read()
      zio_nowait()
        zio_vdev_io_start()
          vdev_queue_io() <--- mutex_enter(vq->vq_lock)
            vdev_queue_io_to_issue()
              vdev_queue_aggregate()
                zio_execute()
            vdev_queue_io_to_issue()
              vdev_queue_aggregate()
                zio_execute()
                  zio_vdev_io_assess()
                    zio_wait_for_children() <- mutex_enter(zio->io_lock)

    --- THREAD 2 --- (inverse order)
    arc_read()
      zio_change_priority() <- mutex_enter(zio->zio_lock)
        vdev_queue_change_io_priority() <- mutex_enter(vq->vq_lock)

    Reviewed-by: Tom Caputi <tcaputi@datto.com>
    Reviewed-by: Don Brady <don.brady@delphix.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Reported by:	ZFS Leadership Meeting
Reviewed by:	mav
Approved by:	re (kib)
Obtained from:	ZFS-on-Linux
MFC after:	2 weeks
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17495
2018-10-10 22:59:15 +00:00
Allan Jude
bee8a18986 Add missing sysctls for tuning vdev queue depths for new I/O types
This connects new tunables that were added but not exposed in:
r329502 (zpool remove)
r337007 (zpool initialize)

Reviewed by:	avg
Approved by:	re (kib)
MFC after:	2 weeks
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17494
2018-10-10 22:55:31 +00:00
Allan Jude
cd00e3e1af Resolve a hang in ZFS during vnode reclaimation
This is caused by a deadlock between zil_commit() and zfs_zget()

Add a way for zfs_zget() to break out of the retry loop in the common case

PR:		229614
Reported by:	grembo, Andreas Sommer, many others
Tested by:	Andreas Sommer, Vicki Pfau
Reviewed by:	avg (no objection)
Approved by:	re (gjb)
MFC after:	2 months
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17460
2018-10-10 19:39:47 +00:00
Alexander Motin
7a492ba93c Remove extra thread_exit() call left after r329802.
spa_condense_indirect_thread() is no longer a thread function, but just
a callback for new zthr KPI.

Submitted by:	allanjude
Approved by:	re (gjb)
MFC after:	3 days
2018-10-10 16:34:53 +00:00
Marcin Wojtas
0693b1d2a9 Update Armada 38x UART device tree binding
Recent changes in Linux updated Marvell Armada 38x
UART compatible string. As a result the FreeBSD driver
(uart_dev_snps) does not probe. This commit fixes the
situation, however not applying any functional modification
to the driver methods.

Approved by: re (kib)
Obtained from: Semihalf
2018-10-10 10:34:17 +00:00
Glen Barber
c3fb2eae88 Update head from ALPHA8 to ALPHA9 as part of the 12.0-RELEASE
cycle.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-10-09 21:54:58 +00:00
Glen Barber
1da7787f71 Merge the remainder of the projects/openssl111 branch to head.
- Update OpenSSL to version 1.1.1.
- Update Kerberos/Heimdal API for OpenSSL 1.1.1 compatibility.
- Bump __FreeBSD_version.

Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-10-09 21:28:26 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Stephen Hurd
0544676baf Use mbuf defines to construct csum offload masks rather than literals
Reviewed by:	erj
Approved by:	re (rgrimes)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17442
2018-10-09 20:16:19 +00:00
Jung-uk Kim
6f1f1a6395 Update ACPICA to 20181003.
Approved by:	re (gjb)
2018-10-09 18:40:36 +00:00
Glen Barber
7c32835287 MFH r338661 through r339253.
Sponsored by:	The FreeBSD Foundation
2018-10-09 14:27:55 +00:00
Jonathan T. Looney
13c6ba6d94 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
Rick Macklem
910ccc7727 Fix the pNFS server's reporting of disk space usage for the "#<path>" case.
The pNFS server would report the total disk space used and free for all
of the DSs, even when certain DSs are assigned to the file system via
the "#<path>" suffix used in the "nfsd -p" option argument.
This patch fixes this case. It only reports usage for the file system
that the argument vnode resides on. This is consistent with the non-pNFS
NFSv4 server. In NFSv4 it is possible to have subtrees on other file
systems, but these are not included in the usage information for NFSv4.

Approved by:	re (gjb)
2018-10-09 01:10:50 +00:00
Glen Barber
846803208a Fix a mismerge from head to projects/openssl111.
r339213 was cherry-picked back to head from the project branch, which
caused a conflict.  This commit properly records the mergeinfo from
head.

r339205 was missed, and r339214 is required for reintegration.

Sponsored by:	The FreeBSD Foundation
2018-10-08 19:39:05 +00:00
Glen Barber
fc3f42d80f MFH r339206-r339212, r339215-r339239
Sponsored by:	The FreeBSD Foundation
2018-10-08 18:06:40 +00:00
Alexander Motin
f3b515aea5 Fix r336951 mismerge -- use of uninitialized variable.
Reported by:	tsoome
Approved by:	re (gjb)
MFC after:	3 days
2018-10-08 15:19:03 +00:00
Hans Petter Selasky
2df98d5eec Add missing steering rules for virtual function, VF, in mlx4en(4) driver.
When acting as a VF it is required to add steering rules for all unicast
addresses. Even if promiscious mode is selected. Else incoming data packets
will be dropped.

MFC after:		3 days
Approved by:		re (gjb)
Sponsored by:		Mellanox Technologies
2018-10-08 14:52:21 +00:00
Eric van Gyzen
382000a1fd em/igb: Do not print link state messages
These messages are totally redundant with the iflib messages.
They're also not very useful, since they don't include the
interface name.

Discussed with:	shurd
Approved by:	re (rgrimes)
Sponsored by:	Dell EMC Isilon
2018-10-08 01:28:46 +00:00
Michael Tuexen
6b45121a6d Address the warning regarding duplicate option 'GEOM_PART_GPT' when
configuring kernels for i386, amd64, and arm64.
The 'GEOM_PART_GPT' option was added to the DEFAULTS configuration
in r337967.

Approved by:		re (kib@)
Reviewed by:		ler@
Differential Revision:	https://reviews.freebsd.org/D17458
Sponsored by:		Netflix, Inc.
2018-10-07 15:54:13 +00:00
Michael Tuexen
3535cdc43e Avoid truncating unrecognised parameters when reporting them.
This resulted in sending malformed packets.

Approved by:		re (kib@)
MFC after:		1 week
2018-10-07 15:13:47 +00:00
Michael Tuexen
20a2f77eec Enable TCP Fast Open support for PPC platforms.
Reviewed by:		kbowling@, andreast@
Approved by:		re (kib@)
Differential Revision:	https://reviews.freebsd.org/D17407
2018-10-07 12:56:05 +00:00
Michael Tuexen
3924dfa721 Ensure that the ips_localout counter is incremented for
locally generated SCTP packets sent over IPv4. This make
the behaviour consistent with IPv6.

Reviewed by:		ae@, bz@, jtl@
Approved by:		re (kib@)
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D17406
2018-10-07 11:26:15 +00:00
Justin Hibbits
7e524b0746 powerpc/pseries: EOI interrupts in XICS by setting lowest priority
Discussing with Benjamin Herrenschmidt, OPAL_INT_GET_XIRR masks the
returned priority, so must be resumed before more interrupts can be
handled at this priority.  Since there are only two priorities used in
FreeBSD, we know that the previous priority in an EOI will always be
0xff (lowest priority).

Reviewed by:	nwhitehorn
Approved by:	re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17361
2018-10-06 18:51:49 +00:00
Justin Hibbits
013cc176c9 powerpc64/powernv: Don't mask MSIs in OPAL
Summary:
Discussing with Benjamin Herrenschmidt, MSIs, and edge-triggered
interrupts in general, must not be masked in XICS and XIVE, else
subsequent interrupts may be ignored.

Testing locally on my Talos II (single CPU, 18-core POWER9), NVMe now
works with MSI, improving read throughput by ~70% (900MB/s -> 1.67GB/s,
with 64MB block size) over INTx interrupts, and snd_hda(4) now will
actually play music with MSI.  Previously, snd_hda(4) would not receive
interrupts, timing out, and declaring the channels dead.

This has also been tested by Kevin Bowling, and others, with great
success.  Kevin reported NVMe unusable on his Talos II prior to this
patch.

Reviewed by:	nwhitehorn, kbowling
Approved by:	re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17356
2018-10-06 03:20:26 +00:00
Jamie Gritton
08b4333399 Fix the test prohibiting jails from sharing IP addresses.
It's not supposed to be legal for two jails to contain the same IP address,
unless both jails contain only that one address.  This is the behavior
documented in jail(8), and is there to prevent confusion when multiple
jails are listening on IADDR_ANY.

VIMAGE jails (now the default for GENERIC kernels) test this correctly,
but non-VIMAGE jails have been performing an incomplete test when nested
jails are used.

Approved by:	re@ (kib@)
MFC after:	5 days
2018-10-06 02:10:32 +00:00
Stephen Hurd
e873ccd0fc Fix igb corrupting checksums with BPF and VLAN
When using a vlan with igb and the vlanhwcsum option, any mbufs which
already had the TCP, UDP, or SCTP checksum calculated and therefore don't
have the CSUM_[IP|IP6]_[TCP|UDP|SCTP] bits set in the csum_flags field would
have the L4 checksum corrupted by the hardware.

This was caused by the driver setting E1000_TXD_POPTS_TXSM any time a
checksum bit was set OR a vlan tag was present.

The patched driver only sets E1000_TXD_POPTS_TXSM when an offload is
requested.

PR:		231416
Reported by:	pi
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17404
2018-10-05 20:16:20 +00:00
Mateusz Guzik
97bb9a0818 amd64: make memset less slow with mov
rep stos has a high startup time even on modern microarchitectures like
Skylake. Intel optimization manuals discuss how for small sizes it is
beneficial to go for streaming stores. Since those cannot be used without
extra penalty in the kernel I investigated performance impact of just
regular movs.

The patch below implements a very simple scheme: a 32-byte loop followed
by filling in the remainder of at most 31 bytes. It has a 256 breaking
point on which it falls back to rep stos. It provides a significant win
over the current primitive on several machines I tested (both Intel and
AMD). A 64-byte loop did not provide any benefit even for multiple of 64
sizes.

See the review for benchmark data.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17398
2018-10-05 19:25:09 +00:00
Glen Barber
01d4e2149e MFH r338661 through r339200.
Sponsored by:	The FreeBSD Foundation
2018-10-05 17:53:47 +00:00
Alexander Motin
1f55b2a4b5 Add sysctls for dbuf metadata cache variables added in r336959.
Approved by:	re (gjb)
MFC after:	1 week
2018-10-05 16:05:59 +00:00
Tom Jones
b6e870116f Convert UDP length to host byte order
When getting the number of bytes to checksum make sure to convert the UDP
length to host byte order when the entire header is not in the first mbuf.

Reviewed by: jtl, tuexen, ae
Approved by: re (gjb), jtl (mentor)
Differential Revision:  https://reviews.freebsd.org/D17357
2018-10-05 12:51:30 +00:00