Commit Graph

124216 Commits

Author SHA1 Message Date
Alexander Motin
178777f516 Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children.  Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.

This is a part of ZoL port of device removal patch:

commit a1d477c24c
Author: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Tim Chase <tim@chase2k.com>

Approved by:	re (kib)
MFC after:	1 week
2018-10-12 16:55:28 +00:00
Konstantin Belousov
555225c062 Call initializecpucache() before ifuncs are resolved.
The function tweaks CPU capabilities based on the VM platform and
tunables, which affected selection of the cache flush method before
ifuncs were used, and should affect the cache flush in the same way
after ifunc.

PR:	232081
Reported by:	phk
Analyzed by:	avg
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-10-12 16:00:21 +00:00
Ruslan Bukin
05efeb8430 Initialize interrupt priority to 0 on all sources.
Without this hardware raises an interrupt regardless of any
pending bits set.

This fixes operation on RocketChip and derivatives (e.g. lowRISC).

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-12 15:51:41 +00:00
Konstantin Belousov
78a3652794 bhyve: emulate CLFLUSH and CLFLUSHOPT.
Apparently CLFLUSH on mmio can cause VM exit, as reported in the PR.
I do not see that anything useful can be done except emulating page
faults on invalid addresses.

Due to the instruction encoding pecularity, also emulate SFENCE.

PR:	232081
Reported by:	phk
Reviewed by:	araujo, avg, jhb (all: previous version)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17482
2018-10-12 15:30:15 +00:00
Ruslan Bukin
053ec0508e Add support for the UART device found in lowRISC system-on-a-chip.
The only source of documentation for this device is verilog,
so driver is minimalistic.

Reviewed by:	Dr Jonathan Kimmitt <jrrk2@cam.ac.uk>
Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-12 15:19:41 +00:00
Alexander Motin
770ce5c3bf Add ZIO_TYPE_FREE support for indirect vdevs.
Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
requests to removed (indirect) vdevs, while on FreeBSD there is also
ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
buffers, so don't need the pointer adjustment.

PR:		228750, 229007
Reviewed by:	allanjude, sef
Approved by:	re (kib)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17523
2018-10-12 15:14:22 +00:00
Bjoern A. Zeeb
3afdfcaf33 r217592 moved the check for imo in udp_input() into the conditional block
but leaving the variable assignment outside the block, where it is no longer
used. Move both the variable and the assignment one block further in.

This should result in no functional changes. It will however make upcoming
changes slightly easier to apply.

Reviewed by:		markj, jtl, tuexen
Approved by:		re (kib)
Differential Revision:	https://reviews.freebsd.org/D17525
2018-10-12 11:30:46 +00:00
Mateusz Guzik
c9964045a0 Add a file missed in r339321
Approved by:	re (implicit)
Sad face: mjg
2018-10-12 00:32:45 +00:00
Mateusz Guzik
3c9a1d0493 amd64: make memmove and memcpy less slow with mov
The reasoning is the same as with the memset change, see r339205

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17441
2018-10-11 23:37:57 +00:00
Mateusz Guzik
3f102f5881 Provide string functions for use before ifuncs get resolved.
The change is a no-op for architectures which don't ifunc memset,
memcpy nor memmove.

Convert places which need them. Xen bits by royger.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17487
2018-10-11 23:28:04 +00:00
Yuri Pankov
937b0f2559 Fix PNP entries for if_ix and if_ixv properly using the IFLIB_PNP_INFO()
macro.

Reviewed by:	imp, sbruno
Approved by:	re (gjb), kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17473
2018-10-11 22:27:12 +00:00
John Baldwin
b843f9be5e Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.
The VT-x VMCS only stores the base address of the GDTR and IDTR.  As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot.  Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector.  VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

PR:		230773
Reported by:	John Levon <levon@movementarian.org>
Reviewed by:	kib
Tested by:	araujo
Approved by:	re (rgrimes)
MFC after:	1 week
2018-10-11 18:27:19 +00:00
Allan Jude
c79b58ccc5 Pull in a follow-on commit to resolve a deadlock in ZFS sequential
resilver (r334844)

MFV/ZoL: Fix deadlock in IO pipeline

commit a76f3d0437
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date:   Fri Mar 16 16:46:06 2018 -0700

    Fix deadlock in IO pipeline

    In vdev_queue_aggregate() the zio_execute() bypass should not be
    called under the vdev queue lock.  This can result in a deadlock
    as shown in the stack traces below.

    Drop the vdev queue lock then walk the parents of the aggregate IO
    to determine the list of component IOs to be bypassed.  This can
    be done safely without holding the io_lock since the new aggregate
    IO has not yet been returned and its parents cannot change.

    ---  THREAD 1 ---
    arc_read()
      zio_nowait()
        zio_vdev_io_start()
          vdev_queue_io() <--- mutex_enter(vq->vq_lock)
            vdev_queue_io_to_issue()
              vdev_queue_aggregate()
                zio_execute()
            vdev_queue_io_to_issue()
              vdev_queue_aggregate()
                zio_execute()
                  zio_vdev_io_assess()
                    zio_wait_for_children() <- mutex_enter(zio->io_lock)

    --- THREAD 2 --- (inverse order)
    arc_read()
      zio_change_priority() <- mutex_enter(zio->zio_lock)
        vdev_queue_change_io_priority() <- mutex_enter(vq->vq_lock)

    Reviewed-by: Tom Caputi <tcaputi@datto.com>
    Reviewed-by: Don Brady <don.brady@delphix.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Reported by:	ZFS Leadership Meeting
Reviewed by:	mav
Approved by:	re (kib)
Obtained from:	ZFS-on-Linux
MFC after:	2 weeks
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17495
2018-10-10 22:59:15 +00:00
Allan Jude
bee8a18986 Add missing sysctls for tuning vdev queue depths for new I/O types
This connects new tunables that were added but not exposed in:
r329502 (zpool remove)
r337007 (zpool initialize)

Reviewed by:	avg
Approved by:	re (kib)
MFC after:	2 weeks
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17494
2018-10-10 22:55:31 +00:00
Allan Jude
cd00e3e1af Resolve a hang in ZFS during vnode reclaimation
This is caused by a deadlock between zil_commit() and zfs_zget()

Add a way for zfs_zget() to break out of the retry loop in the common case

PR:		229614
Reported by:	grembo, Andreas Sommer, many others
Tested by:	Andreas Sommer, Vicki Pfau
Reviewed by:	avg (no objection)
Approved by:	re (gjb)
MFC after:	2 months
Sponsored by:	Klara Systems
Differential Revision:	https://reviews.freebsd.org/D17460
2018-10-10 19:39:47 +00:00
Alexander Motin
7a492ba93c Remove extra thread_exit() call left after r329802.
spa_condense_indirect_thread() is no longer a thread function, but just
a callback for new zthr KPI.

Submitted by:	allanjude
Approved by:	re (gjb)
MFC after:	3 days
2018-10-10 16:34:53 +00:00
Marcin Wojtas
0693b1d2a9 Update Armada 38x UART device tree binding
Recent changes in Linux updated Marvell Armada 38x
UART compatible string. As a result the FreeBSD driver
(uart_dev_snps) does not probe. This commit fixes the
situation, however not applying any functional modification
to the driver methods.

Approved by: re (kib)
Obtained from: Semihalf
2018-10-10 10:34:17 +00:00
Glen Barber
c3fb2eae88 Update head from ALPHA8 to ALPHA9 as part of the 12.0-RELEASE
cycle.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-10-09 21:54:58 +00:00
Glen Barber
1da7787f71 Merge the remainder of the projects/openssl111 branch to head.
- Update OpenSSL to version 1.1.1.
- Update Kerberos/Heimdal API for OpenSSL 1.1.1 compatibility.
- Bump __FreeBSD_version.

Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-10-09 21:28:26 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Stephen Hurd
0544676baf Use mbuf defines to construct csum offload masks rather than literals
Reviewed by:	erj
Approved by:	re (rgrimes)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17442
2018-10-09 20:16:19 +00:00
Jung-uk Kim
6f1f1a6395 Update ACPICA to 20181003.
Approved by:	re (gjb)
2018-10-09 18:40:36 +00:00
Glen Barber
7c32835287 MFH r338661 through r339253.
Sponsored by:	The FreeBSD Foundation
2018-10-09 14:27:55 +00:00
Jonathan T. Looney
13c6ba6d94 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
Rick Macklem
910ccc7727 Fix the pNFS server's reporting of disk space usage for the "#<path>" case.
The pNFS server would report the total disk space used and free for all
of the DSs, even when certain DSs are assigned to the file system via
the "#<path>" suffix used in the "nfsd -p" option argument.
This patch fixes this case. It only reports usage for the file system
that the argument vnode resides on. This is consistent with the non-pNFS
NFSv4 server. In NFSv4 it is possible to have subtrees on other file
systems, but these are not included in the usage information for NFSv4.

Approved by:	re (gjb)
2018-10-09 01:10:50 +00:00
Glen Barber
846803208a Fix a mismerge from head to projects/openssl111.
r339213 was cherry-picked back to head from the project branch, which
caused a conflict.  This commit properly records the mergeinfo from
head.

r339205 was missed, and r339214 is required for reintegration.

Sponsored by:	The FreeBSD Foundation
2018-10-08 19:39:05 +00:00
Glen Barber
fc3f42d80f MFH r339206-r339212, r339215-r339239
Sponsored by:	The FreeBSD Foundation
2018-10-08 18:06:40 +00:00
Alexander Motin
f3b515aea5 Fix r336951 mismerge -- use of uninitialized variable.
Reported by:	tsoome
Approved by:	re (gjb)
MFC after:	3 days
2018-10-08 15:19:03 +00:00
Hans Petter Selasky
2df98d5eec Add missing steering rules for virtual function, VF, in mlx4en(4) driver.
When acting as a VF it is required to add steering rules for all unicast
addresses. Even if promiscious mode is selected. Else incoming data packets
will be dropped.

MFC after:		3 days
Approved by:		re (gjb)
Sponsored by:		Mellanox Technologies
2018-10-08 14:52:21 +00:00
Eric van Gyzen
382000a1fd em/igb: Do not print link state messages
These messages are totally redundant with the iflib messages.
They're also not very useful, since they don't include the
interface name.

Discussed with:	shurd
Approved by:	re (rgrimes)
Sponsored by:	Dell EMC Isilon
2018-10-08 01:28:46 +00:00
Michael Tuexen
6b45121a6d Address the warning regarding duplicate option 'GEOM_PART_GPT' when
configuring kernels for i386, amd64, and arm64.
The 'GEOM_PART_GPT' option was added to the DEFAULTS configuration
in r337967.

Approved by:		re (kib@)
Reviewed by:		ler@
Differential Revision:	https://reviews.freebsd.org/D17458
Sponsored by:		Netflix, Inc.
2018-10-07 15:54:13 +00:00
Michael Tuexen
3535cdc43e Avoid truncating unrecognised parameters when reporting them.
This resulted in sending malformed packets.

Approved by:		re (kib@)
MFC after:		1 week
2018-10-07 15:13:47 +00:00
Michael Tuexen
20a2f77eec Enable TCP Fast Open support for PPC platforms.
Reviewed by:		kbowling@, andreast@
Approved by:		re (kib@)
Differential Revision:	https://reviews.freebsd.org/D17407
2018-10-07 12:56:05 +00:00
Michael Tuexen
3924dfa721 Ensure that the ips_localout counter is incremented for
locally generated SCTP packets sent over IPv4. This make
the behaviour consistent with IPv6.

Reviewed by:		ae@, bz@, jtl@
Approved by:		re (kib@)
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D17406
2018-10-07 11:26:15 +00:00
Justin Hibbits
7e524b0746 powerpc/pseries: EOI interrupts in XICS by setting lowest priority
Discussing with Benjamin Herrenschmidt, OPAL_INT_GET_XIRR masks the
returned priority, so must be resumed before more interrupts can be
handled at this priority.  Since there are only two priorities used in
FreeBSD, we know that the previous priority in an EOI will always be
0xff (lowest priority).

Reviewed by:	nwhitehorn
Approved by:	re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17361
2018-10-06 18:51:49 +00:00
Justin Hibbits
013cc176c9 powerpc64/powernv: Don't mask MSIs in OPAL
Summary:
Discussing with Benjamin Herrenschmidt, MSIs, and edge-triggered
interrupts in general, must not be masked in XICS and XIVE, else
subsequent interrupts may be ignored.

Testing locally on my Talos II (single CPU, 18-core POWER9), NVMe now
works with MSI, improving read throughput by ~70% (900MB/s -> 1.67GB/s,
with 64MB block size) over INTx interrupts, and snd_hda(4) now will
actually play music with MSI.  Previously, snd_hda(4) would not receive
interrupts, timing out, and declaring the channels dead.

This has also been tested by Kevin Bowling, and others, with great
success.  Kevin reported NVMe unusable on his Talos II prior to this
patch.

Reviewed by:	nwhitehorn, kbowling
Approved by:	re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17356
2018-10-06 03:20:26 +00:00
Jamie Gritton
08b4333399 Fix the test prohibiting jails from sharing IP addresses.
It's not supposed to be legal for two jails to contain the same IP address,
unless both jails contain only that one address.  This is the behavior
documented in jail(8), and is there to prevent confusion when multiple
jails are listening on IADDR_ANY.

VIMAGE jails (now the default for GENERIC kernels) test this correctly,
but non-VIMAGE jails have been performing an incomplete test when nested
jails are used.

Approved by:	re@ (kib@)
MFC after:	5 days
2018-10-06 02:10:32 +00:00
Stephen Hurd
e873ccd0fc Fix igb corrupting checksums with BPF and VLAN
When using a vlan with igb and the vlanhwcsum option, any mbufs which
already had the TCP, UDP, or SCTP checksum calculated and therefore don't
have the CSUM_[IP|IP6]_[TCP|UDP|SCTP] bits set in the csum_flags field would
have the L4 checksum corrupted by the hardware.

This was caused by the driver setting E1000_TXD_POPTS_TXSM any time a
checksum bit was set OR a vlan tag was present.

The patched driver only sets E1000_TXD_POPTS_TXSM when an offload is
requested.

PR:		231416
Reported by:	pi
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17404
2018-10-05 20:16:20 +00:00
Mateusz Guzik
97bb9a0818 amd64: make memset less slow with mov
rep stos has a high startup time even on modern microarchitectures like
Skylake. Intel optimization manuals discuss how for small sizes it is
beneficial to go for streaming stores. Since those cannot be used without
extra penalty in the kernel I investigated performance impact of just
regular movs.

The patch below implements a very simple scheme: a 32-byte loop followed
by filling in the remainder of at most 31 bytes. It has a 256 breaking
point on which it falls back to rep stos. It provides a significant win
over the current primitive on several machines I tested (both Intel and
AMD). A 64-byte loop did not provide any benefit even for multiple of 64
sizes.

See the review for benchmark data.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17398
2018-10-05 19:25:09 +00:00
Glen Barber
01d4e2149e MFH r338661 through r339200.
Sponsored by:	The FreeBSD Foundation
2018-10-05 17:53:47 +00:00
Alexander Motin
1f55b2a4b5 Add sysctls for dbuf metadata cache variables added in r336959.
Approved by:	re (gjb)
MFC after:	1 week
2018-10-05 16:05:59 +00:00
Tom Jones
b6e870116f Convert UDP length to host byte order
When getting the number of bytes to checksum make sure to convert the UDP
length to host byte order when the entire header is not in the first mbuf.

Reviewed by: jtl, tuexen, ae
Approved by: re (gjb), jtl (mentor)
Differential Revision:  https://reviews.freebsd.org/D17357
2018-10-05 12:51:30 +00:00
Matt Macy
d9f1b8dbf2 hwpmc: Refactor sample ring buffer handling to fix races
Refactor sample ring buffer ring handling to make it more robust to
long running callchain collection handling

r338112 introduced a (now fixed) regression that exposed a number of race
conditions within the management of the sample buffers. This
simplifies the handling and moves the decision to overwrite a
callchain sample that has taken too long out of the NMI in to the
hardlock handler. With this change the problem no longer shows up as a
ring corruption but as the code spending all of its time in callchain
collection.

- Makes the producer / consumer index incrementing monotonic, making it
  easier (for me at least) to reason about.
- Moves the decision to overwrite a sample from NMI context to interrupt
  context where we can enforce serialization.
- Puts a time limit on waiting to collect a user callchain - putting a
  bound on head-of-line blocking causing samples to be dropped
- Removes the flush routine which was previously needed to purge
  dangling references to the pmc from the sample buffers but now is only
  a source of a race condition on unload.

Previously one could lock up or crash HEAD by running:
pmcstat -S inst_retired.any_p -T and then hitting ^C

After this change it is no longer possible.

PR:	231793
Reviewed by:	markj@
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D17011
2018-10-05 05:55:56 +00:00
Matt Macy
e8bb589d56 eliminate locking surrounding ui_vmsize and swap reserve by using atomics
Change swap_reserve and swap_total to be in units of pages so that
swap reservations can be done using only atomics instead of using a single
global mutex for swap_reserve and a single mutex for all processes running
under the same uid for uid accounting.

Results in mmap speed up and a 70% increase in brk calls / second.

Reviewed by:	alc@, markj@, kib@
Approved by:	re (delphij@)
Differential Revision:	https://reviews.freebsd.org/D16273
2018-10-05 05:50:56 +00:00
Brooks Davis
9bc603bd20 Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.
A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by:	re (gjb)
2018-10-04 23:55:03 +00:00
Ryan Stone
083a010c62 Hold a write lock across udp_notify()
With the new route cache feature udp_notify() will modify the inp when it
needs to invalidate the route cache.  Ensure that we hold a write lock on
the inp before calling the function to ensure that multiple threads don't
race while trying to invalidate the cache (which previously lead to a page
fault).

Differential Revision: https://reviews.freebsd.org/D17246
Reviewed by: sbruno, bz, karels
Sponsored by: Dell EMC Isilon
Approved by:	re (gjb)
2018-10-04 22:03:58 +00:00
Mateusz Guzik
9657b80ce7 amd64: hide non-erms jump label under non-erms copyin/copyout
This change is a no-op in terms of semantics, but has a side effect
of removing a perfectly useless nop sled for CPUs with ERMS.

Approved by:	re (gjb)
Sponsored by:   The FreeBSD Foundation
2018-10-04 20:01:48 +00:00
Oleksandr Tymoshenko
627e5af85a [ig4] style(9) clean-up
Submitted by:	Rajesh Kumar <rajfbsd@gmail.com>
Approved by:	re (gjb, kib)
2018-10-04 19:54:47 +00:00
Brooks Davis
23f2e22802 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Reviewed by:	kib
Approved by:	re (rgrimes, gjb)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Review:	https://reviews.freebsd.org/D17388
2018-10-03 20:39:48 +00:00
Gleb Smirnoff
ad7eb8cad5 In PR 227259, a user is reporting that they have code which is using
shutdown() to wakeup another thread blocked on a stream listen socket.
This code is failing, while it used to work on FreeBSD 10 and still
works on Linux.

It seems reasonable to add another exception to support something users are
actually doing, which used to work on FreeBSD 10, and still works on Linux.
And, it seems like it should be acceptable to POSIX, as we still return
ENOTCONN.

This patch is different to what had been committed to stable/11, since
code around listening sockets is different. Patch in D15019 is written
by jtl@, slightly modified by me.

PR:		227259
Obtained from:	jtl
Approved by:	re (kib)
Differential Revision:  D15019
2018-10-03 17:40:04 +00:00
Mark Johnston
7c179abac7 Fix an inverted test in ucode_load_ap().
This caused microcode to be updated only on the BSP if hyperthreading
was disabled, typically resulting in a hang or reset.

Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-10-03 14:20:43 +00:00
Michael Tuexen
580e30a33e Use strlcpy() instead of strncpy().
Approved by:            re (kib@)
CID:			1395980, 1395981
X-MFC with:		r339012
MFC after:              1 week
2018-10-03 07:35:16 +00:00
Brooks Davis
4364eab875 Move 32-bit compat support for CDIOREADTOCENTRYS to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other members which require no translation.

Reviewed by:	kib (prior version), jhb
Approved by:	re (rgrimes)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Review:	https://reviews.freebsd.org/D17378
2018-10-02 23:23:56 +00:00
Kevin Bowling
8ac2f3ba9f Use nda(4) on powerpc64
Approved by:	re@ (kib), krion (mentor), imp
Differential Revision:	https://reviews.freebsd.org/D17368
2018-10-02 21:36:00 +00:00
Bjoern A. Zeeb
9cffbc68bd After r338257 is was possible to trigger a KASSERT() in ud6_output()
using an application trying to use a v4mapped destination address on a
kernel without INET support or on a v6only socket.
Catch this case and prevent the packet from going anywhere;
else, without the KASSERT() armed, a v4mapped destination
address might go out on the wire or other undefined behaviour
might happen, while with the KASSERT() we panic.

PR:		231728
Reported by:	Jeremy Faulkner (gldisater gmail.com)
Approved by:	re (kib)
2018-10-02 17:29:56 +00:00
Robert Watson
2ddefb6d5d Rework the logic around quick checks for auditing that take place at
system-call entry and whenever audit arguments or return values are
captured:

1. Expose a single global, audit_syscalls_enabled, which controls
   whether the audit framework is entered, rather than exposing
   components of the policy -- e.g., if the trail is enabled,
   suspended, etc.

2. Introduce a new function audit_syscalls_enabled_update(), which is
   called to update audit_syscalls_enabled whenever an aspect of the
   policy changes, so that the value can be updated.

3. Remove a check of trail enablement/suspension from audit_new() --
   at the point where this function has been entered, we believe that
   system-call auditing is already in force, or we wouldn't get here,
   so simply proceed to more expensive policy checks.

4. Use an audit-provided global, audit_dtrace_enabled, rather than a
   dtaudit-provided global, to provide policy indicating whether
   dtaudit would like system calls to be audited.

5. Do some minor cosmetic renaming to clarify what various variables
   are for.

These changes collectively arrange it so that traditional audit
(trail, pipes) or the DTrace audit provider can enable system-call
probes without the other configured.  Otherwise, dtaudit cannot
capture system-call data without auditd(8) started.

Reviewed by:		gnn
Sponsored by:		DARPA, AFRL
Approved by:		re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17348
2018-10-02 15:58:17 +00:00
Kenneth D. Merry
aabac0c176 Fix a da(4) driver memory leak for SCSI SMR devices.
In the probe case for SCSI SMR Host Aware or Most Managed drives, be sure
to free allocated memory.

sys/cam/scsi/scsi_da.c:
	In dadone_probezone(), free the data pointer before returning.

MFC after:	3 days
Sponsored by:	Spectra Logic
Approved by:	re (kib)
2018-10-01 19:00:46 +00:00
Mark Johnston
93db904d19 Use an unsigned iterator for domain sets.
Otherwise (iter % ds->ds_cnt) is not guaranteed to lie in the range
[0, MAXMEMDOM).

Reported by:	pho
Reviewed by:	kib
Approved by:	re (rgrimes)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17374
2018-10-01 18:51:39 +00:00
Andrew Turner
8696dcdacf Add kernel ifunc support on arm64.
Tested with ifunc resolvers in the kernel and module with calls from
kernel to kernel, module to kernel, and module to module.

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17370
2018-10-01 18:51:08 +00:00
Mark Johnston
cb4961abc1 Apply r339046 to i386.
Belatedly add a comment to the amd64 pmap explaining why we initialize
the kernel pmap's resident page count.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17377
2018-10-01 18:48:33 +00:00
Mark Johnston
c6c770d041 Count bootstrap data as resident in the kernel pmap.
Such data may later be unmapped.  This occurs, for example, when a
loader-provided microcode update file is discarded.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17340
2018-10-01 14:47:49 +00:00
Emmanuel Vadot
47d41ab50e arm64: Raise again L3 table for early devmap
The initial raise in r336519 wasn't enough for using big resolution
(1920 x 1200 for example). Raise it again.

Reported by:	bob prohaska <fbsd@www.zefox.net>
Tested by:	bob prohaska <fbsd@www.zefox.net>
Approved by:	re (gjb@)
2018-10-01 14:27:53 +00:00
Andrew Gallatin
30c5525b3c Allow empty NUMA memory domains to support Threadripper2
The AMD Threadripper 2990WX is basically a slightly crippled Epyc.
Rather than having 4 memory controllers, one per NUMA domain, it has
only 2  memory controllers enabled. This means that only 2 of the
4 NUMA domains can be populated with physical memory, and the
others are empty.

Add support to FreeBSD for empty NUMA domains by:

- creating empty memory domains when parsing the SRAT table,
    rather than failing to parse the table
- not running the pageout deamon threads in empty domains
- adding defensive code to UMA to avoid allocating from empty domains
- adding defensive code to cpuset to avoid binding to an empty domain
    Thanks to Jeff for suggesting this strategy.

Reviewed by:	alc, markj
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D1683
2018-10-01 14:14:21 +00:00
Michael Tuexen
15a087e551 Mitigate providing a timing signal if the COOKIE or AUTH
validation fails.
Thanks to jmg@ for reporting the issue, which was discussed in
https://admbugs.freebsd.org/show_bug.cgi?id=878

Approved by:            re (TBD@)
MFC after:              1 week
2018-10-01 14:05:31 +00:00
Michael Tuexen
9d2e3f14c4 After allocating chunks set the fields in a consistent way.
This removes two assignments for the flags field being done
twice and adds one, which was missing.
Thanks to Felix Weinrank for reporting the issue he found
by using fuzz testing of the userland stack.

Approved by:            re (kib@)
MFC after:              1 week
2018-10-01 13:09:18 +00:00
Andrey V. Elsukov
384a5c3c28 Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of
INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
it is possible, that the code is running in net_epoch_preempt section,
and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case.

PR:		231428
Reviewed by:	mmacy, tuexen
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17335
2018-10-01 10:46:00 +00:00
Bjoern A. Zeeb
ef08929f90 Fix the MODULE_PNP_INFO() for iwm(4) where I got the bus and module
arguments wrong in r339020.

PR:			231625
Reported by:		Yuri Pankov (yuripv yuripv.net)
Reviewed by:		cem, Yuri Pankov (yuripv yuripv.net)
Approved by:		re (kib)
Pointyhat to:		bz (a rather big one for this one)
2018-10-01 10:44:33 +00:00
Michael Tuexen
1b084a5e5e Plug mbuf leak in the SCTP input path in an error case.
Approved by:            re (kib@)
MFC after:              1 week
CID:			749312
2018-09-30 21:54:02 +00:00
Michael Tuexen
66bcf0b333 Plug mbuf leaks in the SCTP output path in error cases.
Approved by:            re (kib@)
MFC after:              1 week
CID:			1395307
2018-09-30 21:31:33 +00:00
Allan Jude
72851e8598 Use PNP metadata to allow devmatch to autoload ure(4)
Reviewed by:	manu imp
Approved by:	re (kib)
X-MFC-with:	devmatch
Sponsored by:	Klara Systems
2018-09-30 21:23:31 +00:00
Konstantin Belousov
632227a739 Update x86/ifunc.h.
Remove ifunc emulation.
Add helper for usermode ifunc resolver definition.
Update copyright years.

Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
MFC after:	1 week
2018-09-30 16:57:30 +00:00
Michael Tuexen
8184648425 Fix the handling of ancillary data for SCTP socket. Implement
sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs()
similar to sctp_find_cmsg() to improve consistency and avoid
the signed/unsigned issues in sctp_process_cmsgs_for_init()
and sctp_findassociation_cmsgs().

Thanks to andrew@ for reporting the problem he found using
syzcaller.

Approved by:            re (kib@)
MFC after:              1 week
2018-09-30 16:21:31 +00:00
Michael Tuexen
ae0a9a8850 Increment the corresponding UDP stats counter (udps_opackets) when
sending UDP encapsulated SCTP packets.
This is consistent with the behaviour that when such packets are received,
the corresponding UDP stats counter (udps_ipackets) is incremented.
Thanks to Peter Lei for making me aware of this inconsistency.

Approved by:            re (kib@)
MFC after:              1 week
2018-09-30 12:16:06 +00:00
Bjoern A. Zeeb
916023bda9 Provide MODULE_PNP_INFO() for iwm(4) so that devmatch(8) can
do its job.

PR:		231625
Submitted by:	Yuri Pankov (yuripv yuripv.net)
Approved by:	re (kib)
2018-09-29 21:14:54 +00:00
Konstantin Belousov
5f11ee2075 Fix UP build.
Reported by:	tijl
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
2018-09-29 16:17:35 +00:00
Michael Tuexen
1687b1ab24 For changing the MTU on tun/tap devices, it should not matter whether it
is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or
doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
Especially, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.

Reviewed by:		ae@, bz@
Approved by:		re (kib@)
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17180
2018-09-29 13:01:23 +00:00
Allan Jude
9d967dd27d Avoid panic when adjusting priority of a read in the face of an IO error
PR:		231516
Reported by:	sbruno
Approved by:	re (rgrimes)
Obtained from:	ZFS-on-Linux
X-MFC-with:	334844
Sponsored by:	Klara Systems

MFV/ZoL:	Fix zio->io_priority failed (7 < 6) assert

commit c26cf0966d
Author: Tony Hutter <hutter2@llnl.gov>
Date:   Tue May 29 18:13:48 2018 -0700

  Fix zio->io_priority failed (7 < 6) assert

  This fixes an assert in vdev_queue_change_io_priority():

    VERIFY3(zio->io_priority < ZIO_PRIORITY_NUM_QUEUEABLE) failed (7 < 6)
    PANIC at vdev_queue.c:832:vdev_queue_change_io_priority()

  Reviewed-by: Tom Caputi <tcaputi@datto.com>
  Reviewed-by: George Melikov <mail@gmelikov.ru>
  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-09-29 01:26:07 +00:00
Oleksandr Tymoshenko
2a99321a57 [sdhci] Add ACPI identifier for AMD eMMC 5.0 controller
Submitted by:	Rajesh Kumar <rajfbsd@gmail.com>
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17189
2018-09-29 00:35:36 +00:00
Michael Tuexen
3552f16d82 Fix typo in comment.
Reported by:		@danfe
Approved by:		re (kib@)
MFC after:		1 week
X-MFC:			r338941
2018-09-28 19:47:32 +00:00
John Baldwin
ff13c0a24f Regenerate after UNIMPL -> OBSOL changes in r339001.
Approved by:	re (gjb)
2018-09-28 17:25:28 +00:00
John Baldwin
46e2054905 Mark various removed system calls as OBSOL instead of UNIMPL.
This is mostly a cosmetic change except that obsolete system calls are
assigned meaningful names in the names arrays which means that using
tools like kdump or truss against binaries invoking these system calls
will print out the name instead of the number.  The script I use to
generate the XML list of syscalls for GDB also ignores UNIMPL but not
OBSOL entries.  In general UNIMPL should only be used to reserve
placeholders for system calls that have never been implemented while
system calls that existed at one time in FreeBSD but were removed
should be marked OBSOL instead.

Reviewed by:	brooks, kib, imp
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17344
2018-09-28 17:23:54 +00:00
Konstantin Belousov
c62637d679 Correct vm_fault_copy_entry() handling of backing file truncation
after the file mapping was wired.

if a wired map entry is backed by vnode and the file is truncated,
corresponding pages are invalidated.  vm_fault_copy_entry() should be
aware of it and allow for invalid pages past end of file. Also, such
pages should be not mapped into userspace.  If userspace accesses the
truncated part of the mapping later, it gets a signal, there is no way
kernel can prevent the page fault.

Reported by:	andrew using syzkaller
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:11:38 +00:00
Konstantin Belousov
9f25ab83f9 In vm_fault_copy_entry(), we should not assert that entry is charged
if the dst_object is not of swap type.

It can only happen when entry does not require copy, otherwise
vm_map_protect() already adds the charge. So the assert was right for
the case where swap object was allocated in the vm_fault_copy_entry(),
but not when it was just copied from src_entry and its type is not
swap.

Reported by:	andrew using syzkaller
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:11:01 +00:00
Konstantin Belousov
a60d3db15e In vm_fault_copy_entry(), collect the code to initialize a newly
allocated dst_object in a single place.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:10:12 +00:00
Konstantin Belousov
f76bec2a28 Revert part of the r338891 which reordered local invalidation and IPI.
For PCID case, there is a dependency between pm_gen zeroing and
reading pm_active for IPI target selection, to ensure that the
invalidation is not missed.

Reported and tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-09-28 14:08:20 +00:00
Andrew Turner
9e024036f5 Export ID_AA64ISAR{0,1}_EL1 to userland.
As with r338962 also export the instruction set attribute register. This
will allow userland to identify optional instructions the hardware
supports, for example in a future ifunc handler to decide which
implementation of a function to return.

Approved by:	re (kib)
2018-09-28 11:57:40 +00:00
Glen Barber
d0addc700e Update head from ALPHA7 to ALPHA8 as part of the 12.0-RELEASE
cycle.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-09-28 00:01:45 +00:00
Brooks Davis
b7edb6fa77 Centralize compat support for PCIOCGETCONF.
The pre-7.x compat for both native and 32-bit code was already in
pci_user.c. Use this infrastructure to add implement 32-bit support.
This is more correct as ioctl(2) commands only have meaning in the
context of a file descriptor.

Reviewed by:	kib
Approved by:	re (gjb)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential revision:	https://reviews.freebsd.org/D17324
2018-09-27 21:08:32 +00:00
Mateusz Guzik
825eeb55f4 amd64: fix return value of copyinstr after r338970
The function stopped swapping rdi and rsi, but the error handling
code was not updated with the new register name.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-09-27 20:48:07 +00:00
Gordon Tetlow
89f652d149 Clear stack allocated data structure to prevent kernel memory leak.
Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	wes@
Approved by:	re (implicit)
Approved by:	so
Security:	FreeBSD-EN-18:12.mem
Security:	CVE-2018-17155
2018-09-27 18:39:54 +00:00
John Baldwin
83382d027f Don't clear DR6 for debug exceptions from userland.
This reverts part of r333368.  The attempt to clear DR6 was occuring
too soon as trapsignal() does not pause to let the debugger notice the
SIGTRAP and query DR6.  The signal exchange does not occur until much
later during ast().  As a result, GDB was no longer recognizing
hardware breakpoints and watchpoints on x86.

In addition, any userland programs that want to inspect DR6 in a
SIGTRAP handler don't have a way to do this if we clear DR6 in the
exception handler.

Instead of relying on the kernel to clear DR6, debuggers will have to
explicitly clear it after a trace trap (which they needed to do on
older kernels anyway).

Reviewed by:	kib
Approved by:	re (delphij)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17319
2018-09-27 17:33:59 +00:00
Mateusz Guzik
5910b87605 amd64: macroify and mostly depessimize copyinstr
See r338968 for details.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17288
2018-09-27 15:53:36 +00:00
Bjoern A. Zeeb
e15e0e3e4d In in6_pcbpurgeif0() called, e.g., from if_clone_destroy(),
once we have a lock, make sure the inp is not marked freed.
This can happen since the list traversal and locking was
converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic later on.

Reported by:	slavash (Mellanox)
Tested by:	slavash (Mellanox)
Reviewed by:	markj (no formal review but caught my unlock mistake)
Approved by:	re (kib)
2018-09-27 15:32:37 +00:00
Mateusz Guzik
3d95cc51bb amd64: mostly depessimize copystr
- remove a forward branch in the common case
- replace xchg + lodsb/stosb loop with simple movs

A simple test on Intel(R) Core(TM) i7-4600U CPU @ 2.10GH copying
/foo/bar/baz in a loop goes from 295715863 ops/s to 465807408.

Further changes are pending.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17281
2018-09-27 15:27:53 +00:00
Mateusz Guzik
0e59ecce47 amd64: clean up copyin/copyout
- move the PSL.AC comment to the fault handler
- stop testing for zero-sized ops. after several minutes of package
building there were no copyin calls with zero bytes and very few
copyout. the semantic of returning 0 in this case is preserved
- shorten exit paths by clearing %eax earlier
- replace xchg with 3 movs. this is what compilers do. a naive
benchmark on EPYC suggests about 1% increase in thoughput thanks to
this change.
- remove the useless movb %cl,%al from copyout. it looks like a
leftover from many years ago

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17286
2018-09-27 15:24:16 +00:00
Mateusz Guzik
a8e3f99ec1 amd64: implement memcmp in assembly
Both the in-kernel C variant and libc asm variant have very poor performance.
The former compiles to a single byte comparison loop, which breaks down even
for small sizes. The latter uses rep cmpsq/b which turn out to have very poor
throughput and are slower than a hand-coded 32-byte comparison loop.

Depending on size this is about 3-4 times faster than the current routines.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17328
2018-09-27 14:05:44 +00:00
Andrew Turner
e3f284eee7 Export ID_AA64PFR0_EL1 to userland
Create a user view of the ID_AA64PFR0_EL1 register with values common
across all CPUs.

Approved by:	re (kib)
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D17301
2018-09-27 13:54:09 +00:00
Andrew Turner
c7637c4d19 Move the undefined instruction handler to identcpu.c so we have access
to the registers from boot.

Approved by:	re (kib)
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D17301
2018-09-27 13:50:57 +00:00
Mateusz Piotrowski
8385e87c78 newvers.sh: Unbreak building in Git repositories.
Building the kernel in Git repositories when git-svn is not available and
the "help.autocorrect" Git parameter is enabled results in Git trying to
replace the "svn" command (it does not know) with "serve". As a result the
output of the "git server" command is appended to the value of the
environmental variable VERINFO, which causes the auto generated vers.c
file to contain invalid C syntax (missing newline escapes):

    #define "@(#)FreeBSD 12.0-ALPHA7  r000eversion 2
    0015agent=git/2.19.0
    000cls-refs
    0012fetch=shallow
    0012server-option
    0000=5e2272613fa(splash-vt)"
    #define VERSTR "FreeBSD 12.0-ALPHA7  r000eversion 2
    0015agent=git/2.19.0
    000cls-refs
    0012fetch=shallow
    0012server-option
    0000=5e2272613fa(splash-vt)\n"

Using `-c help.autocorrect=0` seems to be a good solution as it does not
modify user's environment. I am not sure, however, if we should use
programs (or Git commands), which we are not sure exist (we never check if
git-svn is available on the host), as there may be more unexpected
behaviors like this one.

Reviewed by:	eadler, emaste, krion
Approved by:	re (gjb), krion (mentor)
Sponsored by:	Bally Wulff Games & Entertainment GmbH
Differential Revision:	https://reviews.freebsd.org/D17271
2018-09-27 12:15:31 +00:00
Andrew Turner
27d2645787 Handle a guest executing a vm instruction by trapping and raising an
undefined instruction exception. Previously we would exit the guest,
however an unprivileged user could execute these.

Found with:	syzkaller
Reviewed by:	araujo, tychon (previous version)
Approved by:	re (kib)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17192
2018-09-27 11:16:19 +00:00
Navdeep Parhar
ac4031d805 cxgbe(4): Enable support for per-connection rate limiting in the default
firmware configuration files.

Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
2018-09-26 21:16:07 +00:00
Sean Eric Fagan
a7fcb1afcb Add per-session locking to cryptosoft (swcr).
As part of ZFS Crypto, I started getting a series of panics when I did not
have AESNI loaded.  Adding locking fixed it, and I concluded that the
Reinit function altered the AES key schedule.  This locking is not as
fine-grained as it could be (AESNI uses per-cpu locking), but
it's minimally invasive.

Sponsored by: iXsystems Inc
Reviewed by: cem, mav
Approved by: re (gjb), mav (mentor)
Differential Revision: https://reviews.freebsd.org/D17307
2018-09-26 20:23:12 +00:00
Warner Losh
b28589287b Remove bogus spaces.
Spaces aren't allowed in these strings.

Approved by: re@ (glen)
2018-09-26 19:41:00 +00:00
Warner Losh
0dc34160f3 Add PNP info to PCI attachments of cbb, cxgb, ida, iwn, ixl, ixlv,
mfi, mps, mpr, mvs, my, oce, pcn, ral, rl. This only labels existing
pci device tables, and has no probe / attach code changes.

Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
Approved by: re (glen)
2018-09-26 17:12:30 +00:00
Warner Losh
329e817fcc Reapply, with minor tweaks, r338025, from the original commit:
Remove unused and easy to misuse PNP macro parameter

Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely.  The 'table' parameter is now required to
have correct pointer (or array) type.  Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

  $ cat modpnpsize0.cocci
    @normaltables@
    identifier b,c;
    expression a,d,e;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,d,
    -sizeof(d[0]),
     e);

    @singletons@
    identifier b,c,d;
    expression a;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,&d,
    -sizeof(d),
     1);

  $ rg -l MODULE_PNP_INFO -- sys | \
    xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not.  So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).
Approved by: re (glen)
2018-09-26 17:12:14 +00:00
Andrey V. Elsukov
0ddfd867ed Fix witness warning in xform_init().
Do not call crypto_newsession() while holding xforms_lock mutex.
Release mutex before invoking crypto_newsession(), and use
ipsec_kmod_enter()/ipsec_kmod_exit() functions to protect from doing
access to unloaded kernel module memory.

Move xform-releated functions into subr_ipsec.c to be able use
ipsec_kmod_* functions. Also unconditionally build ipsec_kmod_*
functions, since now they are always used by IPSec code.

Add xf_cntr field to struct xformsw, it is used by ipsec_kmod_*
functions. Also constify xf_name field, since it is not expected to be
modified.

Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17302
2018-09-26 14:47:51 +00:00
Slava Shwartsman
a4ea412d4f Add PCIV_INVALID definition
From PCI Spec rev 2.2, 6.2.1. Device Identification:
Vendor ID This field identifies the manufacturer of the device. Valid
vendor identifiers are allocated by the PCI SIG to ensure uniqueness.
0FFFFh is an invalid value for Vendor ID.

MFC after:      3 days
Approved by:    re (Glen), hselasky (mentor), kib (mentor)
Sponsored by:   Mellanox Technologies
2018-09-26 13:16:55 +00:00
Michael Tuexen
0277ec9c43 Whitespace changes and fixing a typo. No functional change.
Approved by:	re (kib@)
MFC after:	1 week
2018-09-26 10:24:50 +00:00
Navdeep Parhar
cb8dedc76e cxgbe(4): Treat base/end of firmware parameters as signed integers when
figuring out whether the range is valid or not.

Approved by:	re@ (rgrimes@)
MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-09-26 02:27:37 +00:00
Konstantin Belousov
05e1cca97a Fix some uses of dmaplimit.
dmaplimit is the first byte after the end of DMAP.

Reported by:	"Johnson, Archna" <Archna.Johnson@netapp.com>
Reviewed by:	alc, markj
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17318
2018-09-25 20:07:58 +00:00
Konstantin Belousov
cbe100dfca Fix an issue in r338862.
For pmap_invalidate_all_pcid(), only reset pm_gen for non-kernel
pmaps, as it was done before the conversion to ifuncs.  The reset is
useless but innocent for kernel_pmap. Coverity reported that cpuid is
used uninitialized in this case.

Reported by:	cem
Reviewed by:	alc, cem, markj
CID:	 1395807
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17314
2018-09-25 18:24:25 +00:00
Mateusz Guzik
af534f8d99 zfs: depessimize zfs_root with rmlocks
Currently vfs calls the root method on each absolute lookup and when
crossing mount points.

zfs_root ends up looking up the inode internally as if it was not
instantianted which results in significant lock contention on systems
like EPYC.

Store the vnode in the mount point and protect the access with rmlocks.
This is a temporary hack for 12.0.

Sample result:

before:
make -s -j 128 buildkernel 2778.09s user 3319.45s system 8370% cpu 1:12.85 total

after:
make -s -j 128 buildkernel 3199.57s user 1772.78s system 8232% cpu 1:00.40 total

Tested by:	pho (zfs mount/unmount tests)
Reviewed by:	kib, mav, sef (different parts)
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17233
2018-09-25 17:58:06 +00:00
Navdeep Parhar
ea710848dc cxgbe(4): Link related changes.
- Switch to using 32b port/link capabilities in the driver.  The 32b
  format is used internally by firmwares > 1.16.45.0 and the driver will
  now interact with the firmware in its native format, whether it's 16b
  or 32b.  Note that the 16b format doesn't have room for 50G, 200G, or
  400G speeds.

- Add a bit in the pause_settings knobs to allow negotiated PAUSE
  settings to override manual settings.

- Ensure that manual link settings persist across an administrative
  down/up as well as transceiver unplug/replug.

- Remove unused is_*G_port() functions.

Approved by:	re@ (gjb@)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-09-25 05:52:42 +00:00
Justin Hibbits
fd8cf3be5f powerpc: Blacklist the top 64kB range of the lower 4GB PA space
The PHB4 host bridge used by the POWER9 uses a 64kB range in 32-bit
space at the address 0xffff0000-0xffffffff.  Reserve this range so that
DMA memory cannot be allocated within this range.  This fixes seemingly
random crashes on a POWER9 system.  Ideally this range will have been
reserved by the firmware, but as of now this is not the case.

Submitted by:	git_bdragon.rtk0.net
Reviewed by:	nwhitehorn
Approved by:	re(kib)
Differential Revision:	https://reviews.freebsd.org/D17183
2018-09-25 02:34:28 +00:00
Colin Percival
189bd7a978 Recognize the Amazon PCI serial device found in i3.metal EC2 instances
as an NS8250 UART.

Reviewed by:	sbruno, imp
Approved by:	re (delphij)
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D17250
2018-09-24 22:15:04 +00:00
Mark Johnston
463406ac4a Add more NUMA-specific low memory predicates.
Use these predicates instead of inline references to vm_min_domains.
Also add a global all_domains set, akin to all_cpus.

Reviewed by:	alc, jeff, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17278
2018-09-24 19:24:17 +00:00
John Baldwin
b65a9568f8 Restore the API of the kf_sa_local and kf_sa_peer members.
In 11.x and earlier these were accessible as direct members of 'struct
kinfo_file'.  Existing code already knows about the new location of
these members as well, so wrapper macros did not work for these
fields.  Instead, define an anonymous struct containing the fields
from 'struct kinfo_file' in FreeBSD 11 that were not part of the
'kf_un' union.  This anonymous struct is then placed in an anonymous
union along with the new 'kf_un' union.  This preserves the API of
both structure layouts without requiring any wrapper macros.

PR:		231525
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17262
2018-09-24 18:20:38 +00:00
John Baldwin
7a102e0463 Implement pmap_sync_icache().
This invokes "fence" on the hart performing the write followed by an IPI
to execute "fence.i" on all harts.

This is required to support userland debuggers setting breakpoints in
user processes.

Reviewed by:	br (earlier version), markj
Approved by:	re (gjb)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17139
2018-09-24 17:41:29 +00:00
Alexander Motin
cf4a52cf67 Fix use-after-free in RAID0 error reporting of GEOM_RAID.
PR:		231510
Submitted by:	yangx92@hotmail.com
Approved by:	re (gjb)
MFC after:	1 week
2018-09-24 16:58:55 +00:00
Alan Cox
f5fbe90de4 Passing UMA_ZONE_NOFREE to uma_zcreate() for swpctrie_zone and swblk_zone is
redundant, because uma_zone_reserve_kva() is performed on both zones and it
sets this same flag on the zone.  (Moreover, the implementation of the swap
pager does not itself require these zones to be UMA_ZONE_NOFREE.)

Reviewed by:	kib, markj
Approved by:	re (gjb)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17296
2018-09-24 16:49:02 +00:00
Mark Johnston
3d14a7bb43 Ensure that "domain" is initialized when vm_ndomains == 1.
Reported by:	alc
Approved by:	re (gjb)
2018-09-24 15:32:46 +00:00
Mateusz Guzik
9afff6b1c0 Eliminate false sharing in malloc due to statistic collection
Currently stats are collected in a MAXCPU-sized array which is not
aligned and suffers enormous false-sharing. Fix the problem by
utilizing per-cpu allocation.

The counter(9) API is not used here as it is too incomplete and does
not provide a win over per-cpu zone sized for malloc stats struct. In
particular stats are being reported for each cpu separately by just
copying what is supposed to be an array element for given cpu.

This eliminates significant false-sharing during malloc-heavy tests
e.g. on Skylake. See the review for details.

Reviewed by:	markj
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17289
2018-09-23 19:00:06 +00:00
Michael Tuexen
078a49a077 Remove the unused parameter 'locked' from the function
syncache_respond(). There is no functional change. The
parameter became unused in r313330, but wasn't removed.

Approved by:		re (kib@)
MFC after:		1 month
Sponsored by:		Netflix, Inc.
2018-09-23 16:37:32 +00:00
Konstantin Belousov
b7befdf509 Correct panic messages.
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
MFC after:	1 week
2018-09-22 17:05:49 +00:00
Konstantin Belousov
108ff63e8a Further reorganize pmap_invalidate TLB code.
Split calculation of mask for shootdown IPI and local
invalidation. Reorder IPI before local.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17277
2018-09-22 17:04:39 +00:00
Mateusz Guzik
334107b91b vfs: __predict common case in VFS_EPILOGUE/PROLOGUE
NFS is the only in-tree filesystem using the feature, but all ops test
for it.

Currently the resulting sigdefer calls have to be jumped over in the
common case.

This is a bandaid, longer term fix will move this feature away.

Approved by:	re (kib)
2018-09-22 11:39:30 +00:00
Navdeep Parhar
061bbaf7e7 cxgbe(4): Reuse existing "switching" L2T entries when possible.
Approved by:	re@ (rgrimes@)
Sponsored by:	Chelsio Communications
2018-09-22 01:24:30 +00:00
Alexander Motin
ae1b0b825a MFV r338866: 9700 ZFS resilvered mirror does not balance reads
illumos/illumos-gate@82f63c3c2b

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author:     Jerry Jelinek <jerry.jelinek@joyent.com>

Approved by:	re (delphij)
2018-09-21 21:56:00 +00:00
Mark Johnston
6cdde6fda2 Use the GNU as-compatible .endm instead of .endmacro.
Approved by:	re (gjb)
2018-09-21 20:20:03 +00:00
Konstantin Belousov
d995b5b1ea Convert x86 TLB top-level invalidation functions to ifuncs.
Note that shootdown IPI handlers are already per-mode.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17184
2018-09-21 17:53:06 +00:00
Mateusz Guzik
68c7542edf amd64: even up copyin/copyout with memcpy + other cleanup
- _fault handlers for both primitives are identical, provide just one
- change the copying scheme to match memcpy (in particular jump
avoidance for the most common case of multiply of 8)
- stop re-reading pcb address on exit, just store it locally (in r9)

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17265
2018-09-21 15:00:46 +00:00
Andrey V. Elsukov
50f5c94edb Fix possible NULL pointer dereference in ffec_alloc_mbufcl().
PR:		231514
Approved by:	re (kib)
MFC after:	1 week
2018-09-21 13:44:05 +00:00
Ed Maste
f789d9839a Include kernel ident in uname
In non-reproducible mode we have the kernel ident as a side effect of
including the build directory.  Explicitly add it to the ident string in
reproducible mode.

Reported by:	mjg
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
2018-09-21 13:43:06 +00:00
Mateusz Guzik
d6fda03a64 select: stop doing zero-sized memsets
Approved by:	re (kib)
2018-09-21 13:20:41 +00:00
Ed Maste
18c00da806 remove double space between branch and version in kernel ident
Reported by:	dim
Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-09-21 13:02:25 +00:00
Mateusz Guzik
bb32268086 amd64: check for small size in memmove, memcpy and memset
If the size is 15 bytes or less avoid spinning up rep just to copy the 8
bytes. In my tests on EPYC and old Intel microarchs without ERMS (like
Westmere) it provided a nice win over the current version (e.g. for EPYC
memset with 15 bytes of size goes from 59712651 ops/s to 70600095) all
while almost not pessimizing the other cases.

Data collected during package building shows that < 16 sizes are pretty
common.

Verified with the glibc test suite.

Approved by:	re (kib)
2018-09-21 12:27:36 +00:00
Matt Macy
b08d611de8 fix vlan locking to permit sx acquisition in ioctl calls
- update vlan(9) to handle changes earlier this year in multicast locking

Tested by: np@, darkfiberu at gmail.com

PR:	230510
Reviewed by:	mjoras@, shurd@, sbruno@
Approved by:	re (gjb@)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16808
2018-09-21 01:37:08 +00:00
Glen Barber
a128aaea95 Update head from ALPHA6 to ALPHA7 as part of the 12.0-RELEASE
cycle.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-09-20 23:59:42 +00:00
Mateusz Guzik
5254f065ab amd64: macroify copyin/copyout and provide erms variants, follow up
Fix a fat-fingered typo with a "funny" side-effect: when doing copyin on a
cpu without ERMS and with size being a multiply of 8 a page fault would be
triggered resulting in EFAULT.

Pointy hat: mjg
Approved by:	re (implicit)
2018-09-20 20:32:08 +00:00
Stephen Hurd
861437f83a Add IFCAP_TSO6 for igb
It seems igb supports TSO6, but the capability got lost in
the iflib update. Restore this capability.

PR:		231476
Reported by:	lev
Reviewed by:	erj
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17242
2018-09-20 20:06:44 +00:00
Andrey V. Elsukov
76b09d1823 Add new field max_hdrsize to struct encap_config.
It is currently unused and reserved for future use to keep KBI/KPI.
Also add several spare pointers to be able extend structure if it
will be needed.

Approved by:	re (gjb)
2018-09-20 19:45:27 +00:00
Stephen Hurd
0c919c2370 Fix capabilities handling for iflib drivers
Various capabilities were not being handled correctly in the
SIOCSIFCAP handler. Specifically:

IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 could be set even if not supported

It was impossible to disable IFCAP_RXCSUM and/or IFCAP_RXCSUM_IPV6 via
ifconfig since it does ioctl() per command-line flag rather than combine
them into a single call.

IFCAP_VLAN_HWCSUM could not be modified via the ioctl()

Setting any combination of the three IFCAP_WOL flags would set only
IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC. For example, setting only
IFCAP_WOL_UCAST would result in both IFCAP_WOL_MCAST and IFCAP_WOL_MAGIC
being enabled, but IFCAP_WOL_UCAST would not be enabled.

Because if_vlancap() was called before if_togglecapenable(), vlan flags
were sometimes not applied correctly.

Interfaces were being unnecessarily stopped and restarted for WoL

PR:		231151
Submitted by:	Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	galladin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17158
2018-09-20 19:35:35 +00:00
Mateusz Guzik
4bf0035fd1 amd64: macroify copyin/copyout and provide erms variants
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17257
2018-09-20 18:30:17 +00:00
Mark Johnston
969e147aff Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned.
The old code appears to assume that vmem_alloc() would import
size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't
provide this guarantee.

Also remove the unused global RWX arena and add comments explaining why
we have per-domain arenas.

Reported by:	alc
Reviewed by:	alc, kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17249
2018-09-20 18:29:55 +00:00
Mateusz Guzik
c396945b74 vfs: remove lookup_shared tunable
Reviewed by:	kib, jhb
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17253
2018-09-20 18:25:26 +00:00
Bjoern A. Zeeb
6675bee81a In icmp6_rip6_input(), once we have a lock, make sure the inp is
not freed.  This can happen since the list traversal and locking
was converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic in ip6_savecontrol_v4()
trying to access the socket hanging off the inp, which was gone
by the time we got there.

Reported by:	andrew
Tested by:	andrew
Approved by:	re (gjb)
2018-09-20 15:45:53 +00:00
Mark Johnston
25ed23cfbb Change the domain selection policy in kmem_back().
Ensure that pages backing the same virtual large page come from the
same physical domain, as kmem_malloc_domain() does.

PR:		231038
Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17248
2018-09-20 15:45:12 +00:00
Mateusz Guzik
51e13c93b6 fd: prevent inlining of _fdrop thorough kern_descrip.c
fdrop is used in several places in the file and almost never has to call
_fdrop. Thus inlining it is a pure waste of space.

Approved by:	re (kib)
2018-09-20 13:32:40 +00:00
Mateusz Guzik
a286a3099c amd64: move fusufault after all users
A lot of function have the following check:
        cmpq    %rax,%rdi                       /* verify address is valid */
        ja      fusufault

The label is present earlier in kernel .text, which means this is a jump
backwards. Absent any information in branch predictor, the cpu predicts it
as taken. Since it is almost never taken in practice, this results in a
completely avoidable misprediction.

Move it past all consumers, so that it is predicted as not taken.

Approved by:	re (kib)
2018-09-20 13:29:43 +00:00
John Baldwin
232d0b87e0 Various fixes for floating point on RISC-V.
- Explicitly load an empty initial state into FP registers when taking
  the fault on the first FP instruction in a thread.  Setting
  SSTATE.FS to INITIAL is just a marker to let context switch restore
  code know that it can load FP registers with zeroes instead of
  memory loads.  It does not imply that the hardware will reset all
  registers to zero on first access.  In addition, set the state to
  CLEAN instead of INITIAL after the first FP instruction.
  cpu_switch() doesn't do anything for INITIAL and only restores from
  the pcb if the state is CLEAN.  We could perhaps change cpu_switch
  to call fpe_state_clear if the state was INITIAL and leave SSTATE.FS
  set to INITIAL instead of CLEAN after the first FP instruction.
  However, adding this complexity to cpu_switch() doesn't seem worth
  the supposed gain.
- Only save the current FPU registers in fill_fpregs() if the request
  is made to save the current thread's registers.  Previously if a
  debugger requested FP registers via ptrace() it was getting a copy
  of the debugger's FP registers rather than the debugee's.
- Zero the entire FP register set structure returned for ptrace() if a
  thread hasn't used FP registers rather than leaking garbage in the
  fp_fcsr field.
- If a debugger writes FP registers via ptrace(), always mark the pcb
  as having valid FP registers and set SSTATUS.FS_MASK to CLEAN so
  that the registers will be restored when the debugged thread
  resumes.
- Be more explicit about clearing the SSTATUS.FS field before setting
  it to CLEAN on the first FP instruction trap.

Submitted by:	br, markj
Approved by:	re (rgrimes)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17141
2018-09-19 23:45:18 +00:00