Commit Graph

121303 Commits

Author SHA1 Message Date
Gleb Smirnoff
dd388cfd9b The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections
from entering the TIME_WAIT state. However, it omits sending the ACK for the
FIN, which results in RST. This becomes a bigger deal if the sysctl
net.inet.tcp.blackhole is 2. In this case RST isn't send, so the other side of
the connection (also local) keeps retransmitting FINs.

To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead
we will allocate a tcptw on stack and proceed to the end of the function all
the way to tcp_twrespond(), to generate the correct ACK, then we will drop the
last PCB reference.

While here, make a few tiny improvements:
- use bools for boolean variable
- staticize nolocaltimewait
- remove pointless acquisiton of socket lock

Reported by:	jtl
Reviewed by:	jtl
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14697
2018-03-21 20:59:30 +00:00
Conrad Meyer
0e33efe4e4 Import Blake2 algorithms (blake2b, blake2s) from libb2
The upstream repository is on github BLAKE2/libb2.  Files landed in
sys/contrib/libb2 are the unmodified upstream files, except for one
difference:  secure_zero_memory's contents have been replaced with
explicit_bzero() only because the previous implementation broke powerpc
link.  Preferential use of explicit_bzero() is in progress upstream, so
it is anticipated we will be able to drop this diff in the future.

sys/crypto/blake2 contains the source files needed to port libb2 to our
build system, a wrapped (limited) variant of the algorithm to match the API
of our auth_transform softcrypto abstraction, incorporation into the Open
Crypto Framework (OCF) cryptosoft(4) driver, as well as an x86 SSE/AVX
accelerated OCF driver, blake2(4).

Optimized variants of blake2 are compiled for a number of x86 machines
(anything from SSE2 to AVX + XOP).  On those machines, FPU context will need
to be explicitly saved before using blake2(4)-provided algorithms directly.
Use via cryptodev / OCF saves FPU state automatically, and use via the
auth_transform softcrypto abstraction does not use FPU.

The intent of the OCF driver is mostly to enable testing in userspace via
/dev/crypto.  ATF tests are added with published KAT test vectors to
validate correctness.

Reviewed by:	jhb, markj
Obtained from:	github BLAKE2/libb2
Differential Revision:	https://reviews.freebsd.org/D14662
2018-03-21 16:18:14 +00:00
Conrad Meyer
5fbc5b5a3c cryptosoft(4): Zero plain hash contexts, too
An OCF-naive user program could use these primitives to implement HMAC, for
example.  This would make the freed context sensitive data.

Probably other bzeros in this file should be explicit_bzeros as well.
Future work.

Reviewed by:	jhb, markj
Differential Revision:	https://reviews.freebsd.org/D14662 (minor part of a larger work)
2018-03-21 16:12:07 +00:00
Stephen Hurd
7021bf0569 Update copyright per Matthew Macy
"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Approved by:	shurd
Differential Revision:	https://reviews.freebsd.org/D14766
2018-03-21 15:57:36 +00:00
Jonathan T. Looney
7fb2986ff6 If the INP lock is uncontested, avoid taking a reference and jumping
through the lock-switching hoops.

A few of the INP lookup operations that lock INPs after the lookup do
so using this mechanism (to maintain lock ordering):

1. Lock lookup structure.
2. Find INP.
3. Acquire reference on INP.
4. Drop lock on lookup structure.
5. Acquire INP lock.
6. Drop reference on INP.

This change provides a slightly shorter path for cases where the INP
lock is uncontested:

1. Lock lookup structure.
2. Find INP.
3. Try to acquire the INP lock.
4. If successful, drop lock on lookup structure.

Of course, if the INP lock is contested, the functions will need to
revert to the previous way of switching locks safely.

This saves a few atomic operations when the INP lock is uncontested.

Discussed with:	gallatin, rrs, rwatson
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D12911
2018-03-21 15:54:46 +00:00
Andrew Turner
d614c09a82 Use a table to find the endpoint configuration
On the Allwinner SoCs we need to set a custom endpoint configuration. To
allow for this use a table to store the configuration so the attachment
can override it.

Reviewed by:	hselasky
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14783
2018-03-21 15:17:54 +00:00
Warner Losh
7961a77148 Mark psycho interrupts as MPSAFE. It's safe to do so now that we don't
need Giant to call shutdown_nice().
2018-03-21 14:47:17 +00:00
Warner Losh
026fb270ca Unlock giant when calling shutdown_nice() 2018-03-21 14:47:12 +00:00
Warner Losh
b799e21b28 This is MPSAFE on this platform, so don't take Giant out while running
the callback.
2018-03-21 14:47:08 +00:00
Warner Losh
9b4bb7d500 These interrupts call shutdown_nice() which should be called Giant
unlocked. Rather than dropping it in the interrupt handler, mark these
handlers as MPSAFE.
2018-03-21 14:47:03 +00:00
Warner Losh
3e867f24cb bufshutdown is no longer called with Giant held, so there's no need to
drop or pickup Giant anymore. Remove that code and adjust comments.
2018-03-21 14:46:59 +00:00
Warner Losh
d5292812f8 Remove Giant from init creation and vfs_mountroot.
Sponsored by: Netflix
Discussed with: kib@, mckusick@
Differential Review: https://reviews.freebsd.org/D14712
2018-03-21 14:46:54 +00:00
Warner Losh
df4ee7639e Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly"
It exposes other issues, so revert to the pervious state of known issues.
2018-03-21 12:55:59 +00:00
Konstantin Belousov
661722e76f Move sysinit and sysuninit linker sets in the data (writeable) section.
Both sets are sorted in place, and with the introduction of read-only
permissions on the amd64 kernel text, the sorting override depended on
CR0.WP turned off.  Make it correct by moving the sets into writeable
part of the KVA, also fixing boot on machines where hand-off from BIOS
to OS occurs with CR0.WP set.

Based on submission by:	Peter Lei <peter.lei@ieee.org>
MFC after:	1 week
2018-03-21 10:26:39 +00:00
Conrad Meyer
c37125d9e5 Add missed sys/limits.h include
Apparently header pollution on x86 hid its absense.  Sorry, other arch
users.

Fix the missed header introduced in r331279.

Reported by:	tinderbox
2018-03-21 03:43:40 +00:00
Conrad Meyer
4948f7bf11 Regenerate sysent files after r331279. 2018-03-21 01:17:01 +00:00
Conrad Meyer
e9ac27430c Implement getrandom(2) and getentropy(3)
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).

getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.

getentropy(3) is added as a compatibility shim for the OpenBSD API.

truss(1) support is included.

Tests for both system calls are provided.  Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage.  Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below.  (They
pass, of course.)

PR:		194204
Reported by:	David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with:	cperciva, delphij, jhb, markj
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D14500
2018-03-21 01:15:45 +00:00
Jamie Gritton
672756aa9f Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position.  This makes
them similar to the say "jailsys" options are represented.

Loop through either kind of option array with a structure pointer rather
then an integer index.
2018-03-20 23:08:42 +00:00
Alexander V. Chernikov
b2b7ca49dc Use count(9) api for the bpf(4) statistics.
Currently each bfp descriptor uses u64 variables to maintain its counters.
On interfaces with high packet rate this leads to unnecessary contention
and inaccurate reporting.

PR:		kern/205320
Reported by:	elofu17 at hotmail.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14726
2018-03-20 22:57:06 +00:00
Warner Losh
7b0eb8dbf8 Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.

PR: 226510
Sponsored by: Netflix
2018-03-20 22:07:45 +00:00
Gleb Smirnoff
83fc34ea0d At this point iwmesg isn't initialized yet, so print pointer to lock
rather than panic before panicing.
2018-03-20 22:05:21 +00:00
Warner Losh
4e96c99bdf Push down Giant one layer. In the days of yore, back when Penitums
were the new kids on the block and F00F hacks were all the rage, one
needed to take out Giant to do anything moderately complicated with
the VM, mappings and such. So the pccard / cardbus code held Giant for
the entire insertion or removal process.

Today, the VM is MP safe. The lock is only needed for dealing with
newbus things. Move locking and unlocking Giant to be only around
adding and probing devices in pccard and cardbus.
2018-03-20 22:01:18 +00:00
Mark Johnston
1de56ac728 Revert part of r331264: disable interrupts before disabling WP.
We might otherwise be preempted, leaving WP disabled while another
thread runs on the CPU.

Reported by:	kib
X-MFC with:	r331264
2018-03-20 21:36:35 +00:00
Warner Losh
b6735f6ff5 Drop support for lint for cdefs.h. 2018-03-20 21:18:40 +00:00
Warner Losh
39e5f2b207 Remove obsolete lint support. 2018-03-20 21:17:48 +00:00
Mark Johnston
7a79ce2e38 Make use of the KPI added in r331252.
MFC after:	2 weeks
2018-03-20 21:16:26 +00:00
Ed Maste
6e7f286b47 Restore close quote lost in r331254 2018-03-20 21:04:47 +00:00
John Baldwin
e875be212d Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header.  The standard header for
userland is <stdarg.h>.  Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.

Reviewed by:	arichardson, brooks, imp
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D14776
2018-03-20 21:00:45 +00:00
Konstantin Belousov
8fbcc3343f Move the CR0.WP manipulation KPI to x86.
This should allow to avoid some #ifdefs in the common x86/ code.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-20 20:20:49 +00:00
Ed Maste
b7d779b3e5 Make linuxulator fn declaration match definition
I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's
declaration (only),  while bringing this change over from git and
encountering a conflict.
2018-03-20 19:28:52 +00:00
Ed Maste
fc2a8776a2 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
Konstantin Belousov
9cffc92c62 Disable write protection around patching of XSAVE instruction in the
context switch code.

Some BIOSes give control to the OS with CR0.WP already set, making the
kernel text read-only before cpu_startup().

Reported by:	Peter Lei <peter.lei@ieee.org>
Reviewed by:	jtl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14768
2018-03-20 17:47:29 +00:00
Konstantin Belousov
2337dc6430 Provide KPI for handling of rw/ro kernel text.
This is a pure syntax patch to create an interface to enable and later
restore write access to the kernel text and other read-only mapped
regions.  It is in line with e.g. vm_fault_disable_pagefaults() by
allowing the nesting.

Discussed with:	Peter Lei <peter.lei@ieee.org>
Reviewed by:	jtl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14768
2018-03-20 17:43:50 +00:00
John Baldwin
fd40ecf3d4 Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec.  In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.

This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.

Reported by:	Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by:	bz
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14763
2018-03-20 17:05:23 +00:00
Konstantin Belousov
79e9552ebb Check for wrap-around in vm_phys_alloc_seg_contig().
It is possible to provide insane values for size in contigmalloc(9)
request, which usually not reaches the phys allocator due to failing
KVA allocation.  But with the forthcoming 4/4 i386, where 32bit
architecture has almost 4G KVA, contigmalloc(1G) is not unreasonable
outright and KVA might be available sometimes.

Then, the calculation of pa_end could wrap around, depending on the
physical address, and the checks in vm_phys_alloc_seg_contig() would
pass while the iteration in the loop after the 'done' label goes out
of the vm_page_array bounds.

Fix it by detecting the wrap.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14767
2018-03-20 16:17:55 +00:00
Mark Johnston
8c7549da2b Drop KTR_CONTENTION.
It is incomplete, has not been adopted in the other locking primitives,
and we have other means of measuring lock contention (lock_profiling,
lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and
free up a precious KTR class index.

Reviewed by:	jhb, mjg
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14771
2018-03-20 15:51:05 +00:00
Andrew Turner
ed4c884f2e Check if the gettime runtime service is valid.
The U-Boot efi runtime service expects us to set the address map before
calling any runtime services. It will then remap a few functions to their
runtime version. One of these is the gettime function. If we call into
this without having set a runtime map we get a page fault.

Add a check to see if this is valid in efi_init() so we don't try to use
the possibly invalid pointer.

Reviewed by:	imp, kevans (both previous version)
X-MFC-With:	r330868
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14759
2018-03-20 13:35:20 +00:00
Warner Losh
400326b667 Kill assert I shouldn't have committed 2018-03-20 13:14:10 +00:00
Warner Losh
afdbfe1e1b Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.

Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.

Sponsored by: Netflix
2018-03-20 03:37:14 +00:00
Warner Losh
6f591d13fd Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is a
256, which is the max that will fit into a 4k page.

Sponsored by: Netflix
2018-03-20 03:37:09 +00:00
Warner Losh
fdfc0a83a3 Remove some redundant MPSAFE flags.
This was pointed out in a code review I'm having trouble finding right
now, but go ahead and eliminate these.

Sponsored by: Netfix
2018-03-20 03:37:04 +00:00
Ed Maste
0a26f9316a Rationalize license text on Linuxolator files
i386 linux.h missed in r330239.

Approved by:	sos
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-03-20 02:50:11 +00:00
Justin Hibbits
2acde6a85a Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a
pointer.  When &bdomain[i] is added to arg2 it widens from uintptr_t to
intmax_t, then gcc whines when it gets cast to a pointer.  Casting through
uintptr_t silences this warning.
2018-03-20 02:01:30 +00:00
Justin Hibbits
a029f84189 Fix powerpc Book-E build post-331018/331048.
pagedaemon_wakeup() was moved from vm_pageout.h to vm_pagequeue.h.
2018-03-20 01:07:22 +00:00
Oleksandr Tymoshenko
108117cc22 [ofw] fix errneous checks for OF_finddevice(9) return value
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.

Reviewed by:	nwhitehorn
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14645
2018-03-20 00:03:49 +00:00
Alexander Motin
5f5baf0e96 Update mpr(4) driver from v15 to v18 from Broadcom site.
Version 16 is just a number bump, since we already had those changes.

Version 17 introduces new AdapterType value, that allows new user-space
tools from Broadcom to differentiate adapter generations 3 and 3.5.

Version 18 updates headers and adds SAS_DEVICE_DISCOVERY_ERROR reporting.

MFC after:	2 weeks
2018-03-19 23:21:45 +00:00
Matt Joras
d6160f6079 Fix initialization of eventhandler mutex.
mtx_init does not do a copy of the name string it is passed. The
eventhandler code incorrectly passed the parameter string directly to
mtx_init instead of using the copy it makes. This was an existing
problem with the code that I dutifully copied over in my changes in r325621.

Reported by:	Anton Rang <rang AT acm.org>
Reviewed by:	rstone, markj
Approved by:	rstone (mentor)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14764
2018-03-19 22:43:27 +00:00
Ed Maste
dc85846736 Rename linuxulator functions with linux_ prefix
It's preferable to have a consistent prefix.  This also reduces
differences between the three linux*_sysvec.c files.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-19 21:26:32 +00:00
Kristof Provost
b4b8fa3387 pf: Fix memory leak in DIOCRADDTABLES
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.

Reported by:	Coverity
CID:		1382111
MFC after:	3 weeks
2018-03-19 21:13:25 +00:00
Eric Joyner
7d48aa4c72 ixgbe(4): Update shared code, add support for X552 1G, fix bug
This patch will:

- Update ixgbe shared code
- Add support for Intel(R) Ethernet Connection X552 1000BASE-T
- Add error handling for link state check preventing VF from stopping traffic
  after changing PF's MTU value

Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: Intel Networking
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D13885
2018-03-19 20:55:05 +00:00
Kenneth D. Merry
0afdc47158 cam_periph_acquire() now returns an errno.
The ch(4) driver was missed in change 328918, which changed
cam_periph_acquire() to return an errno instead of cam_status.

As a result, ch(4) failed to attach.

Sponsored by:	Spectra Logic
2018-03-19 20:19:00 +00:00
John Baldwin
7af5f2acfb Fix a typo.
Reviewed by:	kib
2018-03-19 17:14:56 +00:00
Lawrence Stewart
370efe5ac8 Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.

More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn

The implementation introduces four new sysctls:

- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
  enable ABE for ECN-enabled TCP connections.

- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
  multiplicative window decrease factor, specified as a percentage, applied to
  the congestion window in response to a loss-based or ECN-based congestion
  signal respectively. They default to the values specified in the draft i.e.
  beta=50 and beta_ecn=80.

- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
  non-zero to enable the use of standard beta (50% by default) when repairing
  loss during an ECN-signalled congestion recovery episode. It enables a more
  conservative congestion response and is provided for the purposes of
  experimentation as a result of some discussion at IETF 100 in Singapore.

The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.

Submitted by:	Tom Jones <tj@enoti.me>
Tested by:	Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D11616
2018-03-19 16:37:47 +00:00
Emmanuel Vadot
d9d3a08ed4 sys/dts: Remove arm64 from subdir as it no longer exists.
r325987 removed the arm64 directory, remove it from SUBDIR too.
2018-03-19 15:35:26 +00:00
Ed Maste
9bec2ea66e linux*_sysvec.c: rationalize whitespace and comments
There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-19 15:11:10 +00:00
Hans Petter Selasky
5cd5781c75 Remove redundant integer cast in ibcore. The "ref_count" field already
has integer type.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-19 13:51:33 +00:00
Ian Lepore
d892051323 Add the device/chip type to the disk d_descr field, and print more info
about the chip including the erase block size at attach time.

Also add myself to the copyrights since at this point svn blame would point
to me as the culprit for much of this.
2018-03-18 18:58:47 +00:00
Ian Lepore
3c9af13c75 Add support for 4K and 32K erase block sizes. Many of the supported chips
have these flags set in the ident table, but there was no code to support
using the smaller erase sizes.
2018-03-18 18:37:47 +00:00
Ian Lepore
c03ab159f6 Make all internal routines return an int error status, and check the
status at all call points.  Combine the get_status and wait_for_ready
routines, since waiting for ready is the only reason to ever get status.
2018-03-18 17:47:57 +00:00
Ian Lepore
89a1585b8d Add sc_parent to the softc and use it in place of device_get_parent() calls
all over the place.  Also pass the softc as the arg to all the internal
functions instead of passing a device_t and calling device_get_softc() in
each function.
2018-03-18 17:25:23 +00:00
Mark Johnston
95099bbad1 Fix an access of an uninitialized variable in dtrace_probe().
Reported by:	Coverity, via cem
MFC after:	3 days
2018-03-18 17:01:50 +00:00
Ian Lepore
89a895b63c Bugfix: wait for writes/erases to complete after starting them, instead of
before starting them.

Using the wait-before logic would make sense if there was useful time-
consuming work that could be done between the end of one write and the
beginning of the next, but it also requires doing the wait-for-ready before
reading, because a prior write or erase could still be in progress.  Reading
is the far more common case, so adding a whole extra bus transaction to
check for ready before each read would soak up any small gains that might be
had from doing async writes.
2018-03-18 16:52:31 +00:00
Mark Johnston
c6a70eaea8 Avoid dequeuing the fault page during a soft fault.
Such pages are re-enqueued at the end of the fault handler, preserving
LRU. Rather than performing two separate operations per fault, simply
requeue the page at the end of the fault (or bump its activation count
if it resides in PQ_ACTIVE, avoiding the page queue lock entirely).
This elides some page lock and page queue lock operations in common
cases, e.g., CoW faults.

Note that we must still dequeue the source page for "optimized" CoW
faults since the page may not remain enqueued while it is moved to
another object.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14625
2018-03-18 16:49:30 +00:00
Mark Johnston
0eb50f9cd2 Have vm_page_{deactivate,launder}() requeue already-queued pages.
In many cases the page is not enqueued so the change will have no
effect. However, the change is needed to support an optimization in
the fault handler and in some cases (sendfile, the buffer cache) it
was being emulated by the caller anyway.

Reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
X-Differential Revision: https://reviews.freebsd.org/D14625
2018-03-18 16:40:56 +00:00
Ian Lepore
19aa9f7183 Eliminate some unneeded intermediate variables. Eliminate some redundant
parens in shift-and-mask expressions.  Reword and reflow some comments.
2018-03-18 16:36:14 +00:00
Mark Johnston
434862acb1 Have vm_page_replace() assert that the new page is not enqueued.
The new page does not belong to a VM object, but the page daemon does
not expect to encounter such pages.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 week
X-Differential Revision: https://reviews.freebsd.org/D14625
2018-03-18 16:35:40 +00:00
Ian Lepore
f432eb7ea1 Remove a pointless KASSERT and reword a comment a bit. The KASSERT tested
for the same condition that the preceeding lines checked for and would have
returned EIO, so the assert could never possibly trigger (sc_sectorsize must
inherently be an integer multiple of FLASH_PAGE_SIZE).
2018-03-18 16:10:14 +00:00
Ian Lepore
dac94adb63 Do not overwrite the contents of BIO_WRITE buffers. SPI inherently
transfers data in both directions at once.  When writing to the device,
use a dummy buffer for the incoming data, not the same buffer as the
outgoing data.  Writes are done in FLASH_PAGE_SIZE chunks, which is only
256 bytes, so just put the dummy buffer into the softc.
2018-03-18 15:56:10 +00:00
Mariusz Zaborski
9ea857cf0f Remove unneeded variable which was introduced in r328472.
Pointed out by:	pjd@
2018-03-18 15:09:55 +00:00
Conrad Meyer
22aec4de9f lib(private)zstd: Fix riscv build
Link __bswap[ds]i2() intrinsics in to libzstd for riscv, where the C runtime
apparently lacks such intrinsics.

Broken in r330894.

Reported by:	asomers
Sponsored by:	Dell EMC Isilon
2018-03-18 03:42:57 +00:00
Mateusz Guzik
09bdec20a0 locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.

Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.

This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total

So note this is still a significant hit.

LOCK_PROFILING results are not affected.
2018-03-17 19:26:33 +00:00
Jeff Roberson
3cec5c77d6 Move the dirty queues inside the per-domain structure. This resolves a bug
where we had not hit global dirty limits but a single queue was starved
for space by dirty buffers.  A single buf_daemon is maintained for now.

Add a bd_speedup() when we are low on bufspace.  This can happen due to SUJ
keeping many bufs locked until a cg block is written.  Document this with
a comment.

Fix sysctls to work with per-domain variables.  Add more ddb debugging.

Reported by:	pho
Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14705
2018-03-17 18:14:49 +00:00
Alan Somers
b521cf275c audit(4): fix a typo in a comment
no functional change
2018-03-17 17:56:08 +00:00
Warner Losh
8d1b99a023 Use kern.opts.mk instead of bsd.own.mk (which includes src.opts.mk)
here.
2018-03-17 17:18:46 +00:00
Warner Losh
8346f88834 Use FreeBSD-current conventions for building options rather than
FreeBSD 10 conventions: inlude kern.opts.mk.
2018-03-17 17:18:41 +00:00
Warner Losh
d4dccac09a Remove commented out code to generate opt_inet*.h. That's handled
automatically by kern.opts.mk now. Include that instead.
2018-03-17 17:18:37 +00:00
Warner Losh
4dcef3bca1 Add EFI to kernel options.
Some parts of MI modules will soon depend on whether EFI is available
or not. Add EFI to the list of kernel options so we can use it in
the modules build.
2018-03-17 17:18:29 +00:00
Alexander V. Chernikov
1435dcd94f Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expire each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
Warner Losh
378e38c1cf Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.

Reported by:	avg
PR:		226510 (same panic, differnt circumstances)
Sponsored by:	Netflix
2018-03-17 16:04:06 +00:00
Ed Maste
f91e2d3a95 Move assym.s to DPSRCS in vmbus module
assym.s is only to be included by other .s files, and should not
actually be assembled by itself.
2018-03-17 14:50:20 +00:00
Ed Maste
d8ba45e213 Revert r313780 (UFS_ prefix) 2018-03-17 12:59:55 +00:00
Ed Maste
1e2b9afca9 Prefix UFS symbols with UFS_ to reduce namespace pollution
Followup to r313780.  Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.

Reported by:	kib
Reviewed by:	kib, mckusick
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9623
2018-03-17 01:48:27 +00:00
Ed Maste
4e78ff7068 ANSIfy sys/x86 2018-03-17 01:40:09 +00:00
Brooks Davis
28e7752907 Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros.
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.

Reviewed by:	cem, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14706
2018-03-16 22:23:04 +00:00
Conrad Meyer
db488e4f52 random(4): Poll for signals during large reads
Occasionally poll for signals during large reads of the /dev/u?random
devices.  This allows cancellation via SIGINT of accidental invocations of
very large reads.  (A 2GB /dev/random read, which takes about 10 seconds on
my 2017 AMD Zen processor, can be aborted.)

I believe this behavior was intended since 2014 (r273997), just not fully
implemented.

This is motivated by a potential getrandom(2) interface that may not
explicitly forbid extremely large reads on 64-bit platforms -- even larger
than the 2GB limit imposed on devfs I/O by default.  Such reads, if they are
to be allowed, should be cancellable by the user or administrator.

Reviewed by:	delphij
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14684
2018-03-16 18:50:26 +00:00
Ian Lepore
9c45f7b4fd Use EFI RTC capabilities info when registering, add bootverbose diagnostics.
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.

Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.

Differential Revision:	https://reviews.freebsd.org/D14565 (timed out)
2018-03-16 18:16:27 +00:00
Ian Lepore
9c237f3a13 Add the header file needed for the recently-added call to pagedaemon_wakeup(). 2018-03-16 16:06:25 +00:00
Michael Tuexen
1574b1e41e Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.

Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.

PR:			226421
Reviewed by:		bz, hiren, kib
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D13514
2018-03-16 15:26:07 +00:00
Ed Maste
d595c5c0d6 linux_errno.c: add newer errno values
Also introduce a static assert to ensure the list is kept up to date.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-16 14:51:47 +00:00
Ed Maste
6e481f83f7 Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table.  Move the table to a common
file to be used by all.  Make it 'const int' to place it in .rodata.

(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)

This change should introduce no functional change; a followup will add
missing errno values.

MFC after:	3 weeks
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14665
2018-03-16 14:46:38 +00:00
Ed Maste
1d071ce340 Move assym.s to DPSRC in sgx module
assym.s is only to be included by other .s files, and should not
actually be assembled by itself.
2018-03-16 13:33:42 +00:00
Ed Maste
dd851c507b ANSIfy i386/vm86.c 2018-03-16 12:12:41 +00:00
Conrad Meyer
27cb8d849f Garbage collect unused chacha20 code
Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982)
and Apr 16 2017 (r317015).  Only the latter is actually used by anything, so
just go ahead and garbage collect the unused version while it's still only
in CURRENT.

I'm not making any judgement on which implementation is better.  If I pulled
the wrong one, feel free to swap the existing implementation out and replace
it with the other code (conforming to the API that actually gets used in
randomdev, of course).  We only need one generic implementation.

Sponsored by:	Dell EMC Isilon
2018-03-16 07:11:53 +00:00
Conrad Meyer
5d3b36666b Fix GCC build: Remove redundant pagedaemon_wakeup declaration
Introduced in r331018.

Reported by:	kevans
Sponsored by:	Dell EMC Isilon
2018-03-16 07:05:09 +00:00
Warner Losh
d85d964829 Try polling the qpairs on timeout.
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.

This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.

PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix
2018-03-16 05:23:48 +00:00
Ian Lepore
3a25d855de Add required interface header.
Reported by:	andreast@
2018-03-16 02:46:08 +00:00
Andriy Voskoboinyk
5f792f7478 rtwn(4): de-hardcode ('h/w rate index' - 'corresponding MCS index') constant 2018-03-16 01:03:10 +00:00
Andriy Voskoboinyk
46e18fc6d4 urtw(4), zyd(4): reduce code verbosity.
No functional change intended.
2018-03-16 00:38:10 +00:00
Andriy Voskoboinyk
2757acf673 urtw(4): provide names for some commonly used rate indices + drop
now-unused urtw_rate2rtl()
2018-03-16 00:09:16 +00:00
Andriy Voskoboinyk
2a440d19c1 Correct comment for IFM_IEEE80211_VHT media variant. 2018-03-15 23:32:29 +00:00
Brooks Davis
064c9c3d42 Add a request structure and make the implementation use it.
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.

Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.

Reviewed by:	markj, cem, kib (assorted versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14704
2018-03-15 21:42:49 +00:00
Jeff Roberson
30fbfdda6c Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation.  Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup.  Only trigger
the pageout daemon on transitions between states.  Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14612
2018-03-15 19:23:07 +00:00
Brooks Davis
b65794ad84 Move implementation of ioctls into kern_*() functions.
Move locks from outside ioctl to the individual implementations.

This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.

Reviewed by:	cem, kib, markj (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14700
2018-03-15 18:12:55 +00:00
Edward Tomasz Napierala
6616539dcc Fix iSCSI target crash on session reinstation.
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources.  At
this point another thread receives a PDU.  Boom.

PR:		222898, 219866
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by:	mav@ (earlier version)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-03-15 17:36:13 +00:00
Brooks Davis
94598ac9f9 Restore the behavior of returning the total number of units by
unconditionally incrementing i in the loop;

Reported by:	cem
MFC with:	r330880
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-15 16:37:43 +00:00
Conrad Meyer
8475a4175f aesni(4): Stylistic/comment enhancements
Improve clarity of a comment and style(9) some areas.

No functional change.

Reported by:	markj (on review of a mostly-copied driver)
Sponsored by:	Dell EMC Isilon
2018-03-15 16:17:02 +00:00
Andriy Gapon
aca41af247 g_access: deal with races created by geoms that drop the topology lock
The problem is that g_access() must be called with the GEOM topology
lock held.  And that gives a false impression that the lock is indeed
held across the call.  But this isn't always true because many classes,
ZVOL being one of the many, need to drop the lock.  It's either to
perform an I/O on the first open or to acquire a different lock (like in
g_mirror_access).

That, of course, can break many assumptions.  For example,
g_slice_access() adds an extra exclusive count on the first open. As
described above, an underlying geom may drop the topology lock and that
would open a race with another thread that would also request another
extra exclusive count.  In general, two consumers may be granted
incompatible accesses.

To avoid this problem the code is changed to mark a geom with special
flag before calling its access method and clear the flag afterwards.  If
another thread sees that flag, then it means that the topology lock has
been dropped (either by the geom in question or downstream from it), so
it is not safe to make another access call.  So, the second thread would
use g_topology_sleep() to wait until the flag is cleared and only then
would it proceed with the access.

Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea

PR:		225960
Reported by:	asomers
Reviewed by:	markj, mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D14533
2018-03-15 09:16:10 +00:00
Andriy Gapon
289c14e811 MFV r330973: 9164 assert: newds == os->os_dsl_dataset
illumos/illumos-gate@5f5913bb83
5f5913bb83

https://www.illumos.org/issues/9164
  This issue has been reported by Alan Somers as
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877

  dmu_objset_refresh_ownership() first disowns a dataset (and releases
  it) and then owns it again. There is an assert that the new dataset
  object is the same as the old dataset object.  When running ZFS Test
  Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:

  panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
  == 0xfffff80021ab4800)

  I see that the old dataset has dsl_dataset_evict_async() pending in
  ds_dbu.dbu_tqent and its ds_dbuf is NULL.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>

PR:		225877
Reported by:	asomers
MFC after:	1 week
2018-03-15 08:49:21 +00:00
Wojciech Macek
d90930743f Reverting r330925 for now 2018-03-15 06:19:45 +00:00
Alexander Motin
db08ef4353 Increase ABOUT FIRMWARE command timeout to 5s.
It seems default timeout of 100ms is not enough for my 2694L card,
while it was perfectly fine for others, even for full-height 2694.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-03-15 01:07:21 +00:00
Ed Maste
03d2db1542 Remove KERNEL_RETPOLINE from BROKEN_OPTIONS on i386
Clang will compile both amd64 and i386 with retpoline.

Sponsored by:	The FreeBSD Foundation
2018-03-15 00:57:57 +00:00
Jung-uk Kim
8438a7a80a Merge ACPICA 20180313. 2018-03-14 23:45:48 +00:00
Jung-uk Kim
ba425ae46d Remove local definitions for _STA method in favor of ACPICA.
These macros were added in ACPICA 20051216, more than a decade ago.
2018-03-14 23:42:28 +00:00
Warner Losh
5d7fd8f726 Fix error messages in cut and pasted code.
Also, fix an unnecessary deref to get ctrlr.

Noticed by: rpokala@
Sponsored by: Netflix
2018-03-14 23:28:28 +00:00
Warner Losh
8b1e6ebe0e When tearing down a queue pair, also delete the queue entries.
The NVME standard has required in section 7.2.6, since at least 1.1,
that a clean shutdown is signalled by deleting the subission and the
completion queues before setting the shutdown bit in CC. The 1.0
standard, apparently, did not and many of the early Intel cards didn't
care. Some newer cards care, at least one whose beta firmware can
scramble the card on an unclean shutdown. Linux has done this for some
time. To make it possible to move forward with an evaluation of this
pre-release card with wonky firmware, delete the queues on the card
when we delete the qpair structures.

Sponsored by: Netflix
2018-03-14 23:01:18 +00:00
Warner Losh
d61cf64d0e Don't make the namespace devices eternal.
We'll need to delete namespaces soon, so go ahead and stop making
these devices eternal. It doesn't help much, and will be getting in
the way soon.

Sponsored by: Netflix
2018-03-14 23:01:04 +00:00
Conrad Meyer
330b675f65 vfs_bio.c: Apply cleanups motivated by Coverity analysis
It is believed that the conditions Coverity indicated were actually
impossible to hit.  So this patch just adds a cleanup to only compute
v_mount once in brelse(), and in vfs_bio_getpages() always initializes error
to zero to appease the static analyzer.

No functional change intended.

Submitted by:	Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14613
2018-03-14 22:11:45 +00:00
Steven Hartland
7d147b81ea Fix mps deadlock when handling panic
During shutdown mps waits for its SSU requests to complete however when
performing a reboot after handling a panic the scheduler is stopped so
getmicrotime which is used can be non-functional.

Switch to using the same method as shutdown_panic to ensure we actually
complete.

In addition reduce the timeout when RB_NOSYNC is set in howto as we expect
this to fail.

Reviewed by:	slm
MFC after:	1 week
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D12776
2018-03-14 21:32:23 +00:00
Steven Hartland
a59e720e65 Prevent ZFS TRIM breaking VTOC8 partitions
Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
documented by the ZFS On-Disk Specification section 1.3

Before this a zpool create on a VTOC8 partitioned device would overwrite the
partition metadata.

Reported by:	marius
Reviewed by:	marius agv
MFC after:	1 week
Sponsored by:	Multiplay
2018-03-14 21:21:03 +00:00
Brooks Davis
f287c3e4d3 Fix FSACTL_GET_NEXT_ADAPTER_FIB under 32-bit compat.
This includes FSACTL_LNX_GET_NEXT_ADAPTER_FIB.

Reviewed by:	cem
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14672
2018-03-14 21:11:41 +00:00
John Baldwin
6b02bb1aa7 Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero.  This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.

Reported by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-03-14 20:49:51 +00:00
John Baldwin
02d2bcfaba Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function.  AIO special
  handling only needs to be disabled when a TOE socket is actively doing
  TLS offload on transmit.  The TOE socket's mode (which affects receive
  operation) doesn't matter, so remove the check for the socket's mode and
  only check if a TOE socket has TLS transmit keys configured to determine
  if an AIO write request should fall back to the normal socket handling
  instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c.  It is not used in critical paths,
  so inlining isn't that important.  Change return type to bool while here.

Sponsored by:	Chelsio Communications
2018-03-14 20:46:25 +00:00
Brooks Davis
2026d70da4 Add opt_compat.h to isp(4) as required by r330876.
MFC with:	r330876
2018-03-14 20:07:52 +00:00
Hans Petter Selasky
cbfc3c73ce Fix compliancy of the kstrtoXXX() functions in the LinuxKPI, by skipping
one newline character at the end, if any.

Found by:	greg@unrelenting.technology
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-14 19:51:28 +00:00
Edward Tomasz Napierala
6960c4e135 Fix typo in a warning message.
MFC after:	2 weeks
2018-03-14 18:27:06 +00:00
Nathan Whitehorn
7c95bf1e68 Fix fat-fingering ("optional standard") and move all the OF code to
being marked "standard", which is less confusing than having it conditional
on AIM CPUs here, and then picked up through options FDT from conf/files
on Book-E.

Request by:	jhibbits
2018-03-14 18:07:40 +00:00
Warner Losh
d38677d23c Create a sysctl kern.cam.{,a,n}da.X.invalidate
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumnb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
2018-03-14 17:53:37 +00:00
Warner Losh
2a559cb8c8 This should have been += so clean builds work.
Noticed by: hps@
2018-03-14 16:45:04 +00:00
Warner Losh
157cb465c4 Fix inverted logic that counted all completions as errors, except when
they were actual errors.

Sponsored by: Netflix
2018-03-14 16:44:57 +00:00
Warner Losh
807e94b2c3 Implement trim collapsing in nda
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix
2018-03-14 16:44:50 +00:00
Warner Losh
8a3de7bc34 Allow NULL ccb to cam_iosched_bio_complete
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.

Sponsored by: Netflix
2018-03-14 16:44:16 +00:00
Nathan Whitehorn
94f513c8db The expression (aim | fdt) is always true on PowerPC. The last PowerPC
platform that can run without a device tree (PS3) still uses the OF_*()
functions to check if one exists and OF_* is used unconditionally in
core parts of the system like powerpc/machdep.c. Reflect this reality
in files.powerpc, for example by changing occurrences of aim | fdt to
standard.
2018-03-14 16:16:25 +00:00
Ed Maste
7b194b3d3b Remove stray ; at end of linux_vdso_deinstall() 2018-03-14 13:20:36 +00:00
Wojciech Macek
22eedd96c7 PowerNV: Fix I2C to compile if FDT is disabled
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-03-14 09:20:03 +00:00
Conrad Meyer
052d3c1290 Update to Zstandard 1.3.3
Includes patch to conditionalize use of __builtin_clz(ll) on __has_builtin().
The issue is tracked upstream at https://github.com/facebook/zstd/pull/884 .
Otherwise, these are vanilla Zstandard 1.3.3 files.

Note that the 1.3.4 release should be due out soon.

Sponsored by:	Dell EMC Isilon
2018-03-14 03:00:17 +00:00
Warner Losh
f8f471cf5f We need opt_compat.h after r330819 and 330820.
Add opt_compat.h to fix the stand-alone build case.

Sponsored by: Netflix.
2018-03-13 23:36:15 +00:00
John Baldwin
1e9538d253 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
John Baldwin
9689995d23 Simplify error handling in t4_tom.ko module loading.
- Change t4_ddp_mod_load() to return void instead of always returning
  success.  This avoids having to pretend to have proper support for
  unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
  The module handling code in the kernel invokes MOD_UNLOAD on a module
  whose MOD_LOAD fails with an error already.

Reviewed by:	np (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-03-13 21:42:38 +00:00
Brooks Davis
c92c85ffeb md_pad is used by MDIOCLIST and not available for future use.
MFC after:	1 week
2018-03-13 20:54:18 +00:00
Brooks Davis
8b9f77a14c Don't overflow the kernel struct mdio in the MDIOCLIST ioctl.
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.

Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system this bug is not
deemed to warrant either a security advisory or an eratta notice.

Reviewed by:	kib
Obtained from:	CheriBSD
Discussed with:	security-officer (gordon)
MFC after:	3 days
Security:	kernel heap buffer overflow
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-13 20:39:06 +00:00
Brooks Davis
8037cdcd9a Fix ISP_FC_LIP and ISP_RESCAN on big-endian 64-bit systems.
For _IO() ioctls, addr is a pointer to uap->data which is a caddr_t.
When the caddr_t stores an int, dereferencing addr as an (int *) results
in truncation on little-endian 64-bit systems and corruption (owing to
extracting top bits) on big-endian 64-bit systems. In practice the
value of chan was probably always zero on systems of the latter type as
all such FreeBSD platforms use a register-based calling convention.

Reviewed by:	mav
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14673
2018-03-13 19:56:10 +00:00
Konstantin Belousov
741e1c9196 Revert the chunk from r330410 in vm_page_reclaim_run().
There, the pages freed might be managed but the page's lock is not
owned.  For KPI correctness, the page lock is requried around the call
to vm_page_free_prep(), which is asserted.  Reclaim loop already did
the work which could be done by vm_page_free_prep(), so the lock is
not needed and the only consequence of not owning it is the assert
trigger.

Instead of adding the locking to satisfy the assert, revert to the
code that calls vm_page_free_phys() directly.

Reported by:	pho
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-13 18:27:23 +00:00
Nathan Whitehorn
9b0ec025d4 Restore missing temporary variable, deleted by accident in r330845. This
unbreaks the ppc32 AIM build.

Reported by:	jhibbits
2018-03-13 18:24:21 +00:00
Kyle Evans
63ee68c220 EFIRT: SetVirtualAddressMap with 1:1 mapping after exiting boot services
This fixes a problem encountered on the Lenovo Thinkpad X220/Yoga 11e where
runtime services would try to inexplicably jump to other parts of memory
where it shouldn't be when attempting to enumerate EFI vars, causing a
panic.

The virtual mapping is enabled by default and can be disabled by setting
efi_disable_vmap in loader.conf(5).

Reviewed by:	kib (earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14677
2018-03-13 17:10:52 +00:00
Ed Maste
a95659f75f Use C99 boolean type for translate_osrel
Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by:	cem
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14676
2018-03-13 16:40:29 +00:00
Nathan Whitehorn
8864f35942 Execute PowerPC64/AIM kernel from direct map region when possible.
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.

This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time witha new base address set up in the early parts of
powerpc_init().

Reviewed by:	jhibbits
Differential Revision:	D14647
2018-03-13 15:03:58 +00:00
Kyle Evans
92f1731bf3 Correct minor typo in comment, efi_dmcap -> efi_tmcap 2018-03-13 15:02:46 +00:00
Kyle Evans
8521b4a9df efirtc: Pass a dummy tmcap pointer to efi_get_time_locked
As noted in the comment, UEFI spec claims the capabilities pointer is
optional, but some implementations will choke and attempt to dereference it
without checking. This specific problem was found on a Lenovo Thinkpad X220
that would panic in efirtc_identify.
2018-03-13 15:01:23 +00:00
Ed Maste
b7feabf906 Use C99 designated initializers for struct execsw
It it makes use slightly more clear and facilitates grepping.
2018-03-13 13:09:10 +00:00
Roger Pau Monné
4a6d4e7b58 at_rtc: check in ACPI FADT boot flags if the RTC is present
Or else disable the device. Note that the detection can be bypassed by
setting the hw.atrtc.enable option in the loader configuration file.
More information can be found on atrtc(4).

Sponsored by:		Citrix Systems R&D
Reviewed by:		ian
Differential revision:	https://reviews.freebsd.org/D14399
2018-03-13 09:42:33 +00:00