Commit Graph

130622 Commits

Author SHA1 Message Date
imp
bf523f13ef Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.

Reported by:	avg
PR:		226510 (same panic, differnt circumstances)
Sponsored by:	Netflix
2018-03-17 16:04:06 +00:00
emaste
22a89bf952 Move assym.s to DPSRCS in vmbus module
assym.s is only to be included by other .s files, and should not
actually be assembled by itself.
2018-03-17 14:50:20 +00:00
emaste
61bad5ab72 Revert r313780 (UFS_ prefix) 2018-03-17 12:59:55 +00:00
emaste
e23f2eb452 Prefix UFS symbols with UFS_ to reduce namespace pollution
Followup to r313780.  Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.

Reported by:	kib
Reviewed by:	kib, mckusick
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9623
2018-03-17 01:48:27 +00:00
emaste
f545864525 ANSIfy sys/x86 2018-03-17 01:40:09 +00:00
brooks
108df59632 Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros.
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.

Reviewed by:	cem, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14706
2018-03-16 22:23:04 +00:00
cem
d9759abd65 random(4): Poll for signals during large reads
Occasionally poll for signals during large reads of the /dev/u?random
devices.  This allows cancellation via SIGINT of accidental invocations of
very large reads.  (A 2GB /dev/random read, which takes about 10 seconds on
my 2017 AMD Zen processor, can be aborted.)

I believe this behavior was intended since 2014 (r273997), just not fully
implemented.

This is motivated by a potential getrandom(2) interface that may not
explicitly forbid extremely large reads on 64-bit platforms -- even larger
than the 2GB limit imposed on devfs I/O by default.  Such reads, if they are
to be allowed, should be cancellable by the user or administrator.

Reviewed by:	delphij
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14684
2018-03-16 18:50:26 +00:00
ian
b654d815d1 Use EFI RTC capabilities info when registering, add bootverbose diagnostics.
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.

Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.

Differential Revision:	https://reviews.freebsd.org/D14565 (timed out)
2018-03-16 18:16:27 +00:00
ian
5cd5492226 Add the header file needed for the recently-added call to pagedaemon_wakeup(). 2018-03-16 16:06:25 +00:00
tuexen
0130aa4ca8 Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.

Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.

PR:			226421
Reviewed by:		bz, hiren, kib
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D13514
2018-03-16 15:26:07 +00:00
emaste
9fe078da1c linux_errno.c: add newer errno values
Also introduce a static assert to ensure the list is kept up to date.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-16 14:51:47 +00:00
emaste
566c3d41cc Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table.  Move the table to a common
file to be used by all.  Make it 'const int' to place it in .rodata.

(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)

This change should introduce no functional change; a followup will add
missing errno values.

MFC after:	3 weeks
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14665
2018-03-16 14:46:38 +00:00
emaste
eea3e87431 Move assym.s to DPSRC in sgx module
assym.s is only to be included by other .s files, and should not
actually be assembled by itself.
2018-03-16 13:33:42 +00:00
emaste
d90e216823 ANSIfy i386/vm86.c 2018-03-16 12:12:41 +00:00
cem
a38af1df67 Garbage collect unused chacha20 code
Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982)
and Apr 16 2017 (r317015).  Only the latter is actually used by anything, so
just go ahead and garbage collect the unused version while it's still only
in CURRENT.

I'm not making any judgement on which implementation is better.  If I pulled
the wrong one, feel free to swap the existing implementation out and replace
it with the other code (conforming to the API that actually gets used in
randomdev, of course).  We only need one generic implementation.

Sponsored by:	Dell EMC Isilon
2018-03-16 07:11:53 +00:00
cem
2344d24206 Fix GCC build: Remove redundant pagedaemon_wakeup declaration
Introduced in r331018.

Reported by:	kevans
Sponsored by:	Dell EMC Isilon
2018-03-16 07:05:09 +00:00
imp
0ac2e39d57 Try polling the qpairs on timeout.
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.

This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.

PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix
2018-03-16 05:23:48 +00:00
ian
f95d03b9ad Add required interface header.
Reported by:	andreast@
2018-03-16 02:46:08 +00:00
avos
5366d448e4 rtwn(4): de-hardcode ('h/w rate index' - 'corresponding MCS index') constant 2018-03-16 01:03:10 +00:00
avos
45880e40f4 urtw(4), zyd(4): reduce code verbosity.
No functional change intended.
2018-03-16 00:38:10 +00:00
avos
0cf1f77d48 urtw(4): provide names for some commonly used rate indices + drop
now-unused urtw_rate2rtl()
2018-03-16 00:09:16 +00:00
avos
6ddab56276 Correct comment for IFM_IEEE80211_VHT media variant. 2018-03-15 23:32:29 +00:00
brooks
3e372bcfc3 Add a request structure and make the implementation use it.
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.

Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.

Reviewed by:	markj, cem, kib (assorted versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14704
2018-03-15 21:42:49 +00:00
jeff
1dfd513751 Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation.  Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup.  Only trigger
the pageout daemon on transitions between states.  Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14612
2018-03-15 19:23:07 +00:00
brooks
5ad95dd1de Move implementation of ioctls into kern_*() functions.
Move locks from outside ioctl to the individual implementations.

This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.

Reviewed by:	cem, kib, markj (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14700
2018-03-15 18:12:55 +00:00
trasz
7d1ee7db72 Fix iSCSI target crash on session reinstation.
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources.  At
this point another thread receives a PDU.  Boom.

PR:		222898, 219866
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by:	mav@ (earlier version)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-03-15 17:36:13 +00:00
brooks
94a6309b43 Restore the behavior of returning the total number of units by
unconditionally incrementing i in the loop;

Reported by:	cem
MFC with:	r330880
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-15 16:37:43 +00:00
cem
1822819eaa aesni(4): Stylistic/comment enhancements
Improve clarity of a comment and style(9) some areas.

No functional change.

Reported by:	markj (on review of a mostly-copied driver)
Sponsored by:	Dell EMC Isilon
2018-03-15 16:17:02 +00:00
avg
ee6ae8018d g_access: deal with races created by geoms that drop the topology lock
The problem is that g_access() must be called with the GEOM topology
lock held.  And that gives a false impression that the lock is indeed
held across the call.  But this isn't always true because many classes,
ZVOL being one of the many, need to drop the lock.  It's either to
perform an I/O on the first open or to acquire a different lock (like in
g_mirror_access).

That, of course, can break many assumptions.  For example,
g_slice_access() adds an extra exclusive count on the first open. As
described above, an underlying geom may drop the topology lock and that
would open a race with another thread that would also request another
extra exclusive count.  In general, two consumers may be granted
incompatible accesses.

To avoid this problem the code is changed to mark a geom with special
flag before calling its access method and clear the flag afterwards.  If
another thread sees that flag, then it means that the topology lock has
been dropped (either by the geom in question or downstream from it), so
it is not safe to make another access call.  So, the second thread would
use g_topology_sleep() to wait until the flag is cleared and only then
would it proceed with the access.

Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea

PR:		225960
Reported by:	asomers
Reviewed by:	markj, mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D14533
2018-03-15 09:16:10 +00:00
avg
e226f0acba MFV r330973: 9164 assert: newds == os->os_dsl_dataset
illumos/illumos-gate@5f5913bb83
5f5913bb83

https://www.illumos.org/issues/9164
  This issue has been reported by Alan Somers as
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877

  dmu_objset_refresh_ownership() first disowns a dataset (and releases
  it) and then owns it again. There is an assert that the new dataset
  object is the same as the old dataset object.  When running ZFS Test
  Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:

  panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
  == 0xfffff80021ab4800)

  I see that the old dataset has dsl_dataset_evict_async() pending in
  ds_dbu.dbu_tqent and its ds_dbuf is NULL.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>

PR:		225877
Reported by:	asomers
MFC after:	1 week
2018-03-15 08:49:21 +00:00
wma
a61b260648 Reverting r330925 for now 2018-03-15 06:19:45 +00:00
mav
0ee8132a5f Increase ABOUT FIRMWARE command timeout to 5s.
It seems default timeout of 100ms is not enough for my 2694L card,
while it was perfectly fine for others, even for full-height 2694.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-03-15 01:07:21 +00:00
emaste
0b97ee736f Remove KERNEL_RETPOLINE from BROKEN_OPTIONS on i386
Clang will compile both amd64 and i386 with retpoline.

Sponsored by:	The FreeBSD Foundation
2018-03-15 00:57:57 +00:00
jkim
48286dea10 Merge ACPICA 20180313. 2018-03-14 23:45:48 +00:00
jkim
94fa9c3258 Remove local definitions for _STA method in favor of ACPICA.
These macros were added in ACPICA 20051216, more than a decade ago.
2018-03-14 23:42:28 +00:00
imp
354a82d881 Fix error messages in cut and pasted code.
Also, fix an unnecessary deref to get ctrlr.

Noticed by: rpokala@
Sponsored by: Netflix
2018-03-14 23:28:28 +00:00
imp
35688b25ae When tearing down a queue pair, also delete the queue entries.
The NVME standard has required in section 7.2.6, since at least 1.1,
that a clean shutdown is signalled by deleting the subission and the
completion queues before setting the shutdown bit in CC. The 1.0
standard, apparently, did not and many of the early Intel cards didn't
care. Some newer cards care, at least one whose beta firmware can
scramble the card on an unclean shutdown. Linux has done this for some
time. To make it possible to move forward with an evaluation of this
pre-release card with wonky firmware, delete the queues on the card
when we delete the qpair structures.

Sponsored by: Netflix
2018-03-14 23:01:18 +00:00
imp
a8dedf4fe6 Don't make the namespace devices eternal.
We'll need to delete namespaces soon, so go ahead and stop making
these devices eternal. It doesn't help much, and will be getting in
the way soon.

Sponsored by: Netflix
2018-03-14 23:01:04 +00:00
cem
a476ed15dd vfs_bio.c: Apply cleanups motivated by Coverity analysis
It is believed that the conditions Coverity indicated were actually
impossible to hit.  So this patch just adds a cleanup to only compute
v_mount once in brelse(), and in vfs_bio_getpages() always initializes error
to zero to appease the static analyzer.

No functional change intended.

Submitted by:	Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14613
2018-03-14 22:11:45 +00:00
smh
8d975f8d73 Fix mps deadlock when handling panic
During shutdown mps waits for its SSU requests to complete however when
performing a reboot after handling a panic the scheduler is stopped so
getmicrotime which is used can be non-functional.

Switch to using the same method as shutdown_panic to ensure we actually
complete.

In addition reduce the timeout when RB_NOSYNC is set in howto as we expect
this to fail.

Reviewed by:	slm
MFC after:	1 week
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D12776
2018-03-14 21:32:23 +00:00
smh
4890f838a2 Prevent ZFS TRIM breaking VTOC8 partitions
Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
documented by the ZFS On-Disk Specification section 1.3

Before this a zpool create on a VTOC8 partitioned device would overwrite the
partition metadata.

Reported by:	marius
Reviewed by:	marius agv
MFC after:	1 week
Sponsored by:	Multiplay
2018-03-14 21:21:03 +00:00
brooks
b4c951aff5 Fix FSACTL_GET_NEXT_ADAPTER_FIB under 32-bit compat.
This includes FSACTL_LNX_GET_NEXT_ADAPTER_FIB.

Reviewed by:	cem
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14672
2018-03-14 21:11:41 +00:00
jhb
54f8d69889 Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero.  This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.

Reported by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-03-14 20:49:51 +00:00
jhb
6ab76844d8 Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function.  AIO special
  handling only needs to be disabled when a TOE socket is actively doing
  TLS offload on transmit.  The TOE socket's mode (which affects receive
  operation) doesn't matter, so remove the check for the socket's mode and
  only check if a TOE socket has TLS transmit keys configured to determine
  if an AIO write request should fall back to the normal socket handling
  instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c.  It is not used in critical paths,
  so inlining isn't that important.  Change return type to bool while here.

Sponsored by:	Chelsio Communications
2018-03-14 20:46:25 +00:00
brooks
758039e578 Add opt_compat.h to isp(4) as required by r330876.
MFC with:	r330876
2018-03-14 20:07:52 +00:00
hselasky
938790ab57 Fix compliancy of the kstrtoXXX() functions in the LinuxKPI, by skipping
one newline character at the end, if any.

Found by:	greg@unrelenting.technology
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-14 19:51:28 +00:00
trasz
7b4996df9c Fix typo in a warning message.
MFC after:	2 weeks
2018-03-14 18:27:06 +00:00
nwhitehorn
faeabd77f0 Fix fat-fingering ("optional standard") and move all the OF code to
being marked "standard", which is less confusing than having it conditional
on AIM CPUs here, and then picked up through options FDT from conf/files
on Book-E.

Request by:	jhibbits
2018-03-14 18:07:40 +00:00
imp
67b935fa34 Create a sysctl kern.cam.{,a,n}da.X.invalidate
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumnb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
2018-03-14 17:53:37 +00:00
imp
92a4d586c6 This should have been += so clean builds work.
Noticed by: hps@
2018-03-14 16:45:04 +00:00
imp
6e933f49d8 Fix inverted logic that counted all completions as errors, except when
they were actual errors.

Sponsored by: Netflix
2018-03-14 16:44:57 +00:00
imp
1ddb7299f0 Implement trim collapsing in nda
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix
2018-03-14 16:44:50 +00:00
imp
bfff519b42 Allow NULL ccb to cam_iosched_bio_complete
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.

Sponsored by: Netflix
2018-03-14 16:44:16 +00:00
nwhitehorn
8d10093050 The expression (aim | fdt) is always true on PowerPC. The last PowerPC
platform that can run without a device tree (PS3) still uses the OF_*()
functions to check if one exists and OF_* is used unconditionally in
core parts of the system like powerpc/machdep.c. Reflect this reality
in files.powerpc, for example by changing occurrences of aim | fdt to
standard.
2018-03-14 16:16:25 +00:00
emaste
9975d0e7b5 Remove stray ; at end of linux_vdso_deinstall() 2018-03-14 13:20:36 +00:00
wma
9d6defb064 PowerNV: Fix I2C to compile if FDT is disabled
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-03-14 09:20:03 +00:00
cem
739ad86827 Update to Zstandard 1.3.3
Includes patch to conditionalize use of __builtin_clz(ll) on __has_builtin().
The issue is tracked upstream at https://github.com/facebook/zstd/pull/884 .
Otherwise, these are vanilla Zstandard 1.3.3 files.

Note that the 1.3.4 release should be due out soon.

Sponsored by:	Dell EMC Isilon
2018-03-14 03:00:17 +00:00
imp
1814749084 We need opt_compat.h after r330819 and 330820.
Add opt_compat.h to fix the stand-alone build case.

Sponsored by: Netflix.
2018-03-13 23:36:15 +00:00
jhb
0688d78f09 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
jhb
0a4f4f2bad Simplify error handling in t4_tom.ko module loading.
- Change t4_ddp_mod_load() to return void instead of always returning
  success.  This avoids having to pretend to have proper support for
  unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
  The module handling code in the kernel invokes MOD_UNLOAD on a module
  whose MOD_LOAD fails with an error already.

Reviewed by:	np (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-03-13 21:42:38 +00:00
brooks
f1c62a79ab md_pad is used by MDIOCLIST and not available for future use.
MFC after:	1 week
2018-03-13 20:54:18 +00:00
brooks
83fa13c3ab Don't overflow the kernel struct mdio in the MDIOCLIST ioctl.
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.

Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system this bug is not
deemed to warrant either a security advisory or an eratta notice.

Reviewed by:	kib
Obtained from:	CheriBSD
Discussed with:	security-officer (gordon)
MFC after:	3 days
Security:	kernel heap buffer overflow
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-13 20:39:06 +00:00
brooks
0254454772 Fix ISP_FC_LIP and ISP_RESCAN on big-endian 64-bit systems.
For _IO() ioctls, addr is a pointer to uap->data which is a caddr_t.
When the caddr_t stores an int, dereferencing addr as an (int *) results
in truncation on little-endian 64-bit systems and corruption (owing to
extracting top bits) on big-endian 64-bit systems. In practice the
value of chan was probably always zero on systems of the latter type as
all such FreeBSD platforms use a register-based calling convention.

Reviewed by:	mav
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14673
2018-03-13 19:56:10 +00:00
kib
0050a49011 Revert the chunk from r330410 in vm_page_reclaim_run().
There, the pages freed might be managed but the page's lock is not
owned.  For KPI correctness, the page lock is requried around the call
to vm_page_free_prep(), which is asserted.  Reclaim loop already did
the work which could be done by vm_page_free_prep(), so the lock is
not needed and the only consequence of not owning it is the assert
trigger.

Instead of adding the locking to satisfy the assert, revert to the
code that calls vm_page_free_phys() directly.

Reported by:	pho
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-13 18:27:23 +00:00
nwhitehorn
4d403428c7 Restore missing temporary variable, deleted by accident in r330845. This
unbreaks the ppc32 AIM build.

Reported by:	jhibbits
2018-03-13 18:24:21 +00:00
kevans
29870e52c0 EFIRT: SetVirtualAddressMap with 1:1 mapping after exiting boot services
This fixes a problem encountered on the Lenovo Thinkpad X220/Yoga 11e where
runtime services would try to inexplicably jump to other parts of memory
where it shouldn't be when attempting to enumerate EFI vars, causing a
panic.

The virtual mapping is enabled by default and can be disabled by setting
efi_disable_vmap in loader.conf(5).

Reviewed by:	kib (earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14677
2018-03-13 17:10:52 +00:00
emaste
6851f84d1f Use C99 boolean type for translate_osrel
Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by:	cem
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14676
2018-03-13 16:40:29 +00:00
nwhitehorn
2ecae5119d Execute PowerPC64/AIM kernel from direct map region when possible.
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.

This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time witha new base address set up in the early parts of
powerpc_init().

Reviewed by:	jhibbits
Differential Revision:	D14647
2018-03-13 15:03:58 +00:00
kevans
821af50293 Correct minor typo in comment, efi_dmcap -> efi_tmcap 2018-03-13 15:02:46 +00:00
kevans
c51cd359c8 efirtc: Pass a dummy tmcap pointer to efi_get_time_locked
As noted in the comment, UEFI spec claims the capabilities pointer is
optional, but some implementations will choke and attempt to dereference it
without checking. This specific problem was found on a Lenovo Thinkpad X220
that would panic in efirtc_identify.
2018-03-13 15:01:23 +00:00
emaste
742eaeb102 Use C99 designated initializers for struct execsw
It it makes use slightly more clear and facilitates grepping.
2018-03-13 13:09:10 +00:00
royger
edf2293a55 at_rtc: check in ACPI FADT boot flags if the RTC is present
Or else disable the device. Note that the detection can be bypassed by
setting the hw.atrtc.enable option in the loader configuration file.
More information can be found on atrtc(4).

Sponsored by:		Citrix Systems R&D
Reviewed by:		ian
Differential revision:	https://reviews.freebsd.org/D14399
2018-03-13 09:42:33 +00:00
royger
17785aa2d9 vt_vga: check if VGA is available from ACPI FADT table
On x86 the IA-PC Boot Flags in the FADT can signal whether VGA is
available or not.

Sponsored by:		Citrix systems R&D
Reviewed by:		marcel
Differential revision:	https://reviews.freebsd.org/D14397
2018-03-13 09:38:53 +00:00
emaste
bc4d21ce60 Apply some style(9) to Linuxulator linux_sysvec.c comments 2018-03-13 00:40:05 +00:00
emaste
c303c68d6a imgact_linux.c: use standard indentation
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 23:28:25 +00:00
brooks
dbd4ce23b1 Use the stack for temporary storage in OTIOCCONS.
The old code used the thread's pcb via the uap->data pointer.

Reviewed by:	ed
Approved by:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14674
2018-03-12 23:04:42 +00:00
brooks
19c462b096 Reject ioctls to SCSI enclosures from 32-bit compat processes.
The ioctl objects contain pointers and require translation and some
refactoring of the infrastructure to work. For now prevent opertion
on garbage values. This is very slightly overbroad in that ENCIOC_INIT
is safe.

Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14671
2018-03-12 23:02:01 +00:00
brooks
8e8de6204b Reject CAMIOGET and CAMIOQUEUE ioctl's on pass(4) in 32-bit compat mode.
These take a union ccb argument which is full of kernel pointers.
Substantial translation efforts would be required to make this work.
By rejecting the request we avoid processing or returning entierly
wrong data.

Reviewed by:	imp, ken, markj, cem
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14654
2018-03-12 22:58:07 +00:00
brooks
8673b89284 MIPS: Implement fue*word* and casueword* in assembly.
Remove NO_FUEWORD so the 'e' variants are wrapped by the non-'e'
variants.  This is more correct and leaves sparc64 as the outlier.

Reviewed by:	jmallett, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14603
2018-03-12 22:10:06 +00:00
tsoome
ad7bc92929 e1000g: this statement may fall through
The gcc 7 does check for switch statement fall through cases, and if legit,
such complaint can besilenced by /* FALLTHROUGH */ comment. Unfortunately
such comment is quite limited, but will still notify the reader.

This patch is backport from illumos, see
https://www.illumos.org/rb/r/941/

Reviewed by:	eadler
Differential Revision:	https://reviews.freebsd.org/D14663
2018-03-12 17:05:53 +00:00
mav
a7ab51623b Print fuses and fna fields in identify data.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-03-12 16:31:25 +00:00
emaste
1f17982a75 ANSIfy sys/kern/imgact_* 2018-03-12 15:45:50 +00:00
emaste
ac81b47ca6 Linuxulator: apply style(9) to return
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 15:35:24 +00:00
ian
49fbeddce5 Give the atrtc_time_lock a unique name.
Reported by:	hps@
2018-03-12 15:26:11 +00:00
imp
02d268dd26 Tighten up periph lock to avoid some races
Make sure the periph lock is held around rmw access to softc data,
espeically flags, including work flags in iosched.
Add asserts for the periph lock where it should be held.

PR: 226510
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D14456
2018-03-12 15:17:16 +00:00
avg
d03e4e760f fix r297857, do not modify CPU extension bits under virtual machines
r297857 was meant for real hardware only.

PR:		213155
Submitted by:	mainland@apeiron.net
MFC after:	1 week
2018-03-12 11:28:09 +00:00
ae
95b4812930 Do not try to reassemble IPv6 fragments in "reass" rule.
ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.

PR:		170604
MFC after:	1 week
2018-03-12 09:40:46 +00:00
cem
e60f2e5d99 Implement NO_WCAST_QUAL for gcc4.2 architectures 2018-03-12 05:41:27 +00:00
scottl
655aca928f Implement a sysctl to dump in-flight I/O state for debugging. The tool to
parse it will be committed in a separate action.

Sponsored by:	Netflix
2018-03-12 05:02:22 +00:00
manu
e086fcfb8c arm: Remove SoC Specific -MMCCAM kernelconfig
One should use the GENERIC-MMCCAM for this.
2018-03-11 23:14:50 +00:00
ian
de8e5f9bb1 Revert r330780, it was improperly tested and results in taking a spin
mutex before acquiring sleep mutexes.

Reported by:	kib@
2018-03-11 20:13:15 +00:00
ian
b00d088ba5 Remove MTX_NOPROFILE from atrtc_lock, it was inappropriately copy/pasted
from the i8254 driver when I created separate mutexes for each.  The i8254
driver could be the active timecounter, leading to recursion during mutex
profiling, but the atrtc driver cannot be a timecounter, so it isn't needed.
2018-03-11 19:56:07 +00:00
ian
00abf0e72d Eliminate atrtc_time_lock, and use atrtc_lock for efirtc locking. 2018-03-11 19:22:58 +00:00
ae
c27960e4e8 Rework key_sendup_mbuf() a bit:
o count in_nomem counter when we have failed to allocate mbuf for
  promisc socket;
o count in_msgtarget counter when we have secussfully sent data to socket;
o Since we are sending messages in a loop, returning error on first fail
  interrupts the loop, and all remaining sockets will not receive this
  message. So, do not return error when we have failed to send data to ALL
  or REGISTERED target. Return error only for KEY_SENDUP_ONE case. Now,
  when some socket has overfilled its receive buffer, this will not break
  other sockets.

MFC after:	2 weeks
2018-03-11 19:14:01 +00:00
ian
583404330f Everywhere that multiple registers are accessed in sequence, lock/unlock
just once around the whole group of accesses.
2018-03-11 18:54:45 +00:00
ae
b72f9e86e9 Add KASSERT to check that proper targed was used.
MFC after:	2 weeks
2018-03-11 18:46:40 +00:00
ae
c7b8ffc01f Replace panic() with KASSERTs.
MFC after:	2 weeks
2018-03-11 18:37:55 +00:00
ian
95221efb08 Use separate mutexes for atrtc and i8254 locking. Change all the strange
un-function-like RTC_LOCK/UNLOCK macro usage into normal function calls.
Since there is no longer any need to handle register access from a debugger
context, those function calls can just be regular mutex lock/unlock calls.

Requested by:  bde
2018-03-11 18:20:49 +00:00
ae
1629f3ec27 Check that we have PF_KEY sockets before iterating over all RAW sockets.
MFC after:	2 weeks
2018-03-11 18:10:59 +00:00
ae
a0388b5a24 Remove obsoleted and unused key_sendup() function.
Also remove declaration for nonexistend key_usrreq() function.

MFC after:	2 weeks
2018-03-11 18:03:55 +00:00
ian
5e5730983b Convert atrtc the new style rtc debugging output. Remove the db show
command handler which provided much the same information.  Removing the
possibility of accessing the hardware regs from the debugger context
paves the way for simplifying the locking code in the driver.
2018-03-11 16:57:14 +00:00
brooks
1db1eebcab Remove obsolete pcaudioio.h.
Nothing uses the #define's values or the types.  (Some NTP code does use
an audio_info_t, but it is in #ifdef'd support for Solaris and is not
this audio_info_t).

Sponsored by:	DARPA, AFRL
2018-03-11 16:17:53 +00:00
mav
73b7aa323b Add new opcodes and statuses from NVMe 1.3a.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-03-11 06:30:09 +00:00
mav
559bce3bae Add new identify data structures fields from NVMe 1.3a.
Some of them are already supported by existing hardware, so reporting
them `nvmecontrol identify` can be useful.
2018-03-11 05:09:02 +00:00
manu
bb028756a3 extres/regulators: Add sysctls for regulators
For each regulators create an hw.regulator.<regname>. :
uvolt: Current value
always_on: 1 If the reg is always on
boot_on: 1 If the reg is set at boot time
enable_cnt: Number of consumer(s)
enable_delay: Delay before enabling the regulator
ramp_delay: The Ramp delay
max_uamp: The maximum value of the regulator in uAmps
min_uamp: The minimal value of the regulator in uAmps
max_uvolt: The maximum value of the regulator in uVolts
min_uvolt: The minimal value of the regulator in uVolts

Reviewed by:	ian
Differential Revision:	https://reviews.freebsd.org/D14578
2018-03-11 04:37:05 +00:00
manu
16ffb66f76 allwinner: Add IR clock to sun8i
Add ir clock definition to sun8i-r-ccu.
No idea if it's working but aw_cir seems happy now and the frequency
is set to 3Mhz as it should.
2018-03-11 04:01:23 +00:00
nwhitehorn
54e47408c5 Make FDT-using parts of ofw_machdep.c condition on options FDT. This fixes
the kernel build when options FDT is absent.
2018-03-11 01:09:31 +00:00
avos
268d447d05 otus(4): check mcast / mgt / ucast rates during Tx descriptor setup
These parameters may be changed via ifconfig(8); by default,
mgt / mcast rates are lowest possible and ucast rate is not set
(matches previous configuration).

While here, store some variables locally for better readability.
2018-03-11 00:38:08 +00:00
avos
018ee508e6 rtwn(4): reset Tx power values before calling get_txpower()
for RTL8192C / RTL8188E (like it is done for other chipsets).
2018-03-10 23:47:03 +00:00
avos
bafaad47a3 usb/wlan/*: properly include "opt_wlan.h" into all drivers
Without it driver cannot be loaded when wlan(4) module is built with
'options IEEE80211_DEBUG_REFCNT'.
2018-03-10 23:16:24 +00:00
avos
3a6afc1bc6 run(4): drop few unused variables.
Found by: Clang static analyzer
2018-03-10 22:52:39 +00:00
ian
8b0ba03b3f Make root mount timeout logic work for filesystems other than ufs.
The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
file allow specifying a wait timeout for the device(s) hosting the root
filesystem to become usable.  The current mechanism for waiting for devices
and detecting their availability can't be used for zfs-hosted filesystems.
See the comment #20 in the PR for some expanded detail on these points.

This change adds retry logic to the actual root filesystem mount.  That is,
insted of relying on device availability using device name lookups, it uses
the kernel_mount() call itself to detect whether the filesystem can be
mounted, and loops until it succeeds or the configured timeout is exceeded.

These changes are based on the patch attached to the PR, but it's rewritten
enough that all mistakes belong to me.

PR:		208882
X-MFC after:	sufficient testing, and hopefully in time for 11.1
2018-03-10 22:07:57 +00:00
trasz
063049cf5d Check for duplicates when modifying an iSCSI session. Previously we did
this check on open, but "iscsictl -M", or an iSCSI redirect received by
iscsid(8) could end up with two sessions with the same target name and
portal.

MFC after:	2 weeks
2018-03-10 14:21:37 +00:00
gonzo
12c635bc5e [rpi] remove IRQ support for BCM233x RNG
Upstream DTBs don't provide IRQ lines for the RNG. Moreover, harvesting
bytes as often as the RNG interrupt is triggered (87 times per sec) is an
overkill.

For these reasons, get rid of the interrupt mode and make callout mode the
default, with random bits harvested every 4 seconds.

Submitted by:	Sylvain Garrigues <sylgar@gmail.com>
Reviewed by:	ian, imp, manu, mmel
Approved by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14541
2018-03-10 02:49:58 +00:00
bdrewery
38895a107b Fix rebase mismerge in r330724.
X-MFC-With:	r330724
MFC after:	2 weeks
Sponsored by:	Dell EMC
2018-03-10 02:13:48 +00:00
bdrewery
d53bde59c3 Don't skip reading depend for 'make obj' unless it is alone.
This was effectively done in bsd.dep.mk quite some time ago.

MFC after:	2 weeks
Sponsored by:	Dell EMC
2018-03-10 02:10:26 +00:00
bdrewery
cb3c3fefe1 Skip reading depend files with -V unless looking up a depend variable.
This speeds up some simple -V lookups significantly.

Reported by:	bde
MFC after:	2 weeks
Sponsored by:	Dell EMC
2018-03-10 02:10:19 +00:00
bdrewery
1cfde7f353 Reduce overhead for simple 'make -V' lookups by avoiding 'find sys/'.
Setting -DNO_SKIP_MPATH can be used for debugging.

Reported by:	bde
MFC after:	2 weeks
Sponsored by:	Dell EMC
2018-03-10 02:09:36 +00:00
cem
707a958fe9 subr_gtaskqueue: Fix braino from r330715
Submitted by:	markj
Sponsored by:	Dell EMC Isilon
2018-03-10 01:53:42 +00:00
cem
9ab6695f02 nvme_da: Fix minor memory leak in error case
Reported by:	cppcheck
Sponsored by:	Dell EMC Isilon
2018-03-10 01:28:55 +00:00
brooks
9817f60530 Remove obsolete dataacq.h.
Nothing includes this file, lists it in a Makefile, or uses any of the
ioctl definitions.
2018-03-10 01:07:30 +00:00
cem
bde6462cfb subr_gtaskqueue: Fix minor leak of tq_name in error case
Reported by:	cppcheck
Sponsored by:	Dell EMC Isilon
2018-03-10 01:01:01 +00:00
cem
c4ca702797 mlx5(4): Remove redundant declaration of mlx5_enter_error_state
Broken in r330644.

Sponsored by:	Dell EMC Isilon
2018-03-10 00:59:48 +00:00
hselasky
58bad4a91d Implement proper support for complete_all() in the LinuxKPI.
When complete_all() is called there might be multiple waiters. The
current implementation could only handle one waiter. Make sure the
completion is sticky when complete_all() is called to be compatible
with Linux.

Found by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-09 12:16:55 +00:00
avos
b9207b4013 net80211: wrap protection frame allocation into ieee80211_alloc_prot()
Move copy-pasted code for RTS/CTS frame allocation into net80211.
While here, add stat / debug message for allocation failures
(copied from run(4)) + return error here in bwn(4).

Reviewed by:	adrian
Differential Revision:	https://reviews.freebsd.org/D14628
2018-03-09 11:33:56 +00:00
andrew
39dd4716fd Use the correct address to write back to memory in the GICv3 ITS driver.
This seems to no be needed on supported hardware as they are cache-coherent,
however this may not be the case on all platforms.

Sponsored by:	DARPA, AFRL
2018-03-09 10:34:44 +00:00
ume
e35a88be6c Fix Bad file descriptor error.
MFC after:	1 week
2018-03-09 04:45:24 +00:00
brooks
efdbf71b92 Copyout a whole int to cpuset_domain's policy pointer.
The previous code only copied 16-bits and corrupted the target int.

Reviewed by:	kib, markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14611
2018-03-09 00:50:40 +00:00
sbruno
fcab00fadd Update tcp_lro with tested bugfixes from Netflix and LLNW:
rrs - Lets make the LRO code look for true dup-acks and window update acks
          fly on through and combine.
    rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not
          have it incorrectly wipe out newer acks for older acks when we have
          out-of-order acks (common in wifi environments).
    jeggleston - LRO eating window updates

Based on all of the above I think we are RFC compliant doing it this way:

https://tools.ietf.org/html/rfc1122

section 4.2.2.16

"Note that TCP has a heuristic to select the latest window update despite
possible datagram reordering; as a result, it may ignore a window update with
a smaller window than previously offered if neither the sequence number nor the
acknowledgment number is increased."

Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	rstone gallatin
Sponsored by:	NetFlix and Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14540
2018-03-09 00:08:43 +00:00
manu
887aed6d15 arm: Add GENERIC-MMCCAM kernel config
MMCCAM is the new mmc stack currently developped by kibab@, add a kernel
configuration file that include GENERIC so it's easier to test for people.
2018-03-08 22:54:50 +00:00
manu
fb8d1c20e3 Fix build when option MMCCAM is defined. 2018-03-08 22:49:36 +00:00
kib
0ec7603cd4 Remove unused variable.
Sponsored by:	The FreeBSD Foundation
2018-03-08 22:04:54 +00:00
kib
8fb6f9328a Make mlx5 compilable on ILP32 arches.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-03-08 22:03:43 +00:00
emaste
47aed18b8a bktr: correct Japan IF frequency
PR:		36451
Submitted by:	Hijiri Umemoto <hijiri at umemoto.org>
MFC after:	2 weeks
2018-03-08 19:24:10 +00:00
emaste
7b5146ffd1 asmc: update temperature sensor name/description
PR:		225911
Submitted by:	Trev <fbsdbugs4 at sentry.org>
MFC after:	1 week
2018-03-08 18:52:47 +00:00
avos
6413474a76 iwi(4): factor out rateset setup into iwi_set_rateset().
No functional change intended.
2018-03-08 18:42:23 +00:00
markj
41071ca69a Return E2BIG if we run out of space writing a compressed kernel dump.
ENOSPC causes the MD kernel dump code to retry the dump, but this is
undesirable in the case where we legitimately ran out of space.
2018-03-08 17:04:36 +00:00
hselasky
e51031ae13 Set correct SL in completion for RoCE in mlx5ib(4).
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.

While that, this patch also fills the vlan_id field in the completion if
present.

linux commit 12f8fedef2ec94c783f929126b20440a01512c14

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 16:27:31 +00:00
hselasky
e298ebb5d9 Add call to setup firmware data dump structure during device load in
mlx5core.

Do not consider the inability to create a firmware dump fatal, but
inform about the situation and allow the driver to attach. The device
might not implement the needed VSC, or we might not know the layout of
the registers map. In either case, only firmware dump functionality is
limited, the network operations should be fine.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 16:19:01 +00:00
hselasky
d052f35859 Avoid more LFENCE/SFENCe on x86 in mlx5en(4),
by using the FreeBSD native fences.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:58:30 +00:00
hselasky
8354e9c7e8 Fix mlx5en(4) driver to properly call m_defrag().
When the mlx5en(4) driver was converted to using BUSDMA(9) the call to
m_defrag() was moved after the part of the TX routine that strips the
header from the mbuf chain. Before it called m_defrag it first trimmed
off the now-empty mbufs from the start of the chain. This has the side
effect of also removing the head of the chain that has M_PKTHDR set.
m_defrag() will not defrag a chain that does not have M_PKTHDR set,
thus it was effectively never defragging the mbuf chains.

As it turns out, trimming the mbufs in this fashion is unnecessary since
the call to bus_dmamap_load_mbuf_sg doesn't map empty mbufs anyway, so
remove it.

Differential Revision:	https://reviews.freebsd.org/D12050
Submitted by:	mjoras@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:53:04 +00:00
hselasky
de9c6c6b70 Use vport rather than physical-port MTU in mlx5en(4).
Set and report vport MTU rather than physical MTU,
The driver will set both vport and physical port mtu
and will rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Based on Linux upstream commit:
cd255efff9baadd654d6160e52d17ae7c568c9d3

Submitted by:	Meny Yossefi <menyy@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:47:17 +00:00
hselasky
df57808ec1 Use the device unit number for naming the ifnet interface in mlx5en(4).
Currently the ifnet interface is named mceX, where X is a monotonically
incremented value. If the device is reset due to a fatal error, then the
interface name will change.  Using the device unit number will keep the
naming consistent across the reset logic.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:43:41 +00:00
hselasky
eb028439e6 Remove duplicate prototypes.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:37:09 +00:00
hselasky
d1efa6e93f Add kernel and userspace code to dump the firmware state of supported
ConnectX-4/5 devices in mlx5core.

The dump is obtained by reading a predefined register map from the
non-destructive crspace, accessible by the vendor-specific PCIe
capability (VSC). The dump is stored in preallocated kernel memory and
managed by the mlx5tool(8), which communicates with the driver using a
character device node.

The utility allows to store the dump in format
    <address> <value>
into a file, to reset the dump content, and to manually initiate the
dump.

A call to mlx5_fwdump() should be added at the places where a dump
must be fetched automatically. The most likely place is right before a
firmware reset request.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:21:56 +00:00
hselasky
2514deebb7 Add vendor specific capability interface support in mlx5core.
Add the ability to access the vendor specific space gateway in order
to support reading and writing data into the different configuration
domains.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:59:47 +00:00
hselasky
b0b6754a3a Use device_printf() instead of printf() when printing warnings and errors
to dmesg(8) in mlx5core.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:58:27 +00:00
hselasky
330bf9c54e Add support for per priority flow control, PFC, to mlx5en(4).
Add support for PFC and implement reading the per priority statistics
using the sysctl(8) interface. PFC is used together with VLAN priority
and can be enabled and disabled on a per priority basis.

Global pause frames and PFC are incompatible features and surrounding
logic has been added to warn the user about misconfiguration.

Update relevant mlx5core APIs for PFC configuration.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:40:39 +00:00
hselasky
5686316ed2 Add support for explicit congestion notification, ECN, to mlx5ib(4).
ECN configuration and statistics is available through a set of sysctl(8)
nodes under sys.class.infiniband.mlx5_X.cong . The ECN configuration
nodes can also be used as loader tunables.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:23:14 +00:00
hselasky
b363020feb Use the autogenerated interface file for all commands in mlx5core.
This patch accumulates the following Linux commits:
- 90b3e38d048f09b22fb50bcd460cea65fd00b2d7
  mlx5_core: Modify CQ moderation parameters
- 09a7d9eca1a6cf5eb4f9abfdf8914db9dbd96f08
  mlx5_core: QP/XRCD commands via mlx5 ifc
- 1a412fb1caa2c1b77719ccb5ed8b0c3c2bc65da7
  mlx5_core: Modify QP commands via mlx5 ifc
- ec22eb53106be1472ba6573dc900943f52f8fd1e
  mlx5_core: MKey/PSV commands via mlx5 ifc
- 73b626c182dff06867ceba996a819e8372c9b2ce
  mlx5_core: EQ commands via mlx5 ifc
- 20ed51c643b6296789a48adc3bc2cc875a1612cf
  mlx5_core: Access register and MAD IFC commands via mlx5 ifc
- a533ed5e179cd15512d40282617909d3482a771c
  mlx5_core: Pages management commands via mlx5 ifc
- b8a4ddb2e8f44f872fb93bbda2d541b27079fd2b
  mlx5_core: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
- af1ba291c5e498973cc325c501dd8da80b234571
  mlx5_core: Refactor internal SRQ API
- b06e7de8a9d8d1d540ec122bbdf2face2a211634
  mlx5_core: Refactor device capability function
- c4f287c4a6ac489c18afc4acc4353141a8c53070
  mlx5_core: Unify and improve command interface

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 10:43:42 +00:00
hselasky
3b71a31f1b Fix race between PCI error handlers and health work in mlx5core.
linux commit 05ac2c0b7438ea08c5d54b48797acf9b22cb2f6f

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:58:41 +00:00
hselasky
11da02d327 Avoid calling sleeping function from the health poll thread in mlx5core.
linux commit c1d4d2e92ad670168a17a57dfa182a5a5baa72d4

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:51:33 +00:00
hselasky
f5d1640091 Updates for PCI and health monitor recovery in mlx5core.
This patch accumulates the following Linux commits:

mlx5_health.c
- 78ccb25861d76a8fc5c678d762180e6918834200
  mlx5_core: Fix wrong name in struct
- 171bb2c560f45c0427ca3776a4c8f4e26e559400
  mlx5_core: Update health syndromes
- 0144a95e2ad53a40c62148f44fb0c1f9d2a0d1e9
  mlx5_core: Use accessor functions to read from device memory
- ac6ea6e81a80172612e0c9ef93720f371b198918
  mlx5_core: Use private health thread for each device
- fd76ee4da55abb21babfc69310d321b9cb9a32e0
  mlx5_core: Fix internal error detection conditions
- 2241007b3d783cbdbaa78c30bdb1994278b6f9b9
  mlx5: Clear health sick bit when starting health poll
- 712bfef60912d91033cb25739f7444d5b8d8c59f
  mlx5: Fix version printout in case of health issue
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
  mlx5_core: Add pci error handlers to mlx5_core driver

mlx5_cmd.c
- be87544de8df2b1eb34bcb5e32691287d96f9ec4
  mlx5_core: Fix async commands return code
- a31208b1e11df334d443ec8cace7636150bb8ce2
  mlx5_core: New init and exit flow for mlx5_core
- 020446e01eebc9dbe7eda038e570ab9c7ab13586
  mlx5_core: Prepare cmd interface to system errors handling
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
  mlx5_core: Add pci error handlers to mlx5_core driver
- 0d834442cc247c7b3f3bd6019512ae03e96dd99a
  mlx5: Fix teardown errors that happen in pci error handler

mlx5_main.c
- 5fc7197d3a256d9c5de3134870304b24892a4908
  mlx5: Add pci shutdown callback

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:47:09 +00:00
jeff
ff01cfd694 Don't assert that the domain free lock is held until we're certain that
there is a valid reservation.  This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.

Reported by:	pho, mjg
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-07 22:04:27 +00:00
tychon
6cbf64d8bf Fix a lock recursion introduced in r327065.
Reported by:	kmacy
Reviewed by:	grehan, jhb
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14548
2018-03-07 18:03:22 +00:00
nwhitehorn
33f088cfc9 Move the powerpc64 direct map base address from zero to high memory. This
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
  bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
  direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.

The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.

Reviewed by:	jhibbits, Breno Leitao
Differential Revision:	D14499
2018-03-07 17:08:07 +00:00
hselasky
fee156166f Implement priority to traffic class mapping in mlx5core.
Add support for mapping priority to traffic class via sysctl

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:23:07 +00:00
hselasky
45fe161a52 Implement rate limit per traffic class in mlx5core.
Add support for rate limiting traffic class via sysctl.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:17:36 +00:00
hselasky
bfb0b7487a Implement missing query for current port rate in mlx5ib(4).
- Factor out port speed definitions into new port.h header file,
  similarly as done in Linux upstream.
- Correct two existing port speed definitions in mlx5en according to
  Linux upstream.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:03:11 +00:00
hselasky
94029962b9 Add log message for unsupported QSFPs in mlx5core.
Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:51:50 +00:00
hselasky
ab9224328e Make sure default VNET is set when adding a new interface in mlx5core.
Adding an interface might be done outside the device_attach() routine
and will then cause a panic, due to the VNET not being set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:49:27 +00:00
eadler
7927756d5b sys/cloudabi: Avoid relying on GNU specific extensions
An empty initializer list is not technically valid C grammar.

MFC After:	1 week
2018-03-07 14:47:43 +00:00
eadler
9b1ea58f58 sys: Fix a few potential infoleaks in cloudabi
While there is no immediate leak, if the structure changes underneath
us, there might be in the future.

Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC After:	1 month
Sponsored by:	DARPA/AFRL
2018-03-07 14:44:32 +00:00
hselasky
bcaf4a6169 Add timeout handle to commands with callback in mlx5core.
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get firmware response. Add delayed callback timeout work
before posting the command to firmware. In case of real firmware
command completion we will cancel the delayed work. In case of
firmware command timeout the callback timeout handler will be called
and it will simulate firmware completion with timeout error.

linux commit 65ee67084589c1783a74b4a4a5db38d7264ec8b5

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:41:29 +00:00
hselasky
deeafae0ca Fix potential deadlock in command mode change in mlx5core.
Call command completion handler in case of timeout when working in
interrupts mode. Avoid flushing the commands workqueue after acquiring
the semaphores to prevent a potential deadlock.

linux commit commit 9cba4ebcf374c3772f6eb61f2d065294b2451b49

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:35:28 +00:00
hselasky
7525588e0c Use a macro in mlx5_command_str() instead of copying OP name.
linux commit 42ca502e179d0654ef441333a9d0f35c948734f3

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:29:30 +00:00
hselasky
13f5c3c4e7 Disable unsupported disassociate ucontext functionality in mlx5ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:03:31 +00:00
hselasky
847a90f040 Bump version information in mlx4ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:59:46 +00:00
hselasky
af0869686f The mlx4ib(4) should not be loaded before the ibcore is initialized.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:58:58 +00:00
hselasky
970559599e Disable unsupported disassociate ucontext functionality in mlx4ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:57:32 +00:00
andrew
d682824945 Bump MAXCPUS on arm64. We are starting to see hardware with more than 96
cores so increase it to the same as amd64.

Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
2018-03-07 13:54:44 +00:00
avg
b31a639914 MFV r330591: 8984 fix for 6764 breaks ACL inheritance
illumos/illumos-gate@e9bacc6d1a
e9bacc6d1a

https://www.illumos.org/issues/8984
  Consider a directory configured as:
  drwx-ws---+ 2 henson cpp 3 Jan 23 12:35 dropbox/
  user:henson:rwxpdDaARWcC--:f-i----:allow
  owner@:--------------:f-i----:allow
  group@:--------------:f-i----:allow
  everyone@:--------------:f-i----:allow
  owner@:rwxpdDaARWcC--:-di----:allow
  group:cpp:-wx-----------:-------:allow
  owner@:rwxpdDaARWcC--:-------:allow
  A new file created in this directory ends up looking like:
  rw-r--r-+ 1 astudent cpp 0 Jan 23 12:39 testfile
  user:henson:rw-pdDaARWcC--:------I:allow
  owner@:--------------:------I:allow
  group@:--------------:------I:allow
  everyone@:--------------:------I:allow
  owner@:rw-p--aARWcCos:-------:allow
  group@:r-----a-R-c--s:-------:allow
  everyone@:r-----a-R-c--s:-------:allow
  with extraneous group@ and everyone@ entries allowing read access that
  shouldn't exist.
  Per Albert Lee on the zfs mailing list:
  "aclinherit=passthrough/passthrough-x should still
  ignore the requested mode when an inheritable ACE for owner@ group@,
  or everyone@ is present in the parent directory.
  It appears there was an oversight in my fix for
  https://www.illumos.org/issues/6764 which made calling zfs_acl_chmod
  from zfs_acl_inherit unconditional. I think the parent ACL check for
  aclinherit=passthrough needs to be reintroduced in zfs_acl_inherit."
  We have a large number of faculty who use dropbox directories like the example
  to have students submit projects. All of these directories are now allowing

Reviewed by: Sam Zaydel <szaydel@racktopsystems.com>
Reviewed by: Paul B. Henson <henson@acm.org>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Dominik Hassler <hadfl@omniosce.org>

PR:		216886
MFC after:	2 weeks
2018-03-07 13:49:26 +00:00
hselasky
57ecb463ee Make sure VNET is set when calling sa6_recoverscope() in ibcore.
Else panic will occur when VIMAGE is enabled.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:32:52 +00:00
hselasky
8ed779d4df Define values instead of using hardcoding.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:30:38 +00:00
hselasky
638909cdbe Recover IPv6 scope ID for multicast link-local addresses as well as
unicast link-local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:28:12 +00:00
hselasky
3445322d74 Embed the IPv6 scope ID before calling rtalloc1() in ibcore.
Else rtalloc1() will resolve to the loopback interface.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:25:40 +00:00
andrew
ed08bbe6a2 Create macros for the ACPI interrupt cross references. This is considered a
band aid until a better solution to find the correct interrupt controller
can be found.

While here fix one place in the GICv3 ITS driver where the offset wasn't
correctly applied.

Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
2018-03-07 13:16:03 +00:00
hselasky
d239624a48 Add IB_SPEED_HDR definition in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:01:00 +00:00
hselasky
1f96ecc8c3 Make sure the IPv6 scope ID gets properly masked in ibcore.
When exchanging CM messages the IPv6 scope ID should be ignored
for link local addresses when doing comparisons. Make sure the
scope ID is always set to zero for link local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:58:51 +00:00
hselasky
fddeee5573 Fix for use-after-free when using delayed work structures in ibcore.
It is not enough to cancel delayed work structures before freeing.
Always cancel delayed work synchronously before freeing!

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:56:04 +00:00
andrew
e71e95a425 Add an acpi attachment to the pci_host_generic driver and have the ACPI
bus provide it with its needed memory resources.

This allows us to use PCIe on the ThunderX2 and, with a previous version
of the patch, on the SoftIron 3000 with ACPI.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
Differential Revision:	https://reviews.freebsd.org/D8767
2018-03-07 10:47:27 +00:00
andrew
32b7f64a11 Restrict the arm64 DMAP region to the 1G blocks where we have at least
one physical page. This is in preparation for limiting it further as this
is needed on some hardware, however testing has shown issues with further
restricting the DMAP and ACPI.

Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
2018-03-07 09:58:36 +00:00
cem
b3df8f9162 g_part_gpt: Fix memory leak in error path
If g_part_gpt_read() encountered a disk with bad primary and secondary
tables, it could leak memory.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-07 01:55:50 +00:00
gonzo
141e81ad58 [ig4] Add support for i2c controllers on Skylake and Kaby Lake
This was tested by Ben on  HP Chromebook 13 G1 with a
Skylake CPU and Sunrise Point-LP I2C controller and by me on
Minnowboard Turbot with Atom E3826 (formerly Bay Trail)

Submitted by:	Ben Pye <ben@curlybracket.co.uk>
Reviewed by:	gonzo
Obtained from:	DragonflyBSD (a4549657 by Imre Vadász)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13654
2018-03-06 23:39:43 +00:00
kevans
f7b8124e8e aw_usbphy: Move later to SUPPORTDEV pass
vbus-supply properties may be specified for each PHY. These properties
reference a regulator that we must turn on/off as we turn the PHY on/off.
However, if the usbphy comes up before the regulator in question (as is the
case with GPIO-controlled regulators), then we will fail to grab a handle to
the regulator and control it as the PHY power state changes.

Fix it by just attaching the usbphy driver later. We don't really need it at
RESOURCE, we just need it to be before DEFAULT when ehci/ohci attach. In
particular, this fixes the USB NIC on a board that we don't yet supported-
without this, it will not power on and if_ure cannot attach.

Tested on:	various boards [manu]
Tested on:	OrangePi R1 [Rap2 (irc)]
Reported by:	Rap2 (irc, "Cannot find USB NIC")
2018-03-06 22:45:45 +00:00
cem
9f71c5847f psm(4): Initialize variables before use
dxp/dyp could have been used uninitialized in the subsequent debugging log
invocation.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-06 20:31:14 +00:00
brooks
8b84863f4a Remove reference to unimplemented fuiword, etc.
We don't support Harvard architectures.
2018-03-06 18:28:55 +00:00
nwhitehorn
af59ead0bb Fix use of unitialized variables. 2018-03-06 15:52:43 +00:00
markj
9d45d000b2 Unbreak amd64 FBT after r330539.
X-MFC with:	r330539
2018-03-06 15:51:59 +00:00
jtl
8e9b6569cb amd64: Protect the kernel text, data, and BSS by setting the RW/NX bits
correctly for the data contained on each memory page.

There are several components to this change:
 * Add a variable to indicate the start of the R/W portion of the
   initial memory.
 * Stop detecting NX bit support for each AP.  Instead, use the value
   from the BSP and, if supported, activate the feature on the other
   APs just before loading the correct page table.  (Functionally, we
   already assume that the BSP and all APs had the same support or
   lack of support for the NX bit.)
 * Set the RW and NX bits correctly for the kernel text, data, and
   BSS (subject to some caveats below).
 * Ensure DDB can write to memory when necessary (such as to set a
   breakpoint).
 * Ensure GDB can write to memory when necessary (such as to set a
   breakpoint).  For this purpose, add new MD functions gdb_begin_write()
   and gdb_end_write() which the GDB support code can call before and
   after writing to memory.

This change is not comprehensive:
 * It doesn't do anything to protect modules.
 * It doesn't do anything for kernel memory allocated after the kernel
   starts running.
 * In order to avoid excessive memory inefficiency, it may let multiple
   types of data share a 2M page, and assigns the most permissions
   needed for data on that page.

Reviewed by:	jhb, kib
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14282
2018-03-06 14:28:37 +00:00
jtl
403e82edfe Nudge lld to break the kernel read-only and read-write sections into
separate 2M pages.  The binutils default for max-page-size and
common-page-size used to produce this result.  By setting these
values, we can nudge lld to also separate these sections into separate
2M pages.

Reviewed by:	jhb, kib
Discussed with:	emaste
Sponsored by:	Netflix
Differential Revision:	D14282
2018-03-06 14:18:45 +00:00
ae
3da6faa563 Add mapping for several ethernet types used by Linux to FreeBSD
ethernet types.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14594
2018-03-06 12:58:00 +00:00
ae
d3176e34c7 Define ethernet type 0x88A8 as ETHERTYPE_QINQ.
Reviewed by:	kp
Obtained from:	OpenBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14593
2018-03-06 12:01:31 +00:00
ian
bdd407c104 Build the ds1672 driver as a module. Add a detach() to unregister the rtc. 2018-03-06 02:30:34 +00:00
ian
0a56e231f1 Fix a paste-o that broke the build. There is no softc pointer here, just
use the dev arg.

Reported by:	Jonathan Looney <jonlooney@gmail.com>
Pointy hat:	ian@
2018-03-06 02:21:41 +00:00
brooks
301d9bace0 Use umtx_copyin_umtx_time32() in __umtx_op_lock_umutex_compat32().
Non-NULL timeouts where copied in improperly and could produce failures
due to incompatible data structures.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14587
2018-03-06 01:52:04 +00:00
cem
04657e23fa MFV: zstd: FIO_addFInfo: Fully initialize output 'total' struct
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value.  Still, appeasing Coverity is pretty harmless in this case.)

Reported by:	Coverity
Reviewed by:	Yann Collet
Obtained from:	zstd 606374269cf3485972c90b993fbb84dc20da032f
Sponsored by:	Dell EMC Isilon
2018-03-05 20:03:45 +00:00
brooks
16ef654221 Regen after r330517. 2018-03-05 17:02:50 +00:00
brooks
5d5c624759 Remove remenants of 1990s efforts to let us run Net/OpenBSD binaries.
No functional change (comments change in some generated files.)

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14571
2018-03-05 17:02:16 +00:00
jtl
5a8fb2b9d6 We shouldn't need to execute code in the recursive page table mappings;
therefore, it should be safe to set the NX bit on the PML4E for the
recursive page table mappings.  According to the Intel docs, the effect
of the NX bit should propogate to any page reached through a PML4E which
has the NX bit set.

Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14333
2018-03-05 15:12:35 +00:00
jtl
886958fbdb Prior to r329071, pmap_bootstrap() used pmap_kmem_choose() to round the
first available virtual address to a 2MB boundary. After r329071,
create_pagetables() rounds firstaddr up to a 2MB boundary. This ensures
the kernel is mapped in super-pages, which is the point of the logic
in pmap_kmem_choose(). Therefore, it is no longer necessary for
pmap_bootstrap() to round up to the 2MB boundary again.

As pmap_bootstrap() was the only user of pmap_kmem_choose(), we can
delete pmap_kmem_choose().

Reviewed by:	kib
MFC after:	2 weeks
X-MFC-with:	r329071
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14355
2018-03-05 15:10:17 +00:00
hselasky
5f8973a373 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
hselasky
1a0dfe6582 Get correct network device when accepting incoming RDMA connections in ibcore.
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.

Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.

Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.

The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.

Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:24:30 +00:00
hselasky
22d19d1493 Pass valid if_index to rdma_addr_find_l2_eth_by_grh() in ibcore when possible.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:22:36 +00:00
hselasky
9d3ebabaf8 Add support for loopback in ibcore.
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 13:57:37 +00:00
hselasky
fe1e77a2db Make sure to register the VLAN GIDs using the VLAN network interface
and not the parent one in ibcore. Else looking up the VLAN GIDs will
fail for VLAN IPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:39:34 +00:00
hselasky
5d1e7279d7 Need to check for IPv6 linklocal address inside rdma_resolve_addr() in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:04:34 +00:00
hselasky
790d7753c2 Map type of service, TOS, to IB or VLAN service level 1:1 in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:59:54 +00:00
hselasky
1f4402b837 Select RoCEv2 by default in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:58:37 +00:00
hselasky
357f044f72 Make deletion of RoCE GID entries synchronous in ibcore.
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:57:26 +00:00
hselasky
5bd9bfb44a Add support for IPv6 link local GIDs equal to the default GID for
VLANs in ibcore.

IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.

Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:55:29 +00:00
hselasky
3c9b062cb5 Do not add RoCEv2 default GID in ibcore when IPv6 is disabled to honor the
networking stack's IPv6 disabled setting. Else the offload HCA can start using
IPv6 packets for QPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:52:39 +00:00
hselasky
a0fcf1645d Add missing FreeBSD tags and SVN properties to ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:49:45 +00:00
andrew
8ea72b3894 Register each GICv3 ITS driver with a useful cross reference. We currently
only use the first driver, however this may change in the future and
hardware exists with multiple ITS devices.

Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
2018-03-05 10:11:30 +00:00
andrew
82fd7d44b1 In the ACPI GICv3 attach function call device_get_children to get the list
of children. We expect this to be populated when configuring the secondary
cores.

Sponsored by:	DARPA, AFRL
Sponsored by:	Cavium (Hardware)
2018-03-05 10:09:18 +00:00
ian
02bd6c4548 Switch imx_gpio to attach at BUS_PASS_INTERRUPT + BUS_PASS_ORDER_LATE.
Pretty much any other device might need to manipulate a gpio pin during its
probe or attach routines, so these devices must be available as early as
possible.

The gpio device is an interrupt controller, but I didn't choose the
INTERRUPT pass for that reason (it works fine as an interrupt controller as
long as it attaches any time before interrupts are enabled).  That just
looked like the right place in the passes to ensure that it attaches before
any type of device that might need gpio pin manipulations.
2018-03-05 02:32:23 +00:00
anish
919edd6e24 Move the new AMD-Vi IVHD [ACPI_IVRS_HARDWARE_NEW]definitions added in r329360 in contrib ACPI to local files till ACPI code adds new definitions reported by jkim.
Rename ACPI_IVRS_HARDWARE_NEW to ACPI_IVRS_HARDWARE_EFRSUP, since new definitions add Extended Feature Register support.  Use IvrsType to distinguish three types of IVHD - 0x10(legacy), 0x11 and 0x40(with EFR). IVHD 0x40 is also called mixed type since it supports HID device entries.
Fix 2 coverity bugs reported by cem.

Reported by:jkim, cem
Approved by:grehan
Differential Revision://reviews.freebsd.org/D14501
2018-03-05 02:28:25 +00:00
ian
ad847e650c Defer attaching the spibus until timers and interrupts are working. The
driver requires interrupts to do transfers, and the drivers for the SPI
devices on the bus quite reasonably expect to be able to do IO while probing
and attaching.
2018-03-05 02:13:28 +00:00
ian
12e327558c Do not stop the loop that configures gpio chipselect pins on the first
error, just ignore pins that don't configure and keep setting up the ones
that do.  (But when bootverbose is on, whine about the errors.)
2018-03-05 02:08:33 +00:00
ian
9d589144b0 Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
2018-03-05 00:43:53 +00:00
mjg
fb232db34a lockmgr: save on sleepq when cmpset fails 2018-03-05 00:30:07 +00:00
ian
b2d136be8c Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
2018-03-05 00:15:56 +00:00
ian
5bfe5ec015 Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
2018-03-04 23:39:40 +00:00
mjg
b77b8ba290 lockmgr: whack unused lockmgr_note_exclusive_upgrade 2018-03-04 22:14:20 +00:00
mjg
11dd702fc8 mtx: tidy up recursion handling in thread lock
Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not change thus the
check can be avoided. In particular this avoids a lock read for non-recursing
case which found out the lock was changed.

While here avoid an irq trip of this happens.

Tested by:	pho (previous version)
2018-03-04 22:01:23 +00:00
ian
86158ea35f The year is stored in a single byte in sram, in binary format, as a count
of years since the century, so strip the century out when converting to or
from bcd_clocktime format (the conversion routines will infer century by
pivoting on 70).
2018-03-04 21:58:32 +00:00
mjg
8531f3bfc6 sx: don't do an atomic op in upgrade if it cananot succeed
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.

This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
2018-03-04 21:41:05 +00:00
mjg
de7546e12b locks: fix a corner case in r327399
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.

The bug sometimes manifested itself in stalls during -j 128 package builds.

Refactor the code to fix the bug, while here remove some of the gratituous
differences between rw and sx locks.
2018-03-04 21:38:30 +00:00
kib
3b2ac3ad1a Remove redundant test from r330410.
If the input slist is non-empty, counter cannot be zero after freeing.

Noted by:	mjg
MFC after:	2 weeks
2018-03-04 21:15:31 +00:00
ian
0f20c4831e Build iicbus/rtc8583 as a module. 2018-03-04 21:06:21 +00:00
ian
67c7105205 Convert to the new(ish) bcd_clocktime conversion functions, add calls to
the new debug output functions, and when setting the clock, propagate the
timespec nsecs to the 1/100ths register.
2018-03-04 21:04:30 +00:00
kib
d27cb27779 Unify bulk free operations in several pmaps.
Submitted by:	Yoshihiro Ota
Reviewed by:	markj
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13485
2018-03-04 20:53:20 +00:00
hselasky
f54d4ecc08 Properly wrap the BUILD_BUG() function macro in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-04 19:42:50 +00:00
ian
f0e8fddc3d Add calls to the new clock_dbgprint_xxx() functions. Also, stop applying
a local .5 second adjustment to the time, since that is now done by the
code in subr_rtc.c.
2018-03-04 19:32:52 +00:00
ian
49f5d22b5d Add calls to the new clock_dbgprint_xxx() functions. 2018-03-04 19:26:47 +00:00
ian
048c206fca Oops, fix a paste-o. 2018-03-04 19:25:54 +00:00
ian
777f75bb2b Add calls to the new clock_dbgprint_xxx() functions. 2018-03-04 19:23:48 +00:00
ian
50cedeec4f Add calls to the new clock_dbgprint_xxx() functions. 2018-03-04 19:20:11 +00:00
mjg
97c5138f1a lockmgr: start decomposing the main routine
The main routine takes 8 args, 3 of which are almost the same for most uses.
This in particular pushes it above the limit of 6 arguments passable through
registers on amd64 making it impossible to tail call.

This is a prerequisite for further cleanups.

Tested by:	pho
2018-03-04 19:12:54 +00:00
hselasky
151894a156 Stub kernel_param_lock() and kernel_param_unlock() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 19:10:30 +00:00
hselasky
2dab843a2e Implement wait_event_lock_irq() macro function in the LinuxKPI.
MFC after:	1 week
Requested by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-03-04 19:07:10 +00:00
ian
dd2c024b5a Fix a paste-o: use the IICBUS version constants, not IICBB bitbang driver's. 2018-03-04 18:58:24 +00:00
hselasky
7e77289d46 Keep the old SLAB_DESTROY_BY_RCU macro definition around in the LinuxKPI
to avoid compilation breakage in external kernel modules.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-04 18:53:41 +00:00
hselasky
8e70a8ef19 Implement DEFINE_WAIT_FUNC() function macro and default_wake_function()
in the LinuxKPI.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:51:43 +00:00
hselasky
9baae572dc Implement pr_err_ratelimited() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:27:50 +00:00
hselasky
c22224101f Implement __MODULE_STRING() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:21:21 +00:00
hselasky
4e79e1a83b Implement BUILD_BUG() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:19:44 +00:00
hselasky
00a71092fb Implement writel_relaxed() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:17:54 +00:00
hselasky
0945850a32 Define noinline and __maybe_unused macros in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:13:31 +00:00
hselasky
3c7c22a9aa Implement for_each_clear_bit() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-03-04 18:10:18 +00:00