Commit Graph

118666 Commits

Author SHA1 Message Date
Hans Petter Selasky
d384b1673f Extend sysctl description for hw.usb.disable_enumeration .
PR:		222505
Submitted by:	Julian H. Stacey <jhs@berklix.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-09-22 08:21:35 +00:00
Andriy Gapon
1c6ea90df5 MFV r323914: 8661 remove "zil-cw2" dtrace probe
illumos/illumos-gate@bd9d3f9046
bd9d3f9046

https://www.illumos.org/issues/8661
  The "zil-cw1" dtrace probe was previously removed in 8558, and the "zil-cw2"
  probe should have been removed in that patch as well. Unfortunately, the "zil-
  cw2" was not removed in 8558, so this bug is to track it's removal.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 week
2017-09-22 08:21:14 +00:00
Hans Petter Selasky
40f53a7cdc Add support for 32-bit compatibility IOCTLs in the LinuxKPI.
Bump the FreeBSD version to force recompilation of external
kernel modules due to structure change.

PR:		222504
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-09-22 08:12:08 +00:00
Toomas Soome
c57460d994 libefi: define EISA PNP constants
Define EISA PNP constants and use them, also fix ID for 0x701
2017-09-22 07:44:36 +00:00
Toomas Soome
f43a98e6c2 libefi: efipart_hdinfo_add_filepath should check strtol result
Use errno for error checking.
2017-09-22 07:40:05 +00:00
Toomas Soome
abf054b4c6 libefi: efipart.c cstyle fix for efipart_print_common()
The else statement should have { }
2017-09-22 07:37:42 +00:00
Toomas Soome
5af584ffc1 libefi: efipart_strategy() should return ENXIO when there is no media
We should return ENXIO to indicate the situation with device present,
but no media.
2017-09-22 07:34:08 +00:00
Toomas Soome
b07d6aa2ae libefi: pdinfo_t pd_unit and pd_open should be unsigned
The device index, partition index and reference counter are all positive
numbers. However, since our internal partition number may be negative
to indicate GPT table, the compare expression need to take care when comparing
pdinfo_t and partition data.
2017-09-22 07:29:26 +00:00
Michael Tuexen
d28a3a393b Add missing locking. Found by Coverity while scanning the usrsctp
library.

MFC after:	1 week
2017-09-22 06:33:01 +00:00
Michael Tuexen
afb908dada Add missing socket lock.
MFC after:	1 week
2017-09-22 06:07:47 +00:00
Toomas Soome
832d45d219 efilib.h: typo in structure member description
The link should be replaced by list.
2017-09-22 02:58:47 +00:00
Toomas Soome
024cc69661 r323885 did miss efilib.h update
The efilib.h update was left out from r323885 by mistake.
2017-09-22 02:56:26 +00:00
Toomas Soome
cbc1b3de8f libefi: efi_devpath_match local len should be unsigned
DevicePathNodeLength() will always return unsigned value.
2017-09-22 02:53:01 +00:00
Warner Losh
78ed811e6c cam iosched: Bettar account IOPS for smoother performance
Prevent cam_iosched_iops_tick() from discarding 'unspent' ios unless
it's a new accounting interval.

Previously ios that weren't used between ticks were lost, as a result
the iops limiter could enforce a limit below the configured maximum.

Obtained from: ElectroBSD
Submitted by: Fabian Keil
PR: 221974
2017-09-22 02:36:36 +00:00
Warner Losh
f777123b83 cam iosched: Enforce iop limits below the quanta value
Previously the iops limiter would always allow at least
quanta ios per second as cam_iosched_iops_tick() never set
ios->l_value1 below 1.

Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD
PR: 221974
2017-09-22 02:36:32 +00:00
John Baldwin
cc05c7d256 Support AEAD requests with non-GCM algorithms.
In particular, support chaining an AES cipher with an HMAC for a request
including AAD.  This permits submitting requests from userland to encrypt
objects like IPSec packets using these algorithms.

In the non-GCM case, the authentication crypto descriptor covers both the
AAD and the ciphertext.  The GCM case remains unchanged.  This matches
the requests created internally in IPSec.  For the non-GCM case, the
COP_F_CIPHER_FIRST is also supported since the ordering matters.

Note that while this can be used to simulate IPSec requests from userland,
this ioctl cannot currently be used to perform TLS requests using AES-CBC
and MAC-before-encrypt.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D11759
2017-09-22 00:34:46 +00:00
John Baldwin
2c907637bc Add a new COP_F_CIPHER_FIRST flag for struct crypt_op.
This requests that the cipher be performed before rather than after
the HMAC when both are specified for a single operation.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D11757
2017-09-22 00:21:58 +00:00
John Baldwin
95f076384f Place the AAD before the plaintext/ciphertext for CIOCRYPTAEAD.
Software crypto implementations don't care how the buffer is laid out,
but hardware implementations may assume that the AAD is always before
the plain/cipher text and that the hash/tag is immediately after the end
of the plain/cipher text.

In particular, this arrangement matches the layout of both IPSec packets
and TLS frames.  Linux's crypto framework also assumes this layout for
AEAD requests.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D11758
2017-09-22 00:15:54 +00:00
Stephen Hurd
bf227542f3 Fix undeclared identifier error introduced in r323879
It doesn't appear to be safe to use gtask->gt_name.

Reported by:	Mark Johnston, Jenkins
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12448
2017-09-21 23:27:35 +00:00
Toomas Soome
cfe103a2ac libefi: efipart.c should use calloc()
The device specific *_add functions are using malloc() + memset,
should use calloc instead.
2017-09-21 23:22:18 +00:00
Toomas Soome
59fcc285f4 libefi: efi_devpath_match() should return bool
The current implementation of efi_devpath_match() is returning values 0 or 1,
so it should be updated to return bool.
2017-09-21 23:14:07 +00:00
John Baldwin
e1d15b892a Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files.  In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.

Discussed with:	bde, kib
Sponsored by:	Chelsio Communications
2017-09-21 23:05:32 +00:00
Mark Johnston
568aef2f6a Simplify i915_gem_wire_page() and avoid unneeded page-busying.
Reviewed by:	alc, kib
MFC after:	1 week
2017-09-21 22:15:45 +00:00
Stephen Hurd
326aacb0e3 Improved logging of gtaskqueue failues
Check the return code of intr_setaffinity() and log any errors
it returns. When a qid is not located, log an error before returning
failure.  Also, use __func__ rather than hardcoding the function name

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12436
2017-09-21 21:14:48 +00:00
Stephen Hurd
a0fcc37122 Fix M_GTASKQUEUE definition
Previously had the same short and long description as taskqueues.
This could cause problems with memguard(9) and vmstat -m which use
the short description as a unique identifier.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12438
2017-09-21 20:34:33 +00:00
Stephen Hurd
23e90483ec bnxt: Fix driver when attached to a VF
- Use HWRM_FUNC_VF_CFG instead of HWRM_FUNC_CFG on VFs
- Fix NPAR/VF detection
- Clean up flag definitions
- Don't allow WoL on VFs

Although the bnxt driver doesn't support SR-IOV so can create VFs yet,
the PF could be running Linux or ESCi with a VF passed through to a
FreeBSD guest.  This fixes the driver for that use case.

Submitted by:	Siva Kallam <siva.kallam@@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12410
2017-09-21 20:27:43 +00:00
Eugene Grosbein
10633c7e5a Unprotected modification of ng_iface(4) private data leads to kernel panic.
Fix a race with per-node read-mostly lock and refcounting for a hook.

PR:			220076
Tested by:		peixoto.cassiano
Approved by:		avg (mentor), mav (mentor)
MFC after:		1 week
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12435
2017-09-21 20:16:10 +00:00
Alan Cox
4aef95b3f0 Modernize calls to vm_page_unwire(). As of r288122, vm_page_unwire()
accepts PQ_NONE as the specified queue and returns a Boolean indicating
whether the page's wire count transitioned to zero.  Use these features
in dev/drm2.

Reviewed by:	kib, markj
MFC after:	1 week
2017-09-21 15:32:41 +00:00
Toomas Soome
5370de88cc libefi: devicename.c cleanups
Remove duplicated free()+return statements, default unit to 0
and improve strtol error processing.
2017-09-21 15:30:20 +00:00
Kristof Provost
ed9de14d2f bridge: Set module version
This ensures that the loader will not load the module if it's also built in to
the kernel.

PR:		220860
Submitted by:	Eugene Grosbein <eugen@freebsd.org>
Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
2017-09-21 14:14:01 +00:00
Michael Tuexen
cdd2d7d4a5 Code cleanup, no functional change.
MFC after:	1 week
2017-09-21 11:56:31 +00:00
Mariusz Zaborski
ef0c8428f9 Plug memory leak in case when nvlist allocation succeeds, but nvpair
allocation fails.

Submitted by:	pjd@
MFC after:	1 month
Sponsored by:	Wheel Systems
2017-09-21 10:28:22 +00:00
Mariusz Zaborski
b6960f00fa Simplify the code by _not_ expecting success under 'fail'.
Submitted by:	pjd@ and oshogbo@
MFC after:	1 month
Sponsored by:	Wheel Systems
2017-09-21 10:18:02 +00:00
Mariusz Zaborski
56117a342f IMHO it is possible that failure will be treated as success because we don't
initialize nvp on every loop iteration and the code under 'fail'(!) label
detects success by checking of nvp != NULL.

Submitted by:	pjd@
MFC after:	1 month
Sponsored by:   Wheel Systems
2017-09-21 10:16:44 +00:00
Mariusz Zaborski
0a5f83e3fa Free 'value' only once we are done freeing all individual
Submitted by:   pjd@
MFC after:	1 month
Found by:       scan-build
Sponsored by:   Wheel Systems
2017-09-21 10:14:43 +00:00
Mariusz Zaborski
c696dd0687 Because nvp wasn't initialized on every loop iteration once we jumped
to 'fail' on error it was treated as success, because nvp!=NULL. Fix this
by not handling success under 'fail' label and by using separate variable
for parent nvpair.

If we succeeded to allocate nvlist, but failed to allocated nvpair we
would leak nvls[ii] on return. Destroy it when we cannot allocate nvpair,
before we goto fail.

Submitted by:	pjd@ and oshogbo@ (minor changes)
Found by:       scan-build
MFC after:	1 month
Sponsored by:	Wheel Systems
2017-09-21 10:10:42 +00:00
Mariusz Zaborski
a3c485d38d Make the code consistent by always using 'fail' label.
Submitted by:	pjd@ and oshogbo@
MFC after:	1 month
Sponsored by:	Wheel Systems
2017-09-21 10:06:00 +00:00
Mariusz Zaborski
1dacabe1ab The 'while (array != NULL) { }' suggests scan-build that array may be
initially NULL, which is not possible. Change the loop to
'do {} while (array != NULL)' to satisfy scan-build and assert that
array really cannot be NULL just in case.

Submitted by:	pjd@
Found by:	scan-build
MFC after:	1 month
Sponsored by:	Wheel Systems
2017-09-21 10:03:14 +00:00
Mariusz Zaborski
08016b3185 Remove redundant initialization. Don't use variable - just return the value.
Make scan-build happy by casting to 'void *' instead of 'void **'.

Submitted by:	pjd@
MFC after:	1 month
Found by:	scan-build and cppcheck
Sponsored by:	Wheel Systems
2017-09-21 10:00:16 +00:00
Michael Tuexen
53999485e0 Free the control structure after using is, not before.
Found by Coverity while scanning the usrsctp library.
MFC after:	1 week
2017-09-21 09:47:56 +00:00
Michael Tuexen
d0d8c7de19 No need to wakeup, since sctp_add_to_readq() does it.
MFC after:	1 week
2017-09-21 09:18:05 +00:00
Rick Macklem
6b43e06029 Add a few definitions for Flex File Layout for pNFS.
These definitions will be used by a future commit.
2017-09-21 00:41:12 +00:00
Jung-uk Kim
8c294161aa Remove an ancient comment about the existence of READ(16) and WRITE(16).
MFC after:	3 days
2017-09-21 00:03:59 +00:00
Andrey V. Elsukov
5df8171da3 Use in_localip() function instead of unlocked access to addresses hash
to determine that an address is our local.

PR:		220078
MFC after:	1 week
2017-09-20 22:35:28 +00:00
Andrey V. Elsukov
369bc48dc5 Do not acquire IPFW_WLOCK when a named object is created and destroyed.
Acquiring of IPFW_WLOCK is requried for cases when we are going to
change some data that can be accessed during processing of packets flow.
When we create new named object, there are not yet any rules, that
references it, thus holding IPFW_UH_WLOCK is enough to safely update
needed structures. When we destroy an object, we do this only when its
reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK,
because noone references it. The another case is when we failed to finish
some action and thus we are doing rollback and destroying an object, in
this case it is still not referenced by rules and no need to acquire
IPFW_WLOCK.

This also fixes panic with INVARIANTS due to recursive IPFW_WLOCK acquiring.

MFC after:	1 week
Sponsored by:	Yandex LLC
2017-09-20 22:00:06 +00:00
Warner Losh
5fff95cc1d Fix queue depth for nda.
1/4 of the number of queues times queue entries is too limiting. It
works up to about 4k IOPS / 3.0GB/s for hardware that can do
4.4k/3.2GB/s with nvd. 3/4 works better, though it highlights issues
in the fairness of nda's choice of TRIM vs READ. That will be fixed
separately.
2017-09-20 21:42:25 +00:00
Michael Tuexen
2c62ba7377 Protect the address workqueue timer by a mutex.
MFC after:	1 week
2017-09-20 21:29:54 +00:00
Warner Losh
89d26636f3 cam iosched: Call cam_iosched_limiter_init() after ios->current is set to the default
Previously ios->current was set to 0 until the first
cam_iosched_cl_maybe_steer() call.

PR: 221954
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12349
2017-09-20 21:26:01 +00:00
Warner Losh
3028dd8dd5 cam iosched: Schedule cam_iosched_ticker() quanta times per second
Previously callout_reset() was called with a "ticks" value that was
off by one.  As a result cam_iosched_ticker() was called a bit too
frequently: On systems with hz=1000 a quanta value of 200 resulted in
~250 calls and a value of 100 in ~111 calls.

For the "queue_depth" and "bandwidth" limiters the difference doesn't
matter but the "iops" limiter depends on the scheduling to enforce the
correct maximum.

PR: 221956
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12350
2017-09-20 21:25:56 +00:00
Warner Losh
2d22619adc cam iosched: Add a handler for the quanta sysctl to enforce valid values
Invalid values can result in devision-by-zero panics or other
undefined behaviour so lets not allow them.

PR: 221957
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12351
2017-09-20 21:19:53 +00:00
Warner Losh
84c12dcdd0 cam iosched: Use the write queue for BIO_ZONE commands
Use the write queue for BIO_ZONE commands so they can't get executed
ahead of writes that were sent after them. More generally, since they
introduce strong ordering into the list, they need to go to the write
queue (which is the only queue that BIO_ORDERED is honored for at the
moment). In fact, fix mismatch between queueing and dequeueing code by
changing this to queue all non-reads (and non-trims) to the write
queue.

As a side effect this prevents the kernel message:
kernel: Found bio_cmd = 0x9
which cam_iosched_next_bio() emits when finding commands
other than BIO_READ in the read queue.

PR: 221973
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12353
2017-09-20 21:13:20 +00:00
Stephen Hurd
d0d0ad0ae2 Fix iflib netmap RX
RXQ setup for netmap was broken because netmap_rxq_init was getting called
before IFDI_INIT - thus we ended up with ring tail pointer being reset to zero.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12140
2017-09-20 20:40:49 +00:00
David C Somayajulu
203f9d1828 1. ql_hw.c:
In ql_hw_send() return EINVAL when TSO framelength exceeds max
	supported length by HW.(davidcs)
2. ql_os.c:
	In qla_send() call bus_dmamap_unload before freeing mbuf or
	recreating dmmamap.(davidcs)
	In qla_fp_taskqueue() Add additional checks for IFF_DRV_RUNNING
	Fix qla_clear_tx_buf() call bus_dmamap_sync() before freeing
	mbuf.

Submitted by:David.Bachu@netapp.com
MFC after:5 days
2017-09-20 20:07:45 +00:00
Conrad Meyer
d616681cec aesni(4): Fix another trivial typo (aensi -> aesni)
Sponsored by:	Dell EMC Isilon
2017-09-20 18:31:36 +00:00
Conrad Meyer
194446f9b7 x86: Decode AMD "Extended Feature Extensions ID EBX" bits
In particular, this determines CPU support for the CLZERO instruction.

(No, I am not making this name up.)

Sponsored by:	Dell EMC Isilon
2017-09-20 18:30:37 +00:00
Conrad Meyer
81326306dd aesni(4): Fix trivial typo (AQUIRE -> ACQUIRE)
Sponsored by:	Dell EMC Isilon
2017-09-20 17:53:25 +00:00
Alan Somers
cd037f075c MFV r323789: 8473 scrub does not detect errors on active spares
illumos/illumos-gate@554675eee7
554675eee7

https://www.illumos.org/issues/8473
Scrubbing is supposed to detect and repair all errors in the pool. However,
it wrongly ignores active spare devices. The problem can easily be
reproduced in OpenZFS at git rev 0ef125d with these commands:

truncate -s 64m /tmp/a /tmp/b /tmp/c
sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
sudo zpool replace testpool /tmp/a /tmp/c
/bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
sync
sudo zpool scrub testpool
zpool status testpool # Will show 0 errors, which is wrong
sudo zpool offline testpool /tmp/a
sudo zpool scrub testpool
zpool status testpool # Will show errors on /tmp/c,
		      # which should've already been fixed

FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more.

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

MFC after:	1 week
Sponsored by:	Spectra Logic Corp
2017-09-20 16:31:00 +00:00
Andriy Gapon
aacd0b4bb2 add vfs_zfs.abd_chunk_size tunable
It is reported that the default value of 4KB results in a substantial
memory use overhead (at least, on some configurations).  Using 1KB seems
to reduce the overhead significantly.

PR:		222377
Reported by:	Sean Chittenden <sean@chittenden.org>
MFC after:	1 week
2017-09-20 08:36:31 +00:00
Andriy Gapon
3d5487d981 fix memory leak in g_bio zone introduced in r320452, another ABD fallout
I overlooked the fact that that ZIO_IOCTL_PIPELINE does not include
ZIO_STAGE_VDEV_IO_DONE stage.  We do allocate a struct bio for an ioctl
zio (a disk cache flush), but we never freed it.

This change splits bio handling into two groups, one for normal
read/write i/o that passes data around and, thus, needs the abd data
tranform; the other group is for "data-less" i/o such as trim and cache
flush.

PR:		222288
Reported by:	Dan Nelson <dnelson@allantgroup.com>
Tested by:	Borja Marcos <borjam@sarenet.es>
MFC after:	10 days
2017-09-20 08:27:21 +00:00
Andriy Gapon
8c9377cde7 MFV r323792: 8602 remove unused "dp_early_sync_tasks" field from "dsl_pool" structure
illumos/illumos-gate@2bcb545854
2bcb545854

https://www.illumos.org/issues/8602
  When I landed the fix for 8558, I incorrectly added the "dp_early_sync_tasks"
  field to the "dsl_pool" structure. This field is used in DelphixOS, but not in
  illumos. It was incorrectly pulled into illumos, so this bug is to remove it
  from the structure.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 week
2017-09-20 07:26:52 +00:00
Alan Cox
e9bfbb02c5 In r288122, we changed vm_page_unwire() so that it returns a Boolean
indicating whether the page's wire count transitioned to zero.  Use that
return value in zbuf_page_free() rather than checking the wire count.

MFC after:	1 week
2017-09-20 04:59:52 +00:00
Alan Cox
2582d7a969 Sync with amd64/arm/arm64/i386/mips pmap change r288256:
Exploit r288122 to address a cosmetic issue.  Since PV chunk pages don't
belong to a vm object, they can't be paged out.  Since they can't be paged
out, they are never enqueued in a paging queue.  Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue.  As of r288122, we can avoid
this false impression by passing PQ_NONE.

MFC after:	1 week
2017-09-20 04:19:49 +00:00
Olivier Houchard
4583315a06 Define CPU_XSCALE_CORE3 when relevant.
It was lost when cpuconf.h was deobirted.
2017-09-19 23:41:55 +00:00
Rick Macklem
0f29b8292d Make the nfsrpc_layoutget() function a static.
Make the NFSv4 pNFS client function nfsrpc_layoutget() a static, since it
is only used in sys/fs/nfsclient/nfs_clrpcops.c.
This prepares the code for future patches that add Flex File layout
support.
2017-09-19 23:28:22 +00:00
David C Somayajulu
1faeac0fd5 Add sysctl "enable_minidump" to turn on/off automatic minidump retrieval
MFC after:5 days
2017-09-19 23:26:27 +00:00
David C Somayajulu
9bc6b5ac37 Update minidump template for version 5.4.66
MFC after:5 days
2017-09-19 22:17:30 +00:00
Rick Macklem
2742a21091 Add a new function called nfsm_uiombuflist(), similar to nfsm_uiombuf().
This patch adds a new function called nfsm_uiombuflist(), which is
similar to nfsm_uiombuf(), but doesn't not use the fields in
struct nfsrv_descript. This new function will be used by the pNFS client
for writing to mirrors using Flex Files layout.
The function is not yet called anywhere.
Also, get rid of #ifndef APPLE, which is ancient cruft left over from
the Mac OSX port of the NFSv4 client.
2017-09-19 21:31:36 +00:00
Rick Macklem
b0932afacc Simplify nfsrpc_layoutreturn() args.
Simplify nfsrpc_layoutreturn() args. in preparation for the addition
of Flex File layout support, since File layout uses a 0 length field.
Flex Files does use a longer field, but that will be added in a
subsequent commit.
2017-09-19 20:45:25 +00:00
Josh Paetzel
c77037f16f Fix indentation for r323068
PR:	220170
Reported by:	lidl
MFC after:	3 days
Pointyhat to:	jpaetzel
2017-09-19 20:40:05 +00:00
Olivier Houchard
7ce16cd956 i81342 is little endian, not big endian. 2017-09-19 20:33:22 +00:00
Michael Tuexen
3ec509bcd3 Fix a warning.
MFC after:	1 week
2017-09-19 20:24:13 +00:00
Rick Macklem
ab118d04be Simplify nfsrpc_layoutcommit() args.
Simplify nfsrpc_layoutcommit() args. in preparation for the addition
of Flex File layout support, since it also uses a 0 length field.
2017-09-19 20:18:41 +00:00
Michael Tuexen
564a95f485 Avoid an overflow when computing the staleness.
This issue was found by running libfuzz on the userland stack.

MFC after:	1 week
2017-09-19 20:09:58 +00:00
Konstantin Belousov
3cabd93e26 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
Konstantin Belousov
9770475ce7 Do not vrele() covered vnode under the mp mutex.
If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.

Sponsored by:	The FreeBSD Foundation
Discussed with:	avg
X-MFC with:	r323578
2017-09-19 16:49:45 +00:00
Konstantin Belousov
5bf949377e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
Michael Tuexen
ad608f06ed Remove a no longer used variable.
Reported by:	Felix Weinrank
MFC after:	1 week
2017-09-19 15:00:19 +00:00
Sepherosa Ziehau
d74831eea2 hyperv/hn: Incease max supported MTU
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12365
2017-09-19 06:46:00 +00:00
Sepherosa Ziehau
eb2fe04416 hyperv/hn: Fix MTU setting
- Add size of an ethernet header to the value configured to NVS.  This
  does not seem to have any effects if MTU is 1500, but fix hypervisor
  side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
  side.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12352
2017-09-19 06:38:57 +00:00
Sepherosa Ziehau
642ec226bb hyperv/hn: Apply VF's RSS setting
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12333
2017-09-19 06:29:38 +00:00
John Baldwin
4fedff3ca1 Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration
file.

Discussed with:	np
Sponsored by:	Chelsio Communications
2017-09-18 23:50:34 +00:00
John Baldwin
4f45713ae2 Add UFS_LINK_MAX for the UFS-specific limit on link counts.
ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still
limited to 16 bits.  This is a nop currently but will matter if LINK_MAX is
increased in the future.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2017-09-18 23:30:39 +00:00
Konstantin Belousov
5efe338f3d Fix handling of the segment registers on i386.
Suppose that userspace is executing with the non-standard segment
descriptors.  Then, until exception or interrupt handler executed
SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs.
If an interrupt occurs in this window, the interrupt handler is
executed unsafely, relying on usability of the usermode registers.  If
the interrupt results in the context switch on return, the
contamination of the kernel state spreads to the thread we switched
to.  As result, kernel data accesses might fault or, if only the base
is changed, completely messed up.

More, if the user segment was allocated in LDT, another thread might
mark the descriptor as invalid before doreti code tried to reload
them.  In this case kernel panics.

The issue exists for all exception entry points which use trap gate,
and thus do not automatically disable interrupts on entry, and for
lcall_handler.

Fix is two-fold: first, we need to disable interrupts for all kernel
entries, changing the IDT descriptor types from trap gate to interrupt
gate.  Interrupts are re-enabled not earlier than the kernel segments
are loaded into the segment registers.  Second, we only load the
segment registers from the trap frame when returning to usermode.  For
the later, all interrupt return paths must happen through the doreti
common code.

There is no way to disable interrupts on call gate, which is the
supposed mode of servicing for lcall $7,$0 syscalls.  Change the LDT
descriptor 0 into a code segment type and point it to the userspace
trampoline which redirects the syscall to int $0x80.

All the measures make the segment register handling similar to that of
amd64.  We do not apply amd64 optimizations of not reloading segment
registers on return from the syscall.

Reported by:	Maxime Villard <max@m00nbsd.net>
Tested by:	pho (the non-lcall part)
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12402
2017-09-18 20:22:42 +00:00
Ilya Bakulin
ffa317051b Add kern.features flag for MMCCAM
kern.features.mmcam will be present and equal to 1 if the kernel has been
compiled with option MMCCAM.
This will help sdio-related userland tools to fail-fast if running on the kernel
without MMCCAM enabled.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12386
2017-09-18 20:17:08 +00:00
Cy Schubert
70394ab378 Don't use an apostrophe in a possesive pronoun.
MFC after:	3 days
2017-09-18 19:16:41 +00:00
Ryan Libby
2e6418c0c5 linsysfs: quiet gcc -Wformat after r323692
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
2017-09-18 19:09:40 +00:00
Scott Long
1ab9094c90 Hide a normal probe warning message under bootverbose, similar to atkbdc
Sponsored by:	Netflix
2017-09-18 18:42:28 +00:00
Conrad Meyer
4d367f2501 linsysfs(5): Fix two unrelated issues
1. Swap the order of device_get_ivars with device_get_devclass and devclass
   name validation.  This bug was introduced in r323692.

2. Error check device_get_children and free the returned list.  This bug was
   introduced in the original linsysfs commit.

Reported by:	Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12407
2017-09-18 17:14:13 +00:00
Toomas Soome
3f247eab7a loader: biosmem allocate heap just below 4GB
The current biosmem code is walking bios smap entries and looking for smap
entry just below 4GB line, if there is such entry, its base and size is set
for heap base and size. Instead of entry base, we should use last HEAP_MIN
(currently 64MB) bytes just below 4GB, to make maximum space for kernel and
modules.

The problem was revealed on ASUS B350M-A system board, an AMD Ryzen 3 1200 CPU

memory map:

SMAP type=01 base=0000000000000000 len=000000000009d400 attr=01
SMAP type=02 base=000000000009d400 len=0000000000002c00 attr=01
SMAP type=02 base=00000000000e0000 len=0000000000020000 attr=01
SMAP type=01 base=0000000000100000 len=0000000009c00000 attr=01
SMAP type=02 base=0000000009d00000 len=0000000000300000 attr=01
SMAP type=01 base=000000000a000000 len=00000000be69b000 attr=01
SMAP type=03 base=00000000c869b000 len=0000000000016000 attr=01
SMAP type=01 base=00000000c86b1000 len=00000000124e7000 attr=01
SMAP type=02 base=00000000dab98000 len=0000000000138000 attr=01
SMAP type=03 base=00000000dacd0000 len=0000000000008000 attr=01
SMAP type=01 base=00000000dacd8000 len=0000000000100000 attr=01
SMAP type=04 base=00000000dadd8000 len=00000000003b3000 attr=01
SMAP type=02 base=00000000db18b000 len=0000000000d42000 attr=01
SMAP type=01 base=00000000dbecd000 len=0000000002133000 attr=01
SMAP type=01 base=0000000100000000 len=000000011f380000 attr=01
SMAP type=02 base=00000000de000000 len=0000000002000000 attr=01
SMAP type=02 base=00000000f8000000 len=0000000004000000 attr=01
SMAP type=02 base=00000000fdf00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000fea00000 len=0000000000010000 attr=01
SMAP type=02 base=00000000feb80000 len=0000000000082000 attr=01
SMAP type=02 base=00000000fec10000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fec30000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed00000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed40000 len=0000000000005000 attr=01
SMAP type=02 base=00000000fed80000 len=0000000000010000 attr=01
SMAP type=02 base=00000000fedc2000 len=000000000000e000 attr=01
SMAP type=02 base=00000000fedd4000 len=0000000000002000 attr=01
SMAP type=02 base=00000000fee00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000ff000000 len=0000000001000000 attr=01

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D12368
2017-09-18 15:17:01 +00:00
Hans Petter Selasky
e35e2f3ebe Bump the __FreeBSD_version after recent LinuxKPI changes.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:39:51 +00:00
Hans Petter Selasky
62bae5d421 The LinuxKPI atomics do not have acquire nor release semantics unless
specified. Fix code to use READ_ONCE() and WRITE_ONCE() where appropriate.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:37:14 +00:00
Hans Petter Selasky
1f7c7e1bec Only wire pages in the LinuxKPI instead of holding and wiring them.
This prevents the page daemon from regularly scanning the held pages.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:23:59 +00:00
Hans Petter Selasky
c05238a681 Add support for shared memory functions to the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:17:23 +00:00
Conrad Meyer
2d347b2ef8 linsysfs(5): Add support for recent libdrm
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.

This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work.  The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.

PR:		222375
Submitted by:	Greg V <greg AT unrelenting.technology>
2017-09-17 23:40:16 +00:00
Ian Lepore
d18915fadb Give icee(4) a detach() method so it can be used as a module. Add a
module makefile for it.
2017-09-17 22:58:13 +00:00
Conrad Meyer
c50df68a08 MCA: Expand AMD Thresholding support to cover all banks
When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason.  However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks.  This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Reviewed by:	markj (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12321
2017-09-17 22:58:13 +00:00
Rick Macklem
ccf038250a Fix bogus FREAD with NFSV4OPEN_ACCESSREAD. No functional change.
The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.

MFC after:	2 weeks
2017-09-17 22:18:01 +00:00
Justin Hibbits
d11e86549e Don't use a non-zero argument for __builtin_frame_address
__builtin_frame_address with a non-zero argument is unsafe and rejected by
newer gcc.  Since it doesn't seem to impact the stacktrace, don't bother
with gymnastics to unwind to a different frame for starting.

PR:		kern/220118
MFC after:	2 weeks
2017-09-17 20:07:20 +00:00
Justin Hibbits
6b7530563b Print the correct bitmask for the running Book-E CPU
All the Book-E world is no longer e500v{1,2}.  e500mc the 64-bit derivatives do
not use the DOZE/NAP bits with MSR[WE], instead using the `wait' instruction to
wait for interrupts, and SoC plane controls (via CCSR) for power management.

MFC after:	1 week
2017-09-17 19:40:17 +00:00
Mark Johnston
b999e9c813 Implement mmu_page_init for AIM platforms.
As of r323290 we cannot rely on the vm_page array being
zero-initialized.

Reported and tested by:	andreast
MFC after:	1 week
2017-09-17 15:40:12 +00:00
Michael Tuexen
72e23aba22 Fix an accounting bug and use sctp_timer_start to start a timer.
MFC after:	1 week
2017-09-17 09:27:27 +00:00
Michael Tuexen
fe40f49bb3 Remove code not used on any platform currently supported.
MFC after:	1 week
2017-09-16 21:26:06 +00:00
Alan Cox
ec371b57e8 Modify blst_leaf_alloc to take only the cursor argument.
Modify blst_leaf_alloc to find allocations that cross the boundary between
one leaf node and the next when those two leaves descend from the same
meta node.

Update the hint field for leaves so that it represents a bound on how
large an allocation can begin in that leaf, where it currently represents
a bound on how large an allocation can be found within the boundaries of
the leaf.

The first phase of blst_leaf_alloc currently shrinks sequences of
consecutive 1-bits in mask until each has been shrunken by count-1 bits,
so that any bits remaining show where an allocation can begin, or until
all the bits have disappeared, in which case the allocation fails. This
change amends that so that the high-order bit is copied, as if, when the
last block was free, it was followed by an endless stream of free
blocks. It also amends the early stopping condition, so that the shrinking
of 1-sequences stops early when there are none, or there is only one
unbounded one remaining.

The search for the first set bit is unchanged, and the code path
thereafter is mostly unchanged unless the first set bit is in a position
that makes some of those copied sign bits matter. In that case, we look
for a next leaf, and at what blocks it can provide, to see if a
cross-boundary allocation is possible.

The hint is updated on a successful allocation that clears the last bit,
but it not updated on a failed allocation that leaves the last bit
set. So, as long as the last block is free, the hint value for the leaf is
large. As long as the last block is free, and there's a next leaf, a large
allocation can begin here, perhaps. A stricter rule than this would mean
that allocations and frees in one leaf could require hint updates to the
preceding leaf, and this change seeks to leave the freeing code
unmodified.

Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and
blist_leaf_fill, as well as in blst_leaf_alloc.

Correct a panic message in blst_leaf_free.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11819
2017-09-16 18:12:15 +00:00
Ian Lepore
8dc710184a Add a missing header file to SRCS to fix out-of-kernel builds.
PR:		222354
Submitted by:	eugen@
Pointy hat:	ian@
2017-09-16 16:09:05 +00:00
Emmanuel Vadot
36dcd6a499 Allwinner usb phy: Rework resource allocation
The usbphy node for allwinner have two kind of resources, one for the
phy_ctrl and one per phy. Instead of blindy allocating resources, alloc
the phy_ctrl and pmu ones separately.
Also add a configuration struct for all different phy that hold the difference
between them (number of phys, unknow needed register write etc ...).

While here remove A83T code as upstream and FreeBSD dts don't have
nodes for USB.

This (plus 323640) re-enable OHCI on Pine64 on the bottom USB port.
The top USB port is routed to the OHCI0/EHCI0 which is by default in OTG mode.
While the phy code can handle the re-route to standard OHCI/EHCI we still need
a driver for musb to probe and configure it in host mode.

EHCI is still buggy on Pine64 (hang the board) so do not enable it for now.

Tested On:	Bananapi (A20), BananapiM2 (A31S), OrangePi One (H3) Pine64 (A64)
2017-09-16 15:58:20 +00:00
Emmanuel Vadot
489cba7d58 A64 CCUNG: Correct gate and reset for OHCI0/1
Reported by:	jmcneill
Pointy Hat:	manu
2017-09-16 15:50:31 +00:00
Emmanuel Vadot
082f09757c Allwinner: a10_gpio Fix panic on multiple lock
r323392 introduce gpio_pin_get/gpio_pin_set for a10_gpio driver.
When called via gpio method they must aquire the device lock while
when they are called via gpio_pin_configure the lock is already aquire.

Introduce a10_gpio_pin_{s,g}et_locked and call them in pin_gpio_configure
instead.

Tested On: BananaPi (A20)

Reported by:	Richard Puga richard@puga.net
2017-09-16 14:08:20 +00:00
Stephen Hurd
ab2e3f7958 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
John Baldwin
91a65e2f6b Avoid reusing the wrong buffer for a DDP AIO request.
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM.  If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset.  Thus, an
aio_read() request could match a different buffer in the address
space.  (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.)  Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-09-15 22:40:57 +00:00
Scott Long
7eed4c1853 Fix line wrap issues.
Sponsored by:	Netflix
2017-09-15 20:58:52 +00:00
Warner Losh
851063e16a Allow multiple TRIMs to be done for nda
Don't call cam_iosched_trim_done or cam_iosched_submit_trim for nda
since its hardware can handle almost an arbitrary number of TRIMs and
we don't have to be careful to only ever do one.

Sponsored by: Netflix
2017-09-15 20:16:06 +00:00
Warner Losh
55c770b40a Update comments on what the CAM_IOSCHED_FLAG_TRIM_ACTIVE means.
It's intended only for those situations where the periph driver
ones to limit the number of trims active to one and only one.
Also update comments on associated functions.

Sponsored by: Netflix
2017-09-15 20:15:55 +00:00
Landon J. Fuller
011e84e0a7 Add MIPS32/64 Rev2 CP0 intctl register definitions.
Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D12300
2017-09-15 19:56:21 +00:00
Ilya Bakulin
02c474b481 Miscellaneous fixes and improvements to MMCCAM stack
* Demote the level of several debug messages to CAM_DEBUG_TRACE
 * Add detection for SDHC cards that can do 1.8V. No voltage switch sequence
   is issued yet;
 * Don't create a separate LUN for each SDIO function. We need just one to make
   pass(4) attach;
 * Remove obsolete mmc_sdio* files. SDIO functionality will be moved into the
   separate device that will manage a new sdio(4) bus;
 * Terminate probing if got no reply to CMD0;
 * Make bcm2835 SDHCI host controller driver compile with 'option MMCCAM'.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12109
2017-09-15 19:47:44 +00:00
Konstantin Belousov
bba52ecadd Batch freeing of the pages in vm_object_page_remove() under the same
free queue mutex lock owning session, same as it was done for the
object termination in r323561.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-15 16:07:09 +00:00
Mark Johnston
e04223bf94 Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
Andriy Gapon
7103ac8ad6 gmirror: treat ENXIO as disk disconnect, not media error
In theory, all data access errors mean that a member is out of sync
at most.  But they were treated as more serious errors to avoid the
situation where a flaky disk gets repeatedly disconnected, re-synchronized,
reconnected and then disconnected again.

ENXIO is a special error that means that the member disk disappeared,
so it should get the same handling as the GEOM orphaning event.
There is a better chance that when the disk is reconnected, it will be
a good member again.

When ENXIO happens on a read we use the exisiting G_MIRROR_BUMP_SYNCID
mechanism which means that the mirror's syncid is increased as soon
as there is a write to the mirror.  That's because no data has got out
of sync yet, but the problematic memeber is disconnected, so the future
write will make it stale.

When ENXIO happens on a write we use a new G_MIRROR_BUMP_SYNCID_NOW
mechanism which means that we update the mirror metadata as soon as
possible because the problematic memeber is already behind.

Reviewed by:	markj, imp
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D9463
2017-09-15 13:57:08 +00:00
Andrew Turner
ca289945b2 Add the ARMv8.3 ID register fields. These were found in the A-Profile
exploration tools documentation:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Sponsored by:	DARPA, AFRL
2017-09-15 12:57:34 +00:00
John Baldwin
2bd1e600e3 Fix some incorrect sysctl pointers for some error stats.
The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.

Sponsored by:	Chelsio Communications
2017-09-14 21:06:08 +00:00
Gleb Smirnoff
584ab65a75 Fix locking in soisconnected().
When a newborn socket moves from incomplete queue to complete
one, we need to obtain the listening socket lock after the child,
which is a wrong order.  The old code did that in potentially
endless loop of mtx_trylock().  The new one does only one attempt
of mtx_trylock(), and in case of failure references listening
socket, unlocks child and locks everything in right order.  In
case if listening socket shuts down during that, just bail out.

Reported & tested by:	Jason Eggleston <jeggleston llnw.com>
Reported & tested by:	Jason Wolfe <jason llnw.com>
2017-09-14 18:05:54 +00:00
Andrew Turner
bcf2b954c3 Add support for handling undefined instructions in userspace and the
kernel. We can register callbacks to perform the required operation on the
saved registers before returning.

This is initially used to work around a bug in old versions of QEMU that
trigger such an exception when reading from an ID register when it should
load z zero value.

I expect this could be used with other exception types, e.g. to emulate
special register access from userland.

Sponsored by:	DARPA, AFRL
2017-09-14 17:29:51 +00:00
Toomas Soome
8b448cf1d6 loader: biosmem.c cstyle cleanup
No functional changes, just cleanup.

Reviewed by:	allanjude, imp
Differential Revision:	https://reviews.freebsd.org/D12370
2017-09-14 16:42:29 +00:00
Ed Maste
34cb0eb2ed octeon sdk: initialize variable to quiet Clang warning
Clang complains "variable 'dummy' is uninitialized when used here".

Reported by:	Clang
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2017-09-14 16:41:22 +00:00
Conrad Meyer
a64bf59c49 Add PNP metadata to a few drivers
An eventual devd(8) or other component should be able to scan buses and
automatically load drivers that match device ids described in this metadata.

Reviewed by:	imp
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12364
2017-09-14 15:34:45 +00:00
John Baldwin
8df419f2df Add AT_EHDRFLAGS and AT_HWCAP on amd64.
x86 has two separate (but identical) list of AT_* constants and the
earlier commit to add AT_HWCAP only updated the i386 list.
2017-09-14 15:34:29 +00:00
John Baldwin
27efb0a242 Add a NT_ARM_VFP ELF core note to hold VFP registers for each thread.
The core note matches the format and layout of NT_ARM_VFP on Linux.
Debuggers use the AT_HWCAP flags to determine how many VFP registers
are actually used and their format.

Reviewed by:	mmel (earlier version w/o gcore)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12293
2017-09-14 15:07:48 +00:00
John Baldwin
ca2b367f5c Export get/set_vfpcontext from machdep.c.
Should have been part of the previous commit to add ptrace operations
for VFP registers.

MFC after:	1 month
2017-09-14 15:06:29 +00:00
John Baldwin
197e3ae5fc Add ptrace operations to fetch and store VFP registers.
Reviewed by:	mmel, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12294
2017-09-14 15:03:43 +00:00
John Baldwin
21994598e4 Only mess with VFP state on the CPU for curthread for get/set_vfpcontext.
Future changes will use these functions to fetch and store VFP state for
threads other than curthread.

Reviewed by:	andrew, stevek, Michal Meloun <meloun-miracle-cz>
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12292
2017-09-14 14:36:56 +00:00
John Baldwin
19e1bd0104 Add AT_HWCAP flags for VFP settings for FreeBSD/arm.
These flags match the meaning and value of flags in Linux, though
Linux has many more flags.

Reviewed by:	stevek, Michal Meloun <meloun-miracle-cz> (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12291
2017-09-14 14:30:43 +00:00
John Baldwin
c2f37b9245 Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'.  A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags.  If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.

The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present.  This is a step towards unifying the
AT_* constants across platforms.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12290
2017-09-14 14:26:55 +00:00
Andriy Gapon
cbc785c293 dounmount: do not release the mount point's reference on the covered vnode
As long as mnt_ref is not zero there can be a consumer that might try
to access mnt_vnodecovered.  For this reason the covered vnode must not
be freed until mnt_ref goes to zero.
So, move the release of the covered vnode to vfs_mount_destroy.

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D12329
2017-09-14 08:47:06 +00:00
Alexander Motin
83feae78cc Add second entry to LUT on a link side in B2B mode.
Each of two entries on a virtual side should have its counterpart on a
peer's link side.

MFC after:	1 week
2017-09-14 04:51:17 +00:00
Ryan Libby
4e51f184e6 gcc builds: reenable -Wstrict-overflow for kern.mk
Reviewed by:	emaste
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12284
2017-09-14 03:42:41 +00:00
Gleb Smirnoff
d37aa3ccce Use soref() in sendfile(2) instead fhold() to reference a socket.
The problem is that fdrop() requires syscall context, as it may
enter sleep in some cases.  The reason to use it in the original
non-blocking sendfile implementation, was to avoid use of global
ACCEPT_LOCK() on every I/O completion. Now in head sorele() no
longer requires this lock.
2017-09-13 22:11:05 +00:00
Mark Johnston
2d54d4bb9f Widen uk_pgoff, the slab header offset field.
16 bits is only wide enough for kegs with an item size of up to 64KB.
At that size or larger, slab headers are typically offpage because the
item size is a multiple of the page size, but there is no requirement
that this be the case.

We can widen the field without affecting the layout of struct uma_keg
since the removal of uk_slabsize in r315077 left an adjacent hole.

PR:		218911
MFC after:	2 weeks
2017-09-13 21:54:37 +00:00
Konstantin Belousov
e82e50e681 Remove inline specifier from vm_page_free_wakeup(), do not
micro-manage compiler.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:30:09 +00:00
Konstantin Belousov
2fcd1ff68f Do not relock free queue mutex for each page, free whole terminating
object' page queue under the single mutex lock.

First, all pages on the queue are prepared for free by calls to
vm_page_free_prep(), and pages which should not be returned to the
physical allocator (e.g. wired or fictitious) are simply removed from
the queue.  On the second pass, vm_page_free_phys_pglist() inserts all
pages from the queue without relocking the mutex.

The change improves the object termination, e.g. on the process exit
where large anonymous memory objects otherwise cause relocks the free
queue mutex for each page.  More, if several such processes are
exiting or execing in parallel, the mutex was highly contended on
the address space demolition.

Diagnosed and tested by:	mjg (previous version)
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:22:07 +00:00
Konstantin Belousov
540ac3b310 Split vm_page_free_toq() into two parts, preparation vm_page_free_prep()
and insertion into the phys allocator free queues vm_page_free_phys().
Also provide a wrapper vm_page_free_phys_pglist() for batched free.

Reviewed by:	alc, markj
Tested by:	mjg (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:11:52 +00:00
Konstantin Belousov
b9e8fb647e Use existing tag name for the vm_object' memq.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:03:59 +00:00
Navdeep Parhar
26cee56642 Retire the T3 iWARP and TOE drivers. This saves catch-up work when OFED or
other kernel infrastructure changes.

Note that this doesn't affect the base cxgb(4) NIC driver for T3 at all.

MFC after:	No MFC.
Sponsored by:	Chelsio Communications
2017-09-13 17:49:23 +00:00
Conrad Meyer
e5dc78af11 intpm(4): Decrease requested i/o port range width
On some AMD FCH devices driven by intpm(4) (read: mine), the SMBus I/O port
range is split in two and the low range is only 0x10 wide.  intpm(4) does
not access any registers above 0x0f, so there is no need for the wider
range.

Discussed with:	avg
Sponsored by:	Dell EMC Isilon
2017-09-13 17:43:18 +00:00
Allan Jude
dbfcf648a3 Increase EFI boot file size frok 128k to 384k
generate_fat.sh does the following:
- create an 800kb zero-filled file
- create an md device backed by this file
- format the device fat12
- mount the filesystem
- create the EFI ESP directory structure
- create the EFI boot file (BOOTx64 for amd64, BOOTaa64 for aarch64, etc)
- Adds a marker to the beginning of the file, and pad it to 384kb
- 384kb was chosen as it is less than half of 800kb, thus allowing
  users to keep a backup of their older boot file in the small partition
- Unmount the filesystem
- Scan the image and find the offset where the marker was inserted
- The process requires root, to make image generation easier, images for
  each architecture are pregenerated, compressed with xz, and checked
  into svn.

The Makefile that generates boot1.efifat does the following:
- Ensure the compiled boot1.efi file is no larger than the generated image
- Decompress the template created by generate-fat.sh
- dd the contents of boot1.efi into boot1.efifat starting at the offset
  where the marker is found. This allows any file less than the maximum
  size to be written into the fat filesystem without having to mount it,
  so no root privileges are required.

Later work by imp and myself makes bsdinstall create a 200mb fat16 instead
of using this process, but it is retained to make image generation easier.

Submitted by:	Eric McCorkle (original version)
Reviewed by:	emaste, tsoome, Eric McCorkle
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9680
2017-09-13 17:00:02 +00:00
Ian Lepore
1e4042d44e Defer attaching and probing iicbus and its children until interrupts are
available, in i2c controller drivers that require interrupts for transfers.

This is the result of auditing all 22 existing drivers that attach iicbus.
These drivers were the only ones remaining that require interrupts and were
not using config_intrhook to defer attachment.  That has led, over the
years, to various i2c slave device drivers needing to use config_intrhook
themselves rather than performing bus transactions in their probe() and
attach() methods, just in case they were attached too early.
2017-09-13 16:54:27 +00:00
Gleb Smirnoff
100db364eb Fix two issues with not ready data in sockets (read: sendfile)
in UNIX sockets.

o Check that socket is still connected in uipc_ready(). If not
  we are responsible to free mbufs.
o In uipc_send() if socket appears to be disconnected, but we
  are sending data with pending I/Os, don't free mbufs.

Reported by:	Kevin Bowling <kbowling llnw.com>
Tested by:	Kevin Bowling <kbowling llnw.com>
PR:		222259
Reported by:	Mark Martinec <Mark.Martinec ijs.si>
MFC after:	3 days
2017-09-13 16:47:23 +00:00
Conrad Meyer
02e015aa38 intpm(4): While here, remove redundant 'res' check
Reported by:	avg
Sponsored by:	Dell EMC Isilon
2017-09-13 16:43:31 +00:00
Gordon Tetlow
4572fb3faf Deorbit catman. The tradeoff of disk for performance has long since tipped
in favor of just rendering the manpage instead of relying on pre-formatted
catpages. Note, this does not impede the ability to use existing catpages,
it just removes the utility to generate them.

Reviewed by:	imp, allanjude
Approved by:	emaste (mentor)
Differential Revision:	https://reviews.freebsd.org/D12317
2017-09-13 16:35:16 +00:00
Conrad Meyer
54d89ef114 intpm(4): Do not attach if io_res can not be allocated
Attempts to use the driver without an io_res result in immediate panic.

Sponsored by:	Dell EMC Isilon
2017-09-13 16:23:59 +00:00
Mark Johnston
2934eb8a22 Fix a logic error in the item size calculation for internal UMA zones.
Kegs for internal zones always keep the slab header in the slab itself.
Therefore, when determining the allocation size, we need to take the
slab header size into account.

Reported and tested by:	ae, rakuco
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12342
2017-09-13 15:44:54 +00:00
Sean Bruno
19ebd288fb Don't (try to) build lio(4) if the SOURCELESS_UCODE is set.
Submitted by:	Fabien Keil <fk@fabiankeil.de>
2017-09-13 15:17:35 +00:00
Toomas Soome
0a0c72ff93 libefi: efipart_realstrategy rsize pointer may be NULL
Need to check rsize before dereferencing it.
2017-09-13 14:27:13 +00:00
Andriy Gapon
ceb1a4fb2d jedec_ts: add many more devices from various vendors
The new IDs are taken from the hardware to which I have access
and from open datasheets.

Also, the hardware probing is moved to the device probe method.

Reviewed by:	rpokala
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D11730
2017-09-13 13:03:29 +00:00
Ed Maste
eadaf05db0 qlnx: exclude if WITHOUT_SOURCELESS_UCODE set
PR:		222277
Submitted by:	Fabian Keil
Obtained from:	ElectroBSD
MFC after:	1 week
2017-09-13 12:16:27 +00:00
Ilya Bakulin
a9bfc8d2ae Add MMCCAM-enabled kernel config for IMX6, reduce debug noice in MMCCAM kernels
CAM_DEBUG_TRACE results in way too much debug output than needed now.
When debugging, it's always possible to turn on trace level using camcontrol.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12110
2017-09-13 10:56:02 +00:00
Andriy Gapon
86261a95ed slightly simplify zfs_vptocnp
It's not necessary to look up the parent's ID to check if the node is
the root node of the filesystem.

MFC after:	2 weeks
2017-09-13 07:09:58 +00:00
Navdeep Parhar
efeb46889f cxgbe(4): Ignore capabilities that depend on TOE when the firmware
reports TOE is not available.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-09-13 06:07:02 +00:00
Sean Bruno
68467b1206 Jenkins i386 LINT build uses NOTES to generate its LINT kernel config.
ixl(4) isn't in here either, so I'll remove lio(4) too.
2017-09-13 03:56:03 +00:00
Stephen Hurd
ea4c57fe0c Fix GCC build failure caused by r323516
No need to declare cold when we #include <sys/systm.h>

Reported by:	Jenkins
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12347
2017-09-13 02:44:50 +00:00
Stephen Hurd
d300df0182 Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree.  The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for eariler problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
    rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
Matt Joras
fdbf11746a Allow vlan interfaces to rx through netmap(4).
Normally after receiving a packet, a vlan(4) interface sends the packet
back through its parent interface's rx routine so that it can be
processed as an untagged frame. It does this by using the parent's
ifp->if_input. This is incompatible with netmap(4), which replaces the
vlan(4) interface's if_input with a netmap(4) hook. Fix this by using
the vlan(4) interface's ifp instead of the parent's directly.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	rstone
Approved by:	rstone (mentor)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12191
2017-09-13 00:25:09 +00:00
Sean Bruno
be17336036 Leave the Cavium Liquid IO driver exist in files, not files.amd64
Submitted by:	imp
2017-09-12 23:58:38 +00:00
Warner Losh
d7fa1ab02d cam iosched: Limit the quanta default to hz if it's below 200
The cam_iosched_ticker() can't be scheduled more than once per tick.
Some limiters depend on quanta matching the number of calls per second
to enforce the proper limits. Limit the quanta to no faster than 1 per
clock tick. This fixes some features when running in VMs where the
default HZ is 100.

PR: 221953
Obtained from: ElectroBSD
Differential Revision: https://reviews.freebsd.org/D12337
Submitted by: Fabian Keil
2017-09-12 23:46:33 +00:00
Sean Bruno
e460f3adbb Do not try to build the Cavium Liquidio driver on all architechtures.
For now, limit to amd64 only.
2017-09-12 23:42:52 +00:00
Sean Bruno
f173c2b77e The diff is the initial submission of Cavium Liquidio 2350/2360 10/25G
Intelligent NIC driver.

The submission conconsists of firmware binary file and driver sources.

Submitted by:	pkanneganti@cavium.com (Prasad V Kanneganti)
Relnotes:	Yes
Sponsored by:	Cavium Networks
Differential Revision:	https://reviews.freebsd.org/D11927
2017-09-12 23:36:58 +00:00
Michael Tuexen
292efb1bc0 Export the UDP encapsualation port and the path state. 2017-09-12 21:08:50 +00:00
Alan Somers
71cd87c66c Remove spaces from CTL devices' default serial numbers
It's awkward to have spaces in CAM device serial numbers. That leads to
such things as device nodes named "/dev/diskid/MYSERIAL%20%20%201". Better
to replace the spaces with "0"s. This change only affects the default
serial numbers for users who don't provide their own.

Reviewed by:	ken, mav
MFC after:	Never
Relnotes:	Yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12263
2017-09-12 19:36:24 +00:00
John Baldwin
b4e9a36bf7 Handle relocations for newer non-PIC MIPS ABI.
Newer binutils supports extensions to the MIPS ABI for non-PIC code
that is used when compiling O32 binaries with clang 5 (but not used
for N64 oddly enough).  These extensions require support for
R_MIPS_COPY relocations as well as a second PLT GOT using
R_MIPS_JUMP_SLOT relocations.

For R_MIPS_COPY, use the same approach as on other architectures where
fixups are deferred to the MD do_copy_relocations.

The additional PLT GOT for jump slots is located in a .got.plt section
which is identified by a DT_MIPS_PLTGOT dynamic entry.  This GOT also
requires fixups for the first two GOT entries just as the normal GOT.
However, the entry point for this second GOT uses a different calling
convention. Rather than passing an offset into the GOT, it passes an
offset into the .rel.plt section.  This requires a second entry point
(_rtld_pltbind_start) which calls the normal _rtld_bind() rather than
_mips_rtld_bind().  This also means providing a real version of
reloc_jmpslot() which is used by _rtld_bind().

In addition, add real implementions of reloc_plt() and
reloc_jmpslots() which walk .rel.plt handling R_MIPS_JUMP_SLOT
relocations.

Reviewed by:	kib
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D12326
2017-09-12 17:46:30 +00:00
Toomas Soome
c7847a364a libefi: efipart_open should check the status from disk_open
In case of error from disk_open(), we should clean up properly.

Reviewed by:	allanjude, imp
Differential Revision:	https://reviews.freebsd.org/D12340
2017-09-12 14:18:45 +00:00
Toomas Soome
b40aaca6dd loader should support large_dnode
The zfsonlinux feature large_dnode is not yet supported by the loader.

Reviewed by:	avg, allanjude
Differential Revision:	https://reviews.freebsd.org/D12288
2017-09-12 13:45:04 +00:00
Michael Tuexen
e5cccc35c3 Add support to print the TCP stack being used.
Sponsored by:	Netflix, Inc.
2017-09-12 13:34:43 +00:00
Andriy Gapon
1a2ddb2997 fix a fallout from the ZTOV tightening, r323479
MFC after:	13 days
X-MFC with:	r323479
2017-09-12 13:21:14 +00:00
Olivier Houchard
69d14913fc Some devices come with the same name as TI devices, so we can't rely on the
"probe" method of those drivers to mean we're on e TI SoC. Introduce a new
function, ti_soc_is_supported(), and use it to be sure we're really a TI
system.

PR:	222250
2017-09-12 10:43:02 +00:00
Andriy Gapon
bcab65cab5 zfsctl_snapdir_lookup should be able to handle an uncovered vnode
The uncovered vnode is possible because there is no guarantee that
its hold count would go to zero (and it would be inactivated and reclaimed)
immediately after a covering filesystem is unmounted.
So, such a vnode should be expected and it is possible to re-use it
without any trouble.

MFC after:	3 weeks
Sponsored by:	Panzura
2017-09-12 06:06:58 +00:00
Andriy Gapon
c09d0da8d1 zfs_ctldir: remove obsolete / bogus ARGSUSED lint directives
None of the tagged functions had unused parameters.

MFC after:	1 week
2017-09-12 06:05:30 +00:00
Andriy Gapon
65b38f7311 zfsvfs_hold: assert that the busied filesystem can not be unmounted
This is a FreeBSD specific feature.

MFC after:	3 weeks
Sponsored by:	Panzura
2017-09-12 06:04:50 +00:00
Andriy Gapon
d092f79489 zfs_get_vfs: reference a requested filesystem instead of vfs_busy-ing it
The only consumer of zfs_get_vfs, zfs_unmount_snap, does not need
the filesystem to be busy, it just need a reference that it can pass
to dounmount.

Also, previously the code was racy as it unbusied the filesystem
before taking a reference on it.

Now the code should be simpler and safer.

MFC after:	2 weeks
Sponsored by:	Panzura
2017-09-12 06:04:01 +00:00
Andriy Gapon
f7519dbb76 zfs: tighten debug versions of ZTOV and VTOZ
MFC after:	2 weeks
Sponsored by:	Panzura
2017-09-12 06:02:21 +00:00
Cy Schubert
54e485fd3c Improve the wording of a comment describing why EAGAIN is the error code.
MFC after:	3 days
2017-09-12 04:21:04 +00:00
Ian Lepore
813c1b27fe Add a default implementation that returns ENODEV for start, repeat_start,
stop, read, and write methods.  Some controllers don't implement these
individual operations and have only a transfer method.  In that case, we
should return an indication that the device is present but doesn't support
the method, as opposed to the kobj default error ENXIO which makes it
look like the whole device is missing.  Userland tools such as i2c(8) can
use the differing return values to switch between the two different i2c
IO mechanisms.
2017-09-11 23:47:49 +00:00
Conrad Meyer
d63edb4dc6 MCA: Rename AMD MISC bits/masks
They apply to all AMD MCAi_MISC0 registers, not just MCA4 (NB).

No functional change.

Sponsored by:	Dell EMC Isilon
2017-09-11 20:42:07 +00:00
Conrad Meyer
f739be66e6 x86 MCA: Extract CMCI support predicate into function
On AMD, the MCG_CAP feature bit is reserved -- not explicitly zero.  Do not
use it to determine CMCI support.

Reviewed by:	avg, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12320
2017-09-11 20:41:25 +00:00
Marcin Wojtas
5a2f997cb5 Restore alphabetical order in UART Makefile
Commit r323359 introduced new Marvell UART controller driver
and by mistake it broke correct order in the Makefile. Fix this.

Reported by: emaste
2017-09-11 19:07:53 +00:00
Ilya Bakulin
61df30cfd4 Add MMCCAM-enabled kernel config for arm64
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12114
2017-09-11 19:07:42 +00:00
Marcin Wojtas
885a74181c Expand Marvell NIC description in arm64 GENERIC config
Suggested by: emaste
2017-09-11 19:00:53 +00:00
Konstantin Belousov
809f2d8b8b Fix ioapic acpi id matching on PCI attach and rid calculation.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2017-09-11 18:29:09 +00:00
Conrad Meyer
e8be4e41c6 Decode new AMD SVM feature bits on family 17h
Sponsored by:	Dell EMC Isilon
2017-09-11 18:11:53 +00:00
Ed Maste
e9346a94d1 boot1: remove BOOT1_MAXSIZE default value
This Makefile relies on Makefile.fat providing the correct value for
BOOT1_MAXSIZE and BOOT1_OFFSET. Since BOOT1_OFFSET had no default value
here the build would already fail if Makefile.fat did not provide
correct values.

Sponsored by:	The FreeBSD Foundation
2017-09-11 14:33:04 +00:00
Andriy Gapon
970165f190 MFV r323111: 8569 problem with inline functions in abd.h
illumos/illumos-gate@37e84ab74e
37e84ab74e

https://www.illumos.org/issues/8569
  C [C99] has peculiar rules for inline functions that are different from the
  C++ rules.  Unlike C++ where inline is "fire and forget", in C a programmer
  must pay attention to the function's storage class / visibility.  The main
  problem is with the case where a compiler decides to not inline a call to the
  function declared as inline.
  Some relevant links:
  - http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15831.html
  - http://www.drdobbs.com/the-new-c-inline-functions/184401540
  The summary is that either the inline functions should be declared 'static
  inline' or one of the compilation units (.c files) must provide a callable
  externally visible function definition.  In the former case, the compiler would
  automatically create a local non-inlined function instance in every compilation
  unit where it's needed.  In the latter case the single external definition is
  used to satisfy any non-inlined calls in all compilation units.  As things
  stand right now, we can get an undefined reference error under certain
  combinations of compilers and compiler options.  For example, this is what I
  get on FreeBSD when compiling with clang 4.0.0 and -O1:
    In function `abd_free': /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:385:
    undefined reference to `abd_is_linear'

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	1 week
2017-09-11 12:15:49 +00:00
Andriy Gapon
25625d8746 Revert r322601, Mark ZFS ABD inline functions static
An alternative fix is to be merged from illumos shortly.
2017-09-11 12:08:20 +00:00
Andriy Gapon
3d9a0e564d MFV r323110: 8558 lwp_create() returns EAGAIN on system with more than 80K ZFS filesystems
illumos/illumos-gate@216d7723a1
216d7723a1

https://www.illumos.org/issues/8558
  On a system with more than 80K ZFS filesystems, we've seen cases where
  lwp_create() will start to fail by returning EAGAIN. The problem being,
  for each of those 80K ZFS filesystems, a taskq will be created for each
  dataset as part of the ZIL for each dataset.
  For each of these taskq's, a kernel thread will be created which results
  in 24KB being allocated for each thread. With enough of these 24KB
  allocations, we eventually exhaust the memory region set aside for these
  allocations. Currently, segkpsize is set to a value of 2GB, which means
  we can only support about 80K filesystems; 2GB / 24KB = ~80K.
  The lwp_create() failure comes into play due to the fact that LWP
  creation also allocates 24KB from this same region of memory. Thus, if
  we've exhausted this region of memory due to the number of ZIL taskq's,
  there won't be any memory avaible to allow the call to lwp_create() to
  succeed.

FreeBSD note: I haven't created sysctl-s for the new ZIL clean
parameters.  Let's add them if anyone requires to tune them.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after:	3 weeks
2017-09-11 11:31:43 +00:00
Marcin Wojtas
be2d15eae6 Improve HW type checking in mv_ehci driver
This patch adds hwtype parameter which keeps information about hardware
revision of Marvell EHCI controller. It allows to replace multiple
calls to ofw_bus_is_compatible with comparing hwtype value during driver
initialization.

Submitted by: Patryk Duda <pdk@semihalf.com>
Suggested by: ian
Obtained from: Semihalf
Sponsored by: Semihalf
2017-09-11 10:41:42 +00:00
Toomas Soome
4f152c5b8a r323389 breaks the kernel build when WITHOUT_ZFS is defined in src.conf
Need to add #ifdef EFI_ZFS_BOOT guard into efi/loader/main.c

PR:		222215
Reported by:	Sylvain Garrigues
2017-09-11 07:38:53 +00:00
Scott Long
3c5ac992c7 Add infrastructure for allocating multiple MSI-X interrupts. Also
add more fine-tuned controls for allocating requests and replies.

Sponsored by:	Netflix
2017-09-11 01:51:27 +00:00
Ed Maste
5e66298138 boot1 generate-fat: generate all templates at once
In advance of other changes to the fat template generation process, have
generate-fat.sh create all template files at the same time so that they
cannot get out of sync.

Also correct a longstanding but where BOOT1_OFFSET was overwritten on
each invocation. A previous version of this patch stored a per-arch
offset (e.g. BOOT1_arm64_OFFSET) but that was deemed unnecessary.
Instead just hardcode the known offset that applies to all archs (0x2d)
and fail if the offset happens to be different.

Ongiong work (using newfs_msdos in bsdinstall and adding msdosfs support
to makefs) will eventually allow us to do away with this fat template
hack altogether, but in the near term we have a few improvements that
will build on this.

Reviewed by:	allanjude, imp, Eric McCorkle
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10931
2017-09-11 00:37:00 +00:00
Ed Maste
3ac77a9151 newvers.sh: speed up failing git-svn revision search
In the case of running newvers.sh on a git tree w/o git-svn-id notes we
previously piped the entire 'git log' to grep. Add --grep to the log
invocation to avoid processing log entries of no interest.

This saves about 2-3 seconds of newvers.sh run time on my SSD laptop.
Later changes will bring further speedups.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2017-09-11 00:14:04 +00:00
Ed Maste
a489102080 newvers.sh: accept "git-svn-id:" at the start of a line only
This prevents incorrect subversion revision detection when "git svn" is
not being used to get the sources but git is available. Previously old
subversion revisions included in commit messages were favoured over the
more recent and correct revisions in git notes.

For example cf1f355747 represents r315395 but was treated as r313908
which is referenced in the commit message. Commits following
r315395/cf1f35574722 but before another commit with a git-svn-id
reference in the commit message would be treated as r313908 as well.

Patch from PR updated to accommodate the initial four space indent in
`git log` ouptut.

PR:		221848
Submitted by:	Fabian Keil
Obtained from:	ElectroBSD
MFC after:	2 weeks
2017-09-10 19:12:01 +00:00
Mateusz Guzik
1c0b34417b Move vmmeter atomic counters into dedicated cache lines
Prior to the change they were subject to extreme false sharing.
In particular this change shaves about 3 seconds real time of -j 80 buildkernel.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D12281
2017-09-10 19:00:38 +00:00
Ian Lepore
e1275c6805 Add gpio methods to read/write/configure up to 32 pins simultaneously.
Sometimes it is necessary to combine several gpio pins into an ad-hoc bus
and manipulate the pins as a group. In such cases manipulating the pins
individualy is not an option, because the value on the "bus" assumes
potentially-invalid intermediate values as each pin is changed in turn. Note
that the "bus" may be something as simple as a bi-color LED where changing
colors requires changing both gpio pins at once, or something as complex as
a bitbanged multiplexed address/data bus connected to a microcontroller.

In addition to the absolute requirement of simultaneously changing the
output values of driven pins, a desirable feature of these new methods is to
provide a higher-performance mechanism for reading and writing multiple
pins, especially from userland where pin-at-a-time access incurs a noticible
syscall time penalty.

These new interfaces are NOT intended to abstract away all the ugly details
of how gpio is implemented on any given platform. In fact, to use these
properly you absolutely must know something about how the gpio hardware is
organized. Typically there are "banks" of gpio pins controlled by registers
which group several pins together. A bank may be as small as 2 pins or as
big as "all the pins on the device, hundreds of them." In the latter case, a
driver might support this interface by allowing access to any 32 adjacent
pins within the overall collection. Or, more likely, any 32 adjacent pins
starting at any multiple of 32. Whatever the hardware restrictions may be,
you would need to understand them to use this interface.

In additional to defining the interfaces, two example implementations are
included here, for imx5/6, and allwinner. These represent the two primary
types of gpio hardware drivers. imx6 has multiple gpio devices, each
implementing a single bank of 32 pins. Allwinner implements a single large
gpio number space from 1-n pins, and the driver internally translates that
linear number space to a bank+pin scheme based on how the pins are grouped
into control registers. The allwinner implementation imposes the restriction
that the first_pin argument to the new functions must always be pin 0 of a
bank.

Differential Revision:	https://reviews.freebsd.org/D11810
2017-09-10 18:08:25 +00:00
Alan Cox
d027ed2e7a To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number, and
sizes, of maximal intervals of free blocks.  The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers.  The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.

The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11906
2017-09-10 17:46:03 +00:00
Dag-Erling Smørgrav
008a09355b If the user tries to set kern.randompid to 1 (which is meaningless), set
it to a random value between 100 and 1123, rather than 0 as before.

Submitted by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5336
2017-09-10 15:01:29 +00:00