Commit Graph

118463 Commits

Author SHA1 Message Date
Conrad Meyer
d616681cec aesni(4): Fix another trivial typo (aensi -> aesni)
Sponsored by:	Dell EMC Isilon
2017-09-20 18:31:36 +00:00
Conrad Meyer
194446f9b7 x86: Decode AMD "Extended Feature Extensions ID EBX" bits
In particular, this determines CPU support for the CLZERO instruction.

(No, I am not making this name up.)

Sponsored by:	Dell EMC Isilon
2017-09-20 18:30:37 +00:00
Conrad Meyer
81326306dd aesni(4): Fix trivial typo (AQUIRE -> ACQUIRE)
Sponsored by:	Dell EMC Isilon
2017-09-20 17:53:25 +00:00
Alan Somers
cd037f075c MFV r323789: 8473 scrub does not detect errors on active spares
illumos/illumos-gate@554675eee7
554675eee7

https://www.illumos.org/issues/8473
Scrubbing is supposed to detect and repair all errors in the pool. However,
it wrongly ignores active spare devices. The problem can easily be
reproduced in OpenZFS at git rev 0ef125d with these commands:

truncate -s 64m /tmp/a /tmp/b /tmp/c
sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
sudo zpool replace testpool /tmp/a /tmp/c
/bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
sync
sudo zpool scrub testpool
zpool status testpool # Will show 0 errors, which is wrong
sudo zpool offline testpool /tmp/a
sudo zpool scrub testpool
zpool status testpool # Will show errors on /tmp/c,
		      # which should've already been fixed

FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more.

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

MFC after:	1 week
Sponsored by:	Spectra Logic Corp
2017-09-20 16:31:00 +00:00
Andriy Gapon
aacd0b4bb2 add vfs_zfs.abd_chunk_size tunable
It is reported that the default value of 4KB results in a substantial
memory use overhead (at least, on some configurations).  Using 1KB seems
to reduce the overhead significantly.

PR:		222377
Reported by:	Sean Chittenden <sean@chittenden.org>
MFC after:	1 week
2017-09-20 08:36:31 +00:00
Andriy Gapon
3d5487d981 fix memory leak in g_bio zone introduced in r320452, another ABD fallout
I overlooked the fact that that ZIO_IOCTL_PIPELINE does not include
ZIO_STAGE_VDEV_IO_DONE stage.  We do allocate a struct bio for an ioctl
zio (a disk cache flush), but we never freed it.

This change splits bio handling into two groups, one for normal
read/write i/o that passes data around and, thus, needs the abd data
tranform; the other group is for "data-less" i/o such as trim and cache
flush.

PR:		222288
Reported by:	Dan Nelson <dnelson@allantgroup.com>
Tested by:	Borja Marcos <borjam@sarenet.es>
MFC after:	10 days
2017-09-20 08:27:21 +00:00
Andriy Gapon
8c9377cde7 MFV r323792: 8602 remove unused "dp_early_sync_tasks" field from "dsl_pool" structure
illumos/illumos-gate@2bcb545854
2bcb545854

https://www.illumos.org/issues/8602
  When I landed the fix for 8558, I incorrectly added the "dp_early_sync_tasks"
  field to the "dsl_pool" structure. This field is used in DelphixOS, but not in
  illumos. It was incorrectly pulled into illumos, so this bug is to remove it
  from the structure.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 week
2017-09-20 07:26:52 +00:00
Alan Cox
e9bfbb02c5 In r288122, we changed vm_page_unwire() so that it returns a Boolean
indicating whether the page's wire count transitioned to zero.  Use that
return value in zbuf_page_free() rather than checking the wire count.

MFC after:	1 week
2017-09-20 04:59:52 +00:00
Alan Cox
2582d7a969 Sync with amd64/arm/arm64/i386/mips pmap change r288256:
Exploit r288122 to address a cosmetic issue.  Since PV chunk pages don't
belong to a vm object, they can't be paged out.  Since they can't be paged
out, they are never enqueued in a paging queue.  Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue.  As of r288122, we can avoid
this false impression by passing PQ_NONE.

MFC after:	1 week
2017-09-20 04:19:49 +00:00
Olivier Houchard
4583315a06 Define CPU_XSCALE_CORE3 when relevant.
It was lost when cpuconf.h was deobirted.
2017-09-19 23:41:55 +00:00
Rick Macklem
0f29b8292d Make the nfsrpc_layoutget() function a static.
Make the NFSv4 pNFS client function nfsrpc_layoutget() a static, since it
is only used in sys/fs/nfsclient/nfs_clrpcops.c.
This prepares the code for future patches that add Flex File layout
support.
2017-09-19 23:28:22 +00:00
David C Somayajulu
1faeac0fd5 Add sysctl "enable_minidump" to turn on/off automatic minidump retrieval
MFC after:5 days
2017-09-19 23:26:27 +00:00
David C Somayajulu
9bc6b5ac37 Update minidump template for version 5.4.66
MFC after:5 days
2017-09-19 22:17:30 +00:00
Rick Macklem
2742a21091 Add a new function called nfsm_uiombuflist(), similar to nfsm_uiombuf().
This patch adds a new function called nfsm_uiombuflist(), which is
similar to nfsm_uiombuf(), but doesn't not use the fields in
struct nfsrv_descript. This new function will be used by the pNFS client
for writing to mirrors using Flex Files layout.
The function is not yet called anywhere.
Also, get rid of #ifndef APPLE, which is ancient cruft left over from
the Mac OSX port of the NFSv4 client.
2017-09-19 21:31:36 +00:00
Rick Macklem
b0932afacc Simplify nfsrpc_layoutreturn() args.
Simplify nfsrpc_layoutreturn() args. in preparation for the addition
of Flex File layout support, since File layout uses a 0 length field.
Flex Files does use a longer field, but that will be added in a
subsequent commit.
2017-09-19 20:45:25 +00:00
Josh Paetzel
c77037f16f Fix indentation for r323068
PR:	220170
Reported by:	lidl
MFC after:	3 days
Pointyhat to:	jpaetzel
2017-09-19 20:40:05 +00:00
Olivier Houchard
7ce16cd956 i81342 is little endian, not big endian. 2017-09-19 20:33:22 +00:00
Michael Tuexen
3ec509bcd3 Fix a warning.
MFC after:	1 week
2017-09-19 20:24:13 +00:00
Rick Macklem
ab118d04be Simplify nfsrpc_layoutcommit() args.
Simplify nfsrpc_layoutcommit() args. in preparation for the addition
of Flex File layout support, since it also uses a 0 length field.
2017-09-19 20:18:41 +00:00
Michael Tuexen
564a95f485 Avoid an overflow when computing the staleness.
This issue was found by running libfuzz on the userland stack.

MFC after:	1 week
2017-09-19 20:09:58 +00:00
Konstantin Belousov
3cabd93e26 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
Konstantin Belousov
9770475ce7 Do not vrele() covered vnode under the mp mutex.
If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.

Sponsored by:	The FreeBSD Foundation
Discussed with:	avg
X-MFC with:	r323578
2017-09-19 16:49:45 +00:00
Konstantin Belousov
5bf949377e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
Michael Tuexen
ad608f06ed Remove a no longer used variable.
Reported by:	Felix Weinrank
MFC after:	1 week
2017-09-19 15:00:19 +00:00
Sepherosa Ziehau
d74831eea2 hyperv/hn: Incease max supported MTU
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12365
2017-09-19 06:46:00 +00:00
Sepherosa Ziehau
eb2fe04416 hyperv/hn: Fix MTU setting
- Add size of an ethernet header to the value configured to NVS.  This
  does not seem to have any effects if MTU is 1500, but fix hypervisor
  side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
  side.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12352
2017-09-19 06:38:57 +00:00
Sepherosa Ziehau
642ec226bb hyperv/hn: Apply VF's RSS setting
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12333
2017-09-19 06:29:38 +00:00
John Baldwin
4fedff3ca1 Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration
file.

Discussed with:	np
Sponsored by:	Chelsio Communications
2017-09-18 23:50:34 +00:00
John Baldwin
4f45713ae2 Add UFS_LINK_MAX for the UFS-specific limit on link counts.
ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still
limited to 16 bits.  This is a nop currently but will matter if LINK_MAX is
increased in the future.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2017-09-18 23:30:39 +00:00
Konstantin Belousov
5efe338f3d Fix handling of the segment registers on i386.
Suppose that userspace is executing with the non-standard segment
descriptors.  Then, until exception or interrupt handler executed
SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs.
If an interrupt occurs in this window, the interrupt handler is
executed unsafely, relying on usability of the usermode registers.  If
the interrupt results in the context switch on return, the
contamination of the kernel state spreads to the thread we switched
to.  As result, kernel data accesses might fault or, if only the base
is changed, completely messed up.

More, if the user segment was allocated in LDT, another thread might
mark the descriptor as invalid before doreti code tried to reload
them.  In this case kernel panics.

The issue exists for all exception entry points which use trap gate,
and thus do not automatically disable interrupts on entry, and for
lcall_handler.

Fix is two-fold: first, we need to disable interrupts for all kernel
entries, changing the IDT descriptor types from trap gate to interrupt
gate.  Interrupts are re-enabled not earlier than the kernel segments
are loaded into the segment registers.  Second, we only load the
segment registers from the trap frame when returning to usermode.  For
the later, all interrupt return paths must happen through the doreti
common code.

There is no way to disable interrupts on call gate, which is the
supposed mode of servicing for lcall $7,$0 syscalls.  Change the LDT
descriptor 0 into a code segment type and point it to the userspace
trampoline which redirects the syscall to int $0x80.

All the measures make the segment register handling similar to that of
amd64.  We do not apply amd64 optimizations of not reloading segment
registers on return from the syscall.

Reported by:	Maxime Villard <max@m00nbsd.net>
Tested by:	pho (the non-lcall part)
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12402
2017-09-18 20:22:42 +00:00
Ilya Bakulin
ffa317051b Add kern.features flag for MMCCAM
kern.features.mmcam will be present and equal to 1 if the kernel has been
compiled with option MMCCAM.
This will help sdio-related userland tools to fail-fast if running on the kernel
without MMCCAM enabled.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12386
2017-09-18 20:17:08 +00:00
Cy Schubert
70394ab378 Don't use an apostrophe in a possesive pronoun.
MFC after:	3 days
2017-09-18 19:16:41 +00:00
Ryan Libby
2e6418c0c5 linsysfs: quiet gcc -Wformat after r323692
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
2017-09-18 19:09:40 +00:00
Scott Long
1ab9094c90 Hide a normal probe warning message under bootverbose, similar to atkbdc
Sponsored by:	Netflix
2017-09-18 18:42:28 +00:00
Conrad Meyer
4d367f2501 linsysfs(5): Fix two unrelated issues
1. Swap the order of device_get_ivars with device_get_devclass and devclass
   name validation.  This bug was introduced in r323692.

2. Error check device_get_children and free the returned list.  This bug was
   introduced in the original linsysfs commit.

Reported by:	Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12407
2017-09-18 17:14:13 +00:00
Toomas Soome
3f247eab7a loader: biosmem allocate heap just below 4GB
The current biosmem code is walking bios smap entries and looking for smap
entry just below 4GB line, if there is such entry, its base and size is set
for heap base and size. Instead of entry base, we should use last HEAP_MIN
(currently 64MB) bytes just below 4GB, to make maximum space for kernel and
modules.

The problem was revealed on ASUS B350M-A system board, an AMD Ryzen 3 1200 CPU

memory map:

SMAP type=01 base=0000000000000000 len=000000000009d400 attr=01
SMAP type=02 base=000000000009d400 len=0000000000002c00 attr=01
SMAP type=02 base=00000000000e0000 len=0000000000020000 attr=01
SMAP type=01 base=0000000000100000 len=0000000009c00000 attr=01
SMAP type=02 base=0000000009d00000 len=0000000000300000 attr=01
SMAP type=01 base=000000000a000000 len=00000000be69b000 attr=01
SMAP type=03 base=00000000c869b000 len=0000000000016000 attr=01
SMAP type=01 base=00000000c86b1000 len=00000000124e7000 attr=01
SMAP type=02 base=00000000dab98000 len=0000000000138000 attr=01
SMAP type=03 base=00000000dacd0000 len=0000000000008000 attr=01
SMAP type=01 base=00000000dacd8000 len=0000000000100000 attr=01
SMAP type=04 base=00000000dadd8000 len=00000000003b3000 attr=01
SMAP type=02 base=00000000db18b000 len=0000000000d42000 attr=01
SMAP type=01 base=00000000dbecd000 len=0000000002133000 attr=01
SMAP type=01 base=0000000100000000 len=000000011f380000 attr=01
SMAP type=02 base=00000000de000000 len=0000000002000000 attr=01
SMAP type=02 base=00000000f8000000 len=0000000004000000 attr=01
SMAP type=02 base=00000000fdf00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000fea00000 len=0000000000010000 attr=01
SMAP type=02 base=00000000feb80000 len=0000000000082000 attr=01
SMAP type=02 base=00000000fec10000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fec30000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed00000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed40000 len=0000000000005000 attr=01
SMAP type=02 base=00000000fed80000 len=0000000000010000 attr=01
SMAP type=02 base=00000000fedc2000 len=000000000000e000 attr=01
SMAP type=02 base=00000000fedd4000 len=0000000000002000 attr=01
SMAP type=02 base=00000000fee00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000ff000000 len=0000000001000000 attr=01

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D12368
2017-09-18 15:17:01 +00:00
Hans Petter Selasky
e35e2f3ebe Bump the __FreeBSD_version after recent LinuxKPI changes.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:39:51 +00:00
Hans Petter Selasky
62bae5d421 The LinuxKPI atomics do not have acquire nor release semantics unless
specified. Fix code to use READ_ONCE() and WRITE_ONCE() where appropriate.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:37:14 +00:00
Hans Petter Selasky
1f7c7e1bec Only wire pages in the LinuxKPI instead of holding and wiring them.
This prevents the page daemon from regularly scanning the held pages.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:23:59 +00:00
Hans Petter Selasky
c05238a681 Add support for shared memory functions to the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:17:23 +00:00
Conrad Meyer
2d347b2ef8 linsysfs(5): Add support for recent libdrm
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.

This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work.  The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.

PR:		222375
Submitted by:	Greg V <greg AT unrelenting.technology>
2017-09-17 23:40:16 +00:00
Ian Lepore
d18915fadb Give icee(4) a detach() method so it can be used as a module. Add a
module makefile for it.
2017-09-17 22:58:13 +00:00
Conrad Meyer
c50df68a08 MCA: Expand AMD Thresholding support to cover all banks
When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason.  However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks.  This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Reviewed by:	markj (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12321
2017-09-17 22:58:13 +00:00
Rick Macklem
ccf038250a Fix bogus FREAD with NFSV4OPEN_ACCESSREAD. No functional change.
The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.

MFC after:	2 weeks
2017-09-17 22:18:01 +00:00
Justin Hibbits
d11e86549e Don't use a non-zero argument for __builtin_frame_address
__builtin_frame_address with a non-zero argument is unsafe and rejected by
newer gcc.  Since it doesn't seem to impact the stacktrace, don't bother
with gymnastics to unwind to a different frame for starting.

PR:		kern/220118
MFC after:	2 weeks
2017-09-17 20:07:20 +00:00
Justin Hibbits
6b7530563b Print the correct bitmask for the running Book-E CPU
All the Book-E world is no longer e500v{1,2}.  e500mc the 64-bit derivatives do
not use the DOZE/NAP bits with MSR[WE], instead using the `wait' instruction to
wait for interrupts, and SoC plane controls (via CCSR) for power management.

MFC after:	1 week
2017-09-17 19:40:17 +00:00
Mark Johnston
b999e9c813 Implement mmu_page_init for AIM platforms.
As of r323290 we cannot rely on the vm_page array being
zero-initialized.

Reported and tested by:	andreast
MFC after:	1 week
2017-09-17 15:40:12 +00:00
Michael Tuexen
72e23aba22 Fix an accounting bug and use sctp_timer_start to start a timer.
MFC after:	1 week
2017-09-17 09:27:27 +00:00
Michael Tuexen
fe40f49bb3 Remove code not used on any platform currently supported.
MFC after:	1 week
2017-09-16 21:26:06 +00:00
Alan Cox
ec371b57e8 Modify blst_leaf_alloc to take only the cursor argument.
Modify blst_leaf_alloc to find allocations that cross the boundary between
one leaf node and the next when those two leaves descend from the same
meta node.

Update the hint field for leaves so that it represents a bound on how
large an allocation can begin in that leaf, where it currently represents
a bound on how large an allocation can be found within the boundaries of
the leaf.

The first phase of blst_leaf_alloc currently shrinks sequences of
consecutive 1-bits in mask until each has been shrunken by count-1 bits,
so that any bits remaining show where an allocation can begin, or until
all the bits have disappeared, in which case the allocation fails. This
change amends that so that the high-order bit is copied, as if, when the
last block was free, it was followed by an endless stream of free
blocks. It also amends the early stopping condition, so that the shrinking
of 1-sequences stops early when there are none, or there is only one
unbounded one remaining.

The search for the first set bit is unchanged, and the code path
thereafter is mostly unchanged unless the first set bit is in a position
that makes some of those copied sign bits matter. In that case, we look
for a next leaf, and at what blocks it can provide, to see if a
cross-boundary allocation is possible.

The hint is updated on a successful allocation that clears the last bit,
but it not updated on a failed allocation that leaves the last bit
set. So, as long as the last block is free, the hint value for the leaf is
large. As long as the last block is free, and there's a next leaf, a large
allocation can begin here, perhaps. A stricter rule than this would mean
that allocations and frees in one leaf could require hint updates to the
preceding leaf, and this change seeks to leave the freeing code
unmodified.

Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and
blist_leaf_fill, as well as in blst_leaf_alloc.

Correct a panic message in blst_leaf_free.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11819
2017-09-16 18:12:15 +00:00