Commit Graph

118446 Commits

Author SHA1 Message Date
Michael Tuexen
3ec509bcd3 Fix a warning.
MFC after:	1 week
2017-09-19 20:24:13 +00:00
Rick Macklem
ab118d04be Simplify nfsrpc_layoutcommit() args.
Simplify nfsrpc_layoutcommit() args. in preparation for the addition
of Flex File layout support, since it also uses a 0 length field.
2017-09-19 20:18:41 +00:00
Michael Tuexen
564a95f485 Avoid an overflow when computing the staleness.
This issue was found by running libfuzz on the userland stack.

MFC after:	1 week
2017-09-19 20:09:58 +00:00
Konstantin Belousov
3cabd93e26 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
Konstantin Belousov
9770475ce7 Do not vrele() covered vnode under the mp mutex.
If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.

Sponsored by:	The FreeBSD Foundation
Discussed with:	avg
X-MFC with:	r323578
2017-09-19 16:49:45 +00:00
Konstantin Belousov
5bf949377e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
Michael Tuexen
ad608f06ed Remove a no longer used variable.
Reported by:	Felix Weinrank
MFC after:	1 week
2017-09-19 15:00:19 +00:00
Sepherosa Ziehau
d74831eea2 hyperv/hn: Incease max supported MTU
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12365
2017-09-19 06:46:00 +00:00
Sepherosa Ziehau
eb2fe04416 hyperv/hn: Fix MTU setting
- Add size of an ethernet header to the value configured to NVS.  This
  does not seem to have any effects if MTU is 1500, but fix hypervisor
  side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
  side.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12352
2017-09-19 06:38:57 +00:00
Sepherosa Ziehau
642ec226bb hyperv/hn: Apply VF's RSS setting
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12333
2017-09-19 06:29:38 +00:00
John Baldwin
4fedff3ca1 Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration
file.

Discussed with:	np
Sponsored by:	Chelsio Communications
2017-09-18 23:50:34 +00:00
John Baldwin
4f45713ae2 Add UFS_LINK_MAX for the UFS-specific limit on link counts.
ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still
limited to 16 bits.  This is a nop currently but will matter if LINK_MAX is
increased in the future.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2017-09-18 23:30:39 +00:00
Konstantin Belousov
5efe338f3d Fix handling of the segment registers on i386.
Suppose that userspace is executing with the non-standard segment
descriptors.  Then, until exception or interrupt handler executed
SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs.
If an interrupt occurs in this window, the interrupt handler is
executed unsafely, relying on usability of the usermode registers.  If
the interrupt results in the context switch on return, the
contamination of the kernel state spreads to the thread we switched
to.  As result, kernel data accesses might fault or, if only the base
is changed, completely messed up.

More, if the user segment was allocated in LDT, another thread might
mark the descriptor as invalid before doreti code tried to reload
them.  In this case kernel panics.

The issue exists for all exception entry points which use trap gate,
and thus do not automatically disable interrupts on entry, and for
lcall_handler.

Fix is two-fold: first, we need to disable interrupts for all kernel
entries, changing the IDT descriptor types from trap gate to interrupt
gate.  Interrupts are re-enabled not earlier than the kernel segments
are loaded into the segment registers.  Second, we only load the
segment registers from the trap frame when returning to usermode.  For
the later, all interrupt return paths must happen through the doreti
common code.

There is no way to disable interrupts on call gate, which is the
supposed mode of servicing for lcall $7,$0 syscalls.  Change the LDT
descriptor 0 into a code segment type and point it to the userspace
trampoline which redirects the syscall to int $0x80.

All the measures make the segment register handling similar to that of
amd64.  We do not apply amd64 optimizations of not reloading segment
registers on return from the syscall.

Reported by:	Maxime Villard <max@m00nbsd.net>
Tested by:	pho (the non-lcall part)
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12402
2017-09-18 20:22:42 +00:00
Ilya Bakulin
ffa317051b Add kern.features flag for MMCCAM
kern.features.mmcam will be present and equal to 1 if the kernel has been
compiled with option MMCCAM.
This will help sdio-related userland tools to fail-fast if running on the kernel
without MMCCAM enabled.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12386
2017-09-18 20:17:08 +00:00
Cy Schubert
70394ab378 Don't use an apostrophe in a possesive pronoun.
MFC after:	3 days
2017-09-18 19:16:41 +00:00
Ryan Libby
2e6418c0c5 linsysfs: quiet gcc -Wformat after r323692
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
2017-09-18 19:09:40 +00:00
Scott Long
1ab9094c90 Hide a normal probe warning message under bootverbose, similar to atkbdc
Sponsored by:	Netflix
2017-09-18 18:42:28 +00:00
Conrad Meyer
4d367f2501 linsysfs(5): Fix two unrelated issues
1. Swap the order of device_get_ivars with device_get_devclass and devclass
   name validation.  This bug was introduced in r323692.

2. Error check device_get_children and free the returned list.  This bug was
   introduced in the original linsysfs commit.

Reported by:	Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12407
2017-09-18 17:14:13 +00:00
Toomas Soome
3f247eab7a loader: biosmem allocate heap just below 4GB
The current biosmem code is walking bios smap entries and looking for smap
entry just below 4GB line, if there is such entry, its base and size is set
for heap base and size. Instead of entry base, we should use last HEAP_MIN
(currently 64MB) bytes just below 4GB, to make maximum space for kernel and
modules.

The problem was revealed on ASUS B350M-A system board, an AMD Ryzen 3 1200 CPU

memory map:

SMAP type=01 base=0000000000000000 len=000000000009d400 attr=01
SMAP type=02 base=000000000009d400 len=0000000000002c00 attr=01
SMAP type=02 base=00000000000e0000 len=0000000000020000 attr=01
SMAP type=01 base=0000000000100000 len=0000000009c00000 attr=01
SMAP type=02 base=0000000009d00000 len=0000000000300000 attr=01
SMAP type=01 base=000000000a000000 len=00000000be69b000 attr=01
SMAP type=03 base=00000000c869b000 len=0000000000016000 attr=01
SMAP type=01 base=00000000c86b1000 len=00000000124e7000 attr=01
SMAP type=02 base=00000000dab98000 len=0000000000138000 attr=01
SMAP type=03 base=00000000dacd0000 len=0000000000008000 attr=01
SMAP type=01 base=00000000dacd8000 len=0000000000100000 attr=01
SMAP type=04 base=00000000dadd8000 len=00000000003b3000 attr=01
SMAP type=02 base=00000000db18b000 len=0000000000d42000 attr=01
SMAP type=01 base=00000000dbecd000 len=0000000002133000 attr=01
SMAP type=01 base=0000000100000000 len=000000011f380000 attr=01
SMAP type=02 base=00000000de000000 len=0000000002000000 attr=01
SMAP type=02 base=00000000f8000000 len=0000000004000000 attr=01
SMAP type=02 base=00000000fdf00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000fea00000 len=0000000000010000 attr=01
SMAP type=02 base=00000000feb80000 len=0000000000082000 attr=01
SMAP type=02 base=00000000fec10000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fec30000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed00000 len=0000000000001000 attr=01
SMAP type=02 base=00000000fed40000 len=0000000000005000 attr=01
SMAP type=02 base=00000000fed80000 len=0000000000010000 attr=01
SMAP type=02 base=00000000fedc2000 len=000000000000e000 attr=01
SMAP type=02 base=00000000fedd4000 len=0000000000002000 attr=01
SMAP type=02 base=00000000fee00000 len=0000000000100000 attr=01
SMAP type=02 base=00000000ff000000 len=0000000001000000 attr=01

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D12368
2017-09-18 15:17:01 +00:00
Hans Petter Selasky
e35e2f3ebe Bump the __FreeBSD_version after recent LinuxKPI changes.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:39:51 +00:00
Hans Petter Selasky
62bae5d421 The LinuxKPI atomics do not have acquire nor release semantics unless
specified. Fix code to use READ_ONCE() and WRITE_ONCE() where appropriate.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:37:14 +00:00
Hans Petter Selasky
1f7c7e1bec Only wire pages in the LinuxKPI instead of holding and wiring them.
This prevents the page daemon from regularly scanning the held pages.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:23:59 +00:00
Hans Petter Selasky
c05238a681 Add support for shared memory functions to the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:17:23 +00:00
Conrad Meyer
2d347b2ef8 linsysfs(5): Add support for recent libdrm
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.

This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work.  The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.

PR:		222375
Submitted by:	Greg V <greg AT unrelenting.technology>
2017-09-17 23:40:16 +00:00
Ian Lepore
d18915fadb Give icee(4) a detach() method so it can be used as a module. Add a
module makefile for it.
2017-09-17 22:58:13 +00:00
Conrad Meyer
c50df68a08 MCA: Expand AMD Thresholding support to cover all banks
When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason.  However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks.  This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Reviewed by:	markj (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12321
2017-09-17 22:58:13 +00:00
Rick Macklem
ccf038250a Fix bogus FREAD with NFSV4OPEN_ACCESSREAD. No functional change.
The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.

MFC after:	2 weeks
2017-09-17 22:18:01 +00:00
Justin Hibbits
d11e86549e Don't use a non-zero argument for __builtin_frame_address
__builtin_frame_address with a non-zero argument is unsafe and rejected by
newer gcc.  Since it doesn't seem to impact the stacktrace, don't bother
with gymnastics to unwind to a different frame for starting.

PR:		kern/220118
MFC after:	2 weeks
2017-09-17 20:07:20 +00:00
Justin Hibbits
6b7530563b Print the correct bitmask for the running Book-E CPU
All the Book-E world is no longer e500v{1,2}.  e500mc the 64-bit derivatives do
not use the DOZE/NAP bits with MSR[WE], instead using the `wait' instruction to
wait for interrupts, and SoC plane controls (via CCSR) for power management.

MFC after:	1 week
2017-09-17 19:40:17 +00:00
Mark Johnston
b999e9c813 Implement mmu_page_init for AIM platforms.
As of r323290 we cannot rely on the vm_page array being
zero-initialized.

Reported and tested by:	andreast
MFC after:	1 week
2017-09-17 15:40:12 +00:00
Michael Tuexen
72e23aba22 Fix an accounting bug and use sctp_timer_start to start a timer.
MFC after:	1 week
2017-09-17 09:27:27 +00:00
Michael Tuexen
fe40f49bb3 Remove code not used on any platform currently supported.
MFC after:	1 week
2017-09-16 21:26:06 +00:00
Alan Cox
ec371b57e8 Modify blst_leaf_alloc to take only the cursor argument.
Modify blst_leaf_alloc to find allocations that cross the boundary between
one leaf node and the next when those two leaves descend from the same
meta node.

Update the hint field for leaves so that it represents a bound on how
large an allocation can begin in that leaf, where it currently represents
a bound on how large an allocation can be found within the boundaries of
the leaf.

The first phase of blst_leaf_alloc currently shrinks sequences of
consecutive 1-bits in mask until each has been shrunken by count-1 bits,
so that any bits remaining show where an allocation can begin, or until
all the bits have disappeared, in which case the allocation fails. This
change amends that so that the high-order bit is copied, as if, when the
last block was free, it was followed by an endless stream of free
blocks. It also amends the early stopping condition, so that the shrinking
of 1-sequences stops early when there are none, or there is only one
unbounded one remaining.

The search for the first set bit is unchanged, and the code path
thereafter is mostly unchanged unless the first set bit is in a position
that makes some of those copied sign bits matter. In that case, we look
for a next leaf, and at what blocks it can provide, to see if a
cross-boundary allocation is possible.

The hint is updated on a successful allocation that clears the last bit,
but it not updated on a failed allocation that leaves the last bit
set. So, as long as the last block is free, the hint value for the leaf is
large. As long as the last block is free, and there's a next leaf, a large
allocation can begin here, perhaps. A stricter rule than this would mean
that allocations and frees in one leaf could require hint updates to the
preceding leaf, and this change seeks to leave the freeing code
unmodified.

Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and
blist_leaf_fill, as well as in blst_leaf_alloc.

Correct a panic message in blst_leaf_free.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11819
2017-09-16 18:12:15 +00:00
Ian Lepore
8dc710184a Add a missing header file to SRCS to fix out-of-kernel builds.
PR:		222354
Submitted by:	eugen@
Pointy hat:	ian@
2017-09-16 16:09:05 +00:00
Emmanuel Vadot
36dcd6a499 Allwinner usb phy: Rework resource allocation
The usbphy node for allwinner have two kind of resources, one for the
phy_ctrl and one per phy. Instead of blindy allocating resources, alloc
the phy_ctrl and pmu ones separately.
Also add a configuration struct for all different phy that hold the difference
between them (number of phys, unknow needed register write etc ...).

While here remove A83T code as upstream and FreeBSD dts don't have
nodes for USB.

This (plus 323640) re-enable OHCI on Pine64 on the bottom USB port.
The top USB port is routed to the OHCI0/EHCI0 which is by default in OTG mode.
While the phy code can handle the re-route to standard OHCI/EHCI we still need
a driver for musb to probe and configure it in host mode.

EHCI is still buggy on Pine64 (hang the board) so do not enable it for now.

Tested On:	Bananapi (A20), BananapiM2 (A31S), OrangePi One (H3) Pine64 (A64)
2017-09-16 15:58:20 +00:00
Emmanuel Vadot
489cba7d58 A64 CCUNG: Correct gate and reset for OHCI0/1
Reported by:	jmcneill
Pointy Hat:	manu
2017-09-16 15:50:31 +00:00
Emmanuel Vadot
082f09757c Allwinner: a10_gpio Fix panic on multiple lock
r323392 introduce gpio_pin_get/gpio_pin_set for a10_gpio driver.
When called via gpio method they must aquire the device lock while
when they are called via gpio_pin_configure the lock is already aquire.

Introduce a10_gpio_pin_{s,g}et_locked and call them in pin_gpio_configure
instead.

Tested On: BananaPi (A20)

Reported by:	Richard Puga richard@puga.net
2017-09-16 14:08:20 +00:00
Stephen Hurd
ab2e3f7958 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
John Baldwin
91a65e2f6b Avoid reusing the wrong buffer for a DDP AIO request.
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM.  If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset.  Thus, an
aio_read() request could match a different buffer in the address
space.  (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.)  Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-09-15 22:40:57 +00:00
Scott Long
7eed4c1853 Fix line wrap issues.
Sponsored by:	Netflix
2017-09-15 20:58:52 +00:00
Warner Losh
851063e16a Allow multiple TRIMs to be done for nda
Don't call cam_iosched_trim_done or cam_iosched_submit_trim for nda
since its hardware can handle almost an arbitrary number of TRIMs and
we don't have to be careful to only ever do one.

Sponsored by: Netflix
2017-09-15 20:16:06 +00:00
Warner Losh
55c770b40a Update comments on what the CAM_IOSCHED_FLAG_TRIM_ACTIVE means.
It's intended only for those situations where the periph driver
ones to limit the number of trims active to one and only one.
Also update comments on associated functions.

Sponsored by: Netflix
2017-09-15 20:15:55 +00:00
Landon J. Fuller
011e84e0a7 Add MIPS32/64 Rev2 CP0 intctl register definitions.
Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D12300
2017-09-15 19:56:21 +00:00
Ilya Bakulin
02c474b481 Miscellaneous fixes and improvements to MMCCAM stack
* Demote the level of several debug messages to CAM_DEBUG_TRACE
 * Add detection for SDHC cards that can do 1.8V. No voltage switch sequence
   is issued yet;
 * Don't create a separate LUN for each SDIO function. We need just one to make
   pass(4) attach;
 * Remove obsolete mmc_sdio* files. SDIO functionality will be moved into the
   separate device that will manage a new sdio(4) bus;
 * Terminate probing if got no reply to CMD0;
 * Make bcm2835 SDHCI host controller driver compile with 'option MMCCAM'.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12109
2017-09-15 19:47:44 +00:00
Konstantin Belousov
bba52ecadd Batch freeing of the pages in vm_object_page_remove() under the same
free queue mutex lock owning session, same as it was done for the
object termination in r323561.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-15 16:07:09 +00:00
Mark Johnston
e04223bf94 Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
Andriy Gapon
7103ac8ad6 gmirror: treat ENXIO as disk disconnect, not media error
In theory, all data access errors mean that a member is out of sync
at most.  But they were treated as more serious errors to avoid the
situation where a flaky disk gets repeatedly disconnected, re-synchronized,
reconnected and then disconnected again.

ENXIO is a special error that means that the member disk disappeared,
so it should get the same handling as the GEOM orphaning event.
There is a better chance that when the disk is reconnected, it will be
a good member again.

When ENXIO happens on a read we use the exisiting G_MIRROR_BUMP_SYNCID
mechanism which means that the mirror's syncid is increased as soon
as there is a write to the mirror.  That's because no data has got out
of sync yet, but the problematic memeber is disconnected, so the future
write will make it stale.

When ENXIO happens on a write we use a new G_MIRROR_BUMP_SYNCID_NOW
mechanism which means that we update the mirror metadata as soon as
possible because the problematic memeber is already behind.

Reviewed by:	markj, imp
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D9463
2017-09-15 13:57:08 +00:00
Andrew Turner
ca289945b2 Add the ARMv8.3 ID register fields. These were found in the A-Profile
exploration tools documentation:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Sponsored by:	DARPA, AFRL
2017-09-15 12:57:34 +00:00
John Baldwin
2bd1e600e3 Fix some incorrect sysctl pointers for some error stats.
The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.

Sponsored by:	Chelsio Communications
2017-09-14 21:06:08 +00:00
Gleb Smirnoff
584ab65a75 Fix locking in soisconnected().
When a newborn socket moves from incomplete queue to complete
one, we need to obtain the listening socket lock after the child,
which is a wrong order.  The old code did that in potentially
endless loop of mtx_trylock().  The new one does only one attempt
of mtx_trylock(), and in case of failure references listening
socket, unlocks child and locks everything in right order.  In
case if listening socket shuts down during that, just bail out.

Reported & tested by:	Jason Eggleston <jeggleston llnw.com>
Reported & tested by:	Jason Wolfe <jason llnw.com>
2017-09-14 18:05:54 +00:00