When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
Summary:
Running a 32-bit process on a 64-bit POWER CPU may still use all 64-bits
in calculations, while ignoring the upper 32 bits for addressing
storage. It so happens that some processes end up with r1 (SP) having
bit 31 set in some cases (33-bit address). Writing out to this 33-bit
address obviosly fails. Since the CPU ignores the upper bits, we should
as well.
sendsig() and cpu_fetch_syscall_args() appear to be the only functions
that actually rely on userspace register values for copy in/out, and
cpu_fetch_syscall_args() doesn't seem to be bitten in practice yet.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20896
register values" of the architecture manual, an isb instruction should be
executed after updating ttbr0_el1 and before invalidating the TLB. The
lack of this instruction in pmap_activate() appears to be the reason why
andrew@ and I have observed an unexpected TLB entry for an invalid PTE on
entry to pmap_enter_quick_locked(). Thus, we should now be able to revert
the workaround committed in r349442.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20904
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 1 is add, 2 is remove, and 3 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
MFC after: 1 week
This device cannot cross a 4GB boundary with DMA. Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.
Submitted by: gallatin@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20910
well as sets in some of the groundwork for committing BBR. The
hpts system is updated as well as some other needed utilities
for the entrance of BBR. This is actually part 1 of 3 more
needed commits which will finally complete with BBRv1 being
added as a new tcp stack.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20834
Thus, when using proccontrol(1) to disable implicit application of
PROT_MAX within a process, child processes will inherit this setting.
Discussed with: kib
MFC with: r349609
Sponsored by: The FreeBSD Foundation
Instead of including stdint.h for uintptr_t, include sys/_types.h and use
__types for everything that isn't a native C keyword type.
Remove the #include of cdefs.h. It appears after the include of armreg.h
which has a precondition of cdefs.h being included before it, so everyone
including sysarch.h is already including cdefs.h. (When armv5 support
goes away, there will be no need include armreg.h here either.)
Unfortunately, the unprefixed struct member names "addr" and "len" cannot
be changed, because 3rd-party software is relying on them (libcompiler_rt
is one known consumer).
On POWER9/pseries, QEMU passes several regions of memory,
instead of a single region containing all memory, as the
code was expecting.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20857
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument. Linux rtld uses PROT_GROWSDOWN and PROT_GROWS_UP when marking the
stack executable. Mask these bits like kern_mprotect used to do. For other
unsupported bits EINVAL is returned like Linux does.
Reviewed by: trasz, brooks
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20864
sv_maxuser specifies the maximum addressable space for user space. Presently
this is all 64-bits worth, which is impossible for a 32-bit process.
This bug has existed since the initial import of powerpc64 in 2010.
MFC after: 2 weeks
This is more consistent with the rest of the function and lets us
unindent most of the function.
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20897
of pmap_load_clear(), in places where we don't care about the page table
entry's prior contents.
Eliminate an unnecessary pmap_load() from pmap_remove_all(). Instead, use
the value returned by the pmap_load_clear() on the very next line. (In the
future, when we support "hardware dirty bit management", using the value
from the pmap_load() rather than the pmap_load_clear() would have actually
been an error because the dirty bit could potentially change between the
pmap_load() and the pmap_load_clear().)
A KASSERT() in pmap_enter(), which originated in the amd64 pmap, was meant
to check the value returned by the pmap_load_clear() on the previous line.
However, we were ignoring the value returned by the pmap_load_clear(), and
so the KASSERT() was not serving its intended purpose. Use the value
returned by the pmap_load_clear() in the KASSERT().
MFC after: 2 weeks
Add VMBus protocol version 4.0. and 5.0 to support Windows 10 and newer HyperV hosts.
For VMBus 4.0 and newer HyperV, the netvsc gpadl teardown must be done after vmbus close.
Submitted by: whu
MFC after: 2 weeks
Sponsored by: Microsoft
r348164 added code to iicbus_request_bus/iicbus_release_bus to automatically
call device_busy()/device_unbusy() as part of aquiring exclusive use of the
bus (so modules can't be unloaded while the bus is exclusively owned and/or
IO is in progress). That broke the ability to do i2c IO from a slave device
probe method, because the slave isn't attached yet, so calling device_busy()
triggers a sanity-check panic for trying to busy a non-attached device.
Now we check whether the device status is < DS_ATTACHING, and if so we busy
the iicbus rather than the slave device. I think this leaves a small window
where a module could be unloaded while probing is in progress. But I think
that's true of all devices, and probably should be fixed by introducing a
DS_PROBING state for devices, and handling that at various points in the
newbus code.
Eliminate the TIMEDOUT state. This state really conveyed two different
concepts: I timed out during recovery (and my command got put on the
recovery queue), and I timed out diring discovery (which doesn't).
Separate those two concepts into two flags. Use the TIMEDOUT flag to
fail requests as timed out. Use the on queue flag to remove them from
the queue.
In mps_intr_locked for MPI2_RPY_DESCRIPT_FLAGS_ADDRESS_REPLY message
type, when completing commands, ignore the ones that are not in state
INQUEUE. They were already completed as part of the recovery
process. When we complete them twice, we wind up with entries on the
free queue that are marked as busy, trigging asserts.
Reviewed by: scottl (earlier version, just for mpr)
Differential Revision: https://reviews.freebsd.org/D20785
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
table VCTRL registers.
Unconditionally program the MSI-X vector control Mask field for MSI-X
table entries without regarud for Mask's previous value. Some devices
return all zeros on reads of the VCTRL registers, which would cause us
to skip disabling interrupts. This fixes the Samsung SM961/PM961 SSDs
which are return zero starting from offset 0x3084 within the memory
region specified by BAR0, even when they are active MSI-X vectors.
The Illumos kernel writes these unconditionally to 0 or 1. However,
section 6.8.2.9 of the PCI Local Bus 3.0 spec (dated Feb 3, 2004)
states for bits 31::01:
After reset, the state of these bits must be 0. However, for
potential future use, software must preserve the value of
these reserved bits when modifying the value of other Vector
Control bits. If software modifies the value of these reserved
bits, the result is undefined."
so we always set or clear the Mask bit, but otherwise preserves the
old value.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
Reviewed By: imp, jhb
Submitted by: Ka Ho Ng
MFC After: 1 week
Differential Revision: https://reviews.freebsd.org/D20873
While at it fix an invalid memory access issue when attaching external
USB HUBs, which are not mapped by ACPI, due to missing status check
when calling AcpiGetObjectInfo() from acpi_usb_hub_port_probe_cb().
Sponsored by: Mellanox Technologies
Pages with PG_PCPU_CACHE set cannot have been allocated from a
reservation, so as an optimization, skip the call to
vm_reserv_free_page() in this case. Otherwise, the access of
the corresponding reservation structure often results in a cache
miss.
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20859
Some workloads benefit from having a per-CPU cache for
VM_FREEPOOL_DIRECT pages.
Reviewed by: dougm, kib
Discussed with: alc, jeff
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20858
When the system has no graphical console, such as bhyve in common
configurations, ignore kern.vt.splash_cpu, instead of panicking
on INVARIANTS kernels.
Reviewed by: cem dumbbell
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20877
Although PPC SLB code doesn't handle allocation failures,
which are rare, in most places it asserts that the pointer
returned by uma_zalloc() is not NULL, making it easier to
identify the failure and avoiding an invalid pointer dereference.
This change simply adds a missing KASSERT in SLB code.
These fields will not be equal only in case if bigalloc filesystem feature is turned on.
This feature is not supported for now.
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-27-EXT2-12: Denial of Service in openat-0 (vm_fault_hold/ext2_clusteracct)
MFC after: 2 weeks
The ext2fs fragments are different from ufs fragments.
In case of ext2fs the fragment should be equal or more then block size.
The values more than block size are used only in case of bigalloc feature, which is does not supported for now.
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-22-EXT2-9: Denial of service in ftruncate-0 (ext2_balloc)
MFC after: 2 weeks
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-11-EXT2-6: Denial Of Service in write-1 (ext2_balloc)
MFC after: 2 weeks
comment. Rewrite that comment to improve its clarity.
Reported by: cem
Reviewed by: alc, cem
Approved by: kib, markj (mentors, implicit)
Differential Revision: https://reviews.freebsd.org/D20871
1. Use _pmap_alloc_l3() instead of pmap_alloc_l3() in order to handle the
possibility that a superpage mapping for "va" was created while we slept.
(This is derived from the amd64 version.)
2. Eliminate code for allocating kernel page table pages. Kernel page
table pages are preallocated by pmap_growkernel().
3. Eliminate duplicated unlock operations when KERN_RESOURCE_SHORTAGE is
returned.
MFC after: 2 weeks
after the one where the possible block allocation begins, and allocate
a larger number of blocks than the current limit. This does not affect
the limit on minimum allocation size, which still cannot exceed
BLIST_MAX_ALLOC.
Use this change to modify swp_pager_getswapspace and its callers, so
that they can allocate more than BLIST_MAX_ALLOC blocks if they are
available.
Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20579
restructure cache_handle_range so that all of the data cache operations are
performed before any instruction cache operations. Then, we only need one
barrier between the data and instruction cache operations and one barrier
after the instruction cache operations.
On an Amazon EC2 a1.2xlarge instance, this simple change reduces the time
for a "make -j8 buildworld" by 9%.
Reviewed by: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20848
swap_pager_swapoff_object and swp_pager_force_pagein so that they can
page in multiple pages at a time to a swap device, rather than doing
one I/O operation for each page.
Tested by: pho
Submitted by: ota_j.email.ne.jp (Yoshihiro Ota)
Reviewed by: alc, markj, kib
Approved by: kib, markj (mentors)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20635
vm_page_dirty() when, in fact, we are write protecting the page and the L3
entry has PTE_D set. However, pmap_protect() was always calling
vm_page_dirty() when an L2 entry has PTE_D set. Handle L2 entries the
same as L3 entries so that we won't perform unnecessary calls to
vm_page_dirty().
Simplify the loop calling vm_page_dirty() on L2 entries.