Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12413
If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.
Sponsored by: The FreeBSD Foundation
Discussed with: avg
X-MFC with: r323578
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.
Reported and tested by: tjil
PR: 222356
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12411
- Add size of an ethernet header to the value configured to NVS. This
does not seem to have any effects if MTU is 1500, but fix hypervisor
side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
side.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12352
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12333
ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still
limited to 16 bits. This is a nop currently but will matter if LINK_MAX is
increased in the future.
Reviewed by: kib
Sponsored by: Chelsio Communications
Suppose that userspace is executing with the non-standard segment
descriptors. Then, until exception or interrupt handler executed
SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs.
If an interrupt occurs in this window, the interrupt handler is
executed unsafely, relying on usability of the usermode registers. If
the interrupt results in the context switch on return, the
contamination of the kernel state spreads to the thread we switched
to. As result, kernel data accesses might fault or, if only the base
is changed, completely messed up.
More, if the user segment was allocated in LDT, another thread might
mark the descriptor as invalid before doreti code tried to reload
them. In this case kernel panics.
The issue exists for all exception entry points which use trap gate,
and thus do not automatically disable interrupts on entry, and for
lcall_handler.
Fix is two-fold: first, we need to disable interrupts for all kernel
entries, changing the IDT descriptor types from trap gate to interrupt
gate. Interrupts are re-enabled not earlier than the kernel segments
are loaded into the segment registers. Second, we only load the
segment registers from the trap frame when returning to usermode. For
the later, all interrupt return paths must happen through the doreti
common code.
There is no way to disable interrupts on call gate, which is the
supposed mode of servicing for lcall $7,$0 syscalls. Change the LDT
descriptor 0 into a code segment type and point it to the userspace
trampoline which redirects the syscall to int $0x80.
All the measures make the segment register handling similar to that of
amd64. We do not apply amd64 optimizations of not reloading segment
registers on return from the syscall.
Reported by: Maxime Villard <max@m00nbsd.net>
Tested by: pho (the non-lcall part)
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12402
kern.features.mmcam will be present and equal to 1 if the kernel has been
compiled with option MMCCAM.
This will help sdio-related userland tools to fail-fast if running on the kernel
without MMCCAM enabled.
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D12386
1. Swap the order of device_get_ivars with device_get_devclass and devclass
name validation. This bug was introduced in r323692.
2. Error check device_get_children and free the returned list. This bug was
introduced in the original linsysfs commit.
Reported by: Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12407
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.
This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work. The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.
PR: 222375
Submitted by: Greg V <greg AT unrelenting.technology>
When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason. However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.
Expand thresholding support to monitor all monitorable banks. This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.
Reviewed by: markj (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12321
The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.
MFC after: 2 weeks
__builtin_frame_address with a non-zero argument is unsafe and rejected by
newer gcc. Since it doesn't seem to impact the stacktrace, don't bother
with gymnastics to unwind to a different frame for starting.
PR: kern/220118
MFC after: 2 weeks
All the Book-E world is no longer e500v{1,2}. e500mc the 64-bit derivatives do
not use the DOZE/NAP bits with MSR[WE], instead using the `wait' instruction to
wait for interrupts, and SoC plane controls (via CCSR) for power management.
MFC after: 1 week
Modify blst_leaf_alloc to find allocations that cross the boundary between
one leaf node and the next when those two leaves descend from the same
meta node.
Update the hint field for leaves so that it represents a bound on how
large an allocation can begin in that leaf, where it currently represents
a bound on how large an allocation can be found within the boundaries of
the leaf.
The first phase of blst_leaf_alloc currently shrinks sequences of
consecutive 1-bits in mask until each has been shrunken by count-1 bits,
so that any bits remaining show where an allocation can begin, or until
all the bits have disappeared, in which case the allocation fails. This
change amends that so that the high-order bit is copied, as if, when the
last block was free, it was followed by an endless stream of free
blocks. It also amends the early stopping condition, so that the shrinking
of 1-sequences stops early when there are none, or there is only one
unbounded one remaining.
The search for the first set bit is unchanged, and the code path
thereafter is mostly unchanged unless the first set bit is in a position
that makes some of those copied sign bits matter. In that case, we look
for a next leaf, and at what blocks it can provide, to see if a
cross-boundary allocation is possible.
The hint is updated on a successful allocation that clears the last bit,
but it not updated on a failed allocation that leaves the last bit
set. So, as long as the last block is free, the hint value for the leaf is
large. As long as the last block is free, and there's a next leaf, a large
allocation can begin here, perhaps. A stricter rule than this would mean
that allocations and frees in one leaf could require hint updates to the
preceding leaf, and this change seeks to leave the freeing code
unmodified.
Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and
blist_leaf_fill, as well as in blst_leaf_alloc.
Correct a panic message in blst_leaf_free.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: markj (an earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11819
The usbphy node for allwinner have two kind of resources, one for the
phy_ctrl and one per phy. Instead of blindy allocating resources, alloc
the phy_ctrl and pmu ones separately.
Also add a configuration struct for all different phy that hold the difference
between them (number of phys, unknow needed register write etc ...).
While here remove A83T code as upstream and FreeBSD dts don't have
nodes for USB.
This (plus 323640) re-enable OHCI on Pine64 on the bottom USB port.
The top USB port is routed to the OHCI0/EHCI0 which is by default in OTG mode.
While the phy code can handle the re-route to standard OHCI/EHCI we still need
a driver for musb to probe and configure it in host mode.
EHCI is still buggy on Pine64 (hang the board) so do not enable it for now.
Tested On: Bananapi (A20), BananapiM2 (A31S), OrangePi One (H3) Pine64 (A64)
r323392 introduce gpio_pin_get/gpio_pin_set for a10_gpio driver.
When called via gpio method they must aquire the device lock while
when they are called via gpio_pin_configure the lock is already aquire.
Introduce a10_gpio_pin_{s,g}et_locked and call them in pin_gpio_configure
instead.
Tested On: BananaPi (A20)
Reported by: Richard Puga richard@puga.net
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.
I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM. If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset. Thus, an
aio_read() request could match a different buffer in the address
space. (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.) Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.
MFC after: 3 days
Sponsored by: Chelsio Communications
Don't call cam_iosched_trim_done or cam_iosched_submit_trim for nda
since its hardware can handle almost an arbitrary number of TRIMs and
we don't have to be careful to only ever do one.
Sponsored by: Netflix
It's intended only for those situations where the periph driver
ones to limit the number of trims active to one and only one.
Also update comments on associated functions.
Sponsored by: Netflix
* Demote the level of several debug messages to CAM_DEBUG_TRACE
* Add detection for SDHC cards that can do 1.8V. No voltage switch sequence
is issued yet;
* Don't create a separate LUN for each SDIO function. We need just one to make
pass(4) attach;
* Remove obsolete mmc_sdio* files. SDIO functionality will be moved into the
separate device that will manage a new sdio(4) bus;
* Terminate probing if got no reply to CMD0;
* Make bcm2835 SDHCI host controller driver compile with 'option MMCCAM'.
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D12109
free queue mutex lock owning session, same as it was done for the
object termination in r323561.
Reported and tested by: mjg
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In theory, all data access errors mean that a member is out of sync
at most. But they were treated as more serious errors to avoid the
situation where a flaky disk gets repeatedly disconnected, re-synchronized,
reconnected and then disconnected again.
ENXIO is a special error that means that the member disk disappeared,
so it should get the same handling as the GEOM orphaning event.
There is a better chance that when the disk is reconnected, it will be
a good member again.
When ENXIO happens on a read we use the exisiting G_MIRROR_BUMP_SYNCID
mechanism which means that the mirror's syncid is increased as soon
as there is a write to the mirror. That's because no data has got out
of sync yet, but the problematic memeber is disconnected, so the future
write will make it stale.
When ENXIO happens on a write we use a new G_MIRROR_BUMP_SYNCID_NOW
mechanism which means that we update the mirror metadata as soon as
possible because the problematic memeber is already behind.
Reviewed by: markj, imp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D9463
The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.
Sponsored by: Chelsio Communications
When a newborn socket moves from incomplete queue to complete
one, we need to obtain the listening socket lock after the child,
which is a wrong order. The old code did that in potentially
endless loop of mtx_trylock(). The new one does only one attempt
of mtx_trylock(), and in case of failure references listening
socket, unlocks child and locks everything in right order. In
case if listening socket shuts down during that, just bail out.
Reported & tested by: Jason Eggleston <jeggleston llnw.com>
Reported & tested by: Jason Wolfe <jason llnw.com>