With newer AMD GPUs (>=Navi,Renoir) there is FPU context usage in the
amdgpu driver.
The `kernel_fpu_begin/end` implementations in drm did not even allow nested
begin-end blocks.
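A minimal sketch of how nested begin-end blocks can be made safe, assuming a per-thread nesting counter. The structure and function names are hypothetical and are not the linuxkpi implementation:

```c
#include <assert.h>

/* Hypothetical per-thread state; not the real linuxkpi structure. */
struct sketch_fpu_state {
	unsigned depth;		/* current nesting level */
	unsigned saves;		/* times the FPU context was saved */
	unsigned restores;	/* times the FPU context was restored */
};

static void
sketch_fpu_begin(struct sketch_fpu_state *st)
{
	/* Only the outermost begin saves the FPU context. */
	if (st->depth++ == 0)
		st->saves++;
}

static void
sketch_fpu_end(struct sketch_fpu_state *st)
{
	assert(st->depth > 0);
	/* Only the matching outermost end restores it. */
	if (--st->depth == 0)
		st->restores++;
}
```

With this scheme an inner begin/end pair is a no-op apart from the counter, so callers do not need to know whether they are already inside an FPU section.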
Submitted by: Greg V
Reviewed By: manu, hselasky
Differential Revision: https://reviews.freebsd.org/D28061
A driver can register a shrinker that will be called when the kernel
wants to free some memory.
Add support for that in linuxkpi and call the registered shrinkers
when the lowmem event is triggered.
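The register-then-call flow can be sketched as follows. This is an illustrative userland model: the struct layout, the single scan callback and all sketch_* names are assumptions, not the linuxkpi or Linux definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a shrinker; field names are illustrative. */
struct sketch_shrinker {
	unsigned long (*scan)(void *arg);	/* returns objects freed */
	void *arg;
	struct sketch_shrinker *next;
};

static struct sketch_shrinker *sketch_shrinker_list;

static void
sketch_register_shrinker(struct sketch_shrinker *s)
{
	s->next = sketch_shrinker_list;
	sketch_shrinker_list = s;
}

/* Called from the low-memory handler: run every registered shrinker. */
static unsigned long
sketch_lowmem(void)
{
	unsigned long freed = 0;

	for (struct sketch_shrinker *s = sketch_shrinker_list; s != NULL;
	    s = s->next)
		freed += s->scan(s->arg);
	return (freed);
}

/* Example callback: drop everything in a hypothetical driver cache. */
static unsigned long
sketch_cache_scan(void *arg)
{
	unsigned long *cached = arg;
	unsigned long n = *cached;

	*cached = 0;
	return (n);
}
```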
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D27728
- pci_get_class : This function searches for a matching pci device based on
the class/subclass and returns a newly created pci_dev.
- pci_{save,restore}_state : This is analogous to ours with the same name
- pci_is_root_bus : Return true if this is the root bus
- pci_get_domain_bus_and_slot : This function searches for a matching pci
device based on domain, bus and slot/function concatenated into a single
unsigned int (devfn) and returns a newly created pci_dev
- pci_bus_{read,write}_config* : Read/Write to the config space.
While here add some helper functions to allocate and fill the pci_dev struct.
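For reference, the devfn value mentioned above packs the slot into the upper five bits and the function into the lower three, the same encoding as the Linux PCI_DEVFN, PCI_SLOT and PCI_FUNC macros. The sketch_* helper names below are illustrative only:

```c
#include <assert.h>

/* Pack slot (0-31) and function (0-7) into a single devfn value. */
static unsigned int
sketch_devfn(unsigned int slot, unsigned int func)
{
	return (((slot & 0x1f) << 3) | (func & 0x07));
}

/* Extract the slot number from a devfn value. */
static unsigned int
sketch_slot(unsigned int devfn)
{
	return ((devfn >> 3) & 0x1f);
}

/* Extract the function number from a devfn value. */
static unsigned int
sketch_func(unsigned int devfn)
{
	return (devfn & 0x07);
}
```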
Reviewed by: hselasky, bz (older version)
Differential Revision: https://reviews.freebsd.org/D27550
pci_find_class_from helps find one or more devices matching
a class and subclass.
If the from argument is not null we will first loop in the device list
until we find the matching device and only then start to check if the
class/subclass matches.
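The lookup described above can be sketched on a plain linked list. This is not the kernel code: the struct and function names are hypothetical, and the sketch assumes matching resumes at the entry after the from device:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device list entry, for illustration only. */
struct sketch_dev {
	unsigned class;
	unsigned subclass;
	struct sketch_dev *next;
};

static struct sketch_dev *
sketch_find_class_from(struct sketch_dev *head, unsigned class,
    unsigned subclass, const struct sketch_dev *from)
{
	/* With no starting point, every entry is a candidate. */
	bool found = (from == NULL);

	for (struct sketch_dev *d = head; d != NULL; d = d->next) {
		if (!found) {
			/* Skip entries up to and including 'from'. */
			if (d == from)
				found = true;
			continue;
		}
		if (d->class == class && d->subclass == subclass)
			return (d);
	}
	return (NULL);
}
```

Calling the function repeatedly with the previous result as from walks all matching devices in list order.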
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27549
Pass gfx_state to efi_find_framebuffer() so that it can pick between
GOP and UGA. This also lets us set up struct gen_fb in gfx_state
from efifb and isolate EFI framebuffer data processing in
framebuffer.c.
This change allows us to clean up efi_cons_init() and reduce
BS->LocateProtocol() calls.
A small downside is that we now need to translate gen_fb back to
efifb in bootinfo.c (for passing to the kernel), and we need to add
a few -I options to CFLAGS.
One possible way the recursion can happen is during fork: suppose
that fork is called from early code that did not trigger
jemalloc(3) initialization yet. Then we lock the thr_malloc lock, and
call malloc_prefork(), which might require initialization of jemalloc
pthread_mutexes, calling into libthr malloc. It is safe to allow
recursion for this occurrence.
PR: 252579
Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Even if sigfastblock block is non-zero, non-blockable signals must be
checked on AST and delivered now. This also affects the debugger's
ability to attach, because issignal() also calls ptracestop() if there
is a pending stop for the debuggee.
Instead of checking for sigfastblock, and either setting PENDING flag
for usermode or doing signal delivery loop, always do the loop after
checking, and then handle PENDING bit. issignal() already does the right
thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to
happen.
Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>, markj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28089
The user pending bit should not be set if the kernel did not note a pending
signal.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28089
Right now the routine leaves the current CPU in the map, later tripping
on an assert when filling in the scoreboard: panic: IPI scoreboard is
zero, initiator 1 target 1
Instead pre-check if all CPUs are present in the map and remember that
outcome for later.
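The ordering matters: the "all CPUs" decision has to be made while the initiator is still in the map, and the initiator has to be removed before the scoreboard is filled. A minimal sketch with a toy 64-bit CPU set follows. All names are hypothetical, and this is one reading of the fix, not the amd64 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU set: one bit per CPU, at most 64 CPUs. */
typedef uint64_t sketch_cpuset_t;

struct sketch_ipi_plan {
	bool		is_all;	/* did the caller target every CPU? */
	sketch_cpuset_t	others;	/* targets with the initiator removed */
};

static struct sketch_ipi_plan
sketch_plan_shootdown(sketch_cpuset_t mask, sketch_cpuset_t all_cpus,
    unsigned curcpu)
{
	struct sketch_ipi_plan p;

	/* Decide "all CPUs" first, while curcpu is still in the mask. */
	p.is_all = (mask == all_cpus);
	/* Then drop the initiator so it never lands in the scoreboard. */
	p.others = mask & ~((sketch_cpuset_t)1 << curcpu);
	return (p);
}
```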
Fixes: 7eaea04a5bb1dc86 ("amd64: compare TLB shootdown target to all_cpus")
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28111
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland). The situation became more serious with
022ca2fc7fe08d51f33a1d23a9be49e6d132914e. After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.
Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.
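A sketch of the kind of opcode check described above. The enum values are placeholders rather than the real LIO_* constants, and both the accepted set (NOP, READ, WRITE) and the EINVAL return are assumptions made for illustration:

```c
#include <assert.h>
#include <errno.h>

/* Placeholder opcodes; the real values come from <aio.h> and the kernel. */
enum sketch_lio {
	SKETCH_LIO_NOP,
	SKETCH_LIO_WRITE,
	SKETCH_LIO_READ,
	SKETCH_LIO_WRITEV,	/* not accepted from the list interface here */
	SKETCH_LIO_READV,	/* not accepted from the list interface here */
	SKETCH_LIO_SYNC		/* in-kernel use only */
};

/* Validate a user-supplied aio_lio_opcode before queueing the job. */
static int
sketch_check_lio_opcode(int opcode)
{
	switch (opcode) {
	case SKETCH_LIO_NOP:
	case SKETCH_LIO_READ:
	case SKETCH_LIO_WRITE:
		return (0);
	default:
		/* Reject anything else instead of tripping an assertion. */
		return (EINVAL);
	}
}
```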
MFC-with: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
Reviewed by: jhb, tmunro, 0mp
Differential Revision: https://reviews.freebsd.org/D28078
The in_cksum tests originally tried to simulate a BE environment by
swapping the byte order of the input. But that's overcomplicated, and
didn't actually work on real BE hardware. The correct testing strategy
is just to test on the native endianness, and run the tests in both BE
and LE environments.
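To illustrate testing on the native endianness, here is a small RFC 1071 style checksum that loads 16-bit words in host byte order, together with a property that holds on both BE and LE machines: storing the checksum back in host order makes the buffer verify to zero. This is a standalone sketch, not the kernel in_cksum() or its test suite:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement checksum over native-endian words; len must be even. */
static uint16_t
sketch_cksum(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t sum = 0;
	uint16_t w;

	for (size_t i = 0; i + 1 < len; i += 2) {
		memcpy(&w, p + i, sizeof(w));	/* native byte order load */
		sum += w;
	}
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

A test built on this property needs no byte swapping: compute the checksum over the payload, copy it into the trailing two bytes with memcpy(), and expect sketch_cksum() over the whole buffer to return 0.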
Submitted by: Renato Riolino <renato.riolino@eldorado.org.br>
Reviewed By: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23193
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap. This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus. However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET(). Checking with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system; otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourselves.
Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending IPIs in parallel, rather than one-by-one.
This is confirmed by dtrace.
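The difference between the two checks can be shown with a toy 64-bit CPU set standing in for cpuset_t, with 64 playing the role of MAXCPU. The sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU set: one bit per possible CPU, 64 standing in for MAXCPU. */
typedef uint64_t sketch_cpuset_t;

/* Analogue of CPU_ISFULLSET(): true only if every possible bit is set. */
static bool
sketch_isfullset(sketch_cpuset_t mask)
{
	return (mask == UINT64_MAX);
}

/* The intended check: does the target equal the set of present CPUs? */
static bool
sketch_targets_all(sketch_cpuset_t mask, sketch_cpuset_t all_cpus)
{
	return (mask == all_cpus);
}
```

On a 4-CPU system all_cpus is 0xf, so a shootdown aimed at every CPU fails the full-set test but passes the comparison against all_cpus.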
Reviewed by: alc, jhb, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D28102
UFS uses a new "mntfs" pseudo file system which provides private
device vnodes for a file system to safely access its disk device.
The original device vnode is saved in um_odevvp to hold the exclusive
lock on the device so that any attempts to open it for writing will
fail. But it is otherwise unused and has its BO_NOBUFS flag set to
enforce that file systems using mntfs vnodes do not accidentally
use the original devfs vnode. When the file system is unmounted,
um_odevvp is no longer needed and is released.
The lock order reversal happens because device vnodes must be locked
before UFS vnodes. During unmount, the root directory vnode lock
is held. When calling vrele() on um_odevvp, vrele() attempts to
exclusive lock um_odevvp causing the lock order reversal. The problem
is eliminated by doing a non-blocking exclusive lock on um_odevvp
which will always succeed since there are no users of um_odevvp.
With um_odevvp locked, it can be released using vput which does not
attempt to do a blocking exclusive lock request and thus avoids the
lock order reversal.
Sponsored by: Netflix
For rate-based resources that support throttling (e.g.
readiops/writeiops), this fixes a divide-by-zero panic when rctl(8)
passes 0 as the throttle value. For these resources, treat
zero-throttle requests as requests to suspend forward progress as long
as possible using the duration specified in
kern.racct.rctl.throttle_max.
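A sketch of the zero-throttle guard. The delay formula here is purely illustrative and is not the one used by rctl. Only the special case for zero and the cap at the maximum duration reflect the description above, and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * over:         amount by which usage exceeds the limit
 * throttle:     allowed units per second (0 means "suspend")
 * throttle_max: longest delay we are willing to impose, in ms
 */
static uint64_t
sketch_throttle_ms(uint64_t over, uint64_t throttle, uint64_t throttle_max)
{
	uint64_t ms;

	/* Guard the division: zero means sleep as long as allowed. */
	if (throttle == 0)
		return (throttle_max);
	ms = over * 1000 / throttle;
	return (ms > throttle_max ? throttle_max : ms);
}
```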
PR: 251803
Reported by: chris@cretaforce.gr
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27858
Relevant inet/inet6 code has the control over deciding what
the RIB lookup function currently is. With that in mind,
explicitly set it to the current value (rn_match) in the
datapath lookups. This avoids the cost of an indirect call.
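The idea, reduced to a toy example: when the caller already knows which function sits behind the pointer, it can call that function directly. The types and the stand-in match function below are hypothetical and only mirror the shape of the change:

```c
#include <assert.h>

typedef int (*sketch_match_fn)(int key);

/* Stand-in for rn_match(); the body is arbitrary. */
static int
sketch_rn_match(int key)
{
	return (key + 1);
}

/* Hypothetical table head holding the lookup function pointer. */
struct sketch_head {
	sketch_match_fn matchaddr;
};

/* Before: the datapath goes through the function pointer. */
static int
sketch_lookup_indirect(const struct sketch_head *h, int key)
{
	return (h->matchaddr(key));
}

/* After: the datapath names the function, so the call is direct. */
static int
sketch_lookup_direct(int key)
{
	return (sketch_rn_match(key));
}
```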
Differential Revision: https://reviews.freebsd.org/D28066
After old vmspace is destroyed during execve(2), but before the new space
is fully constructed, an error during image activation cannot be returned
because there is no executing program to receive it.
In the relatively common case of failure to map stack, print some hints
on the control terminal. Note that the user has enough knobs to cause a stack
mapping error, and this is the most common reason for execve(2) aborting
the process.
Requested by: jhb
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
vm_map_insert() and vm_map_protect() check that PROT_WRITE and
PROT_EXEC are never specified together if the vm_map has the MAP_WX
flag set.
A FreeBSD control flag allows a specific binary to request a W^X
exemption, and there are per-ABI boolean sysctls
kern.elf{32,64}.allow_wx to enable or disable it globally.
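The check itself amounts to a mask test. A userland sketch using the standard PROT_* constants follows. The function name and the boolean standing in for the map flag are assumptions, not the vm_map code:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Return true if the requested protection is acceptable.
 * wx_enforced stands in for the per-map flag described above.
 */
static bool
sketch_prot_allowed(int prot, bool wx_enforced)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	/* Refuse only when both bits are requested at the same time. */
	if (wx_enforced && (prot & wx) == wx)
		return (false);
	return (true);
}
```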
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
When retrieving the list of group members we cannot simply use
ifa_lookup(), because it expects the interface to have an IP (v4 or v6)
address. This means that interfaces with no address are not found.
This presents as interfaces being alternately marked as skip and not
whenever the rules are re-loaded.
Happily we only need to fix ifa_grouplookup(). Teach it to also accept
AF_LINK (i.e. interface) node_hosts.
PR: 250994
MFC after: 3 days
Without wrapping, rtld services and malloc(3) are not guaranteed
to operate correctly in the forked child.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28088
Otherwise a parallel pmap_allocpte_alloc() for a nearby va might also fail
to allocate a page table page and free the page under us. The end result is
that we could dereference an unmapped pte when doing cleanup after sleep.
Instead, on allocation failure, first free everything; only then can we
drop the pmap mutex and sleep safely, right before returning to the caller.
Split inner non-sleepable part of the pmap_allocpte_alloc() into a new
helper pmap_allocpte_nosleep().
Reviewed by: markj
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27956
The function performs the actual allocation of a pte, as opposed to
pmap_allocpte(), which uses an existing free pte if the pt page is already
there. The rename also moves the function out of a namespace that
resembles the language-reserved one.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27956
pmap_pdpe() might return NULL; check for it.
Reviewed by: markj
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27956
The following warning is printed at boot time:
rcorder: requirement `ipfs' in file `/etc/rc.d/netif' has no providers.
Silence it by using BEFORE rather than REQUIRE to express
dependencies of optional components.
Currently the default behaviour is to keep only 1 packet per unresolved entry.
The ability to queue more than one packet was added 10 years ago, in r215207,
though the default value was kept intact.
Things have changed since that time. Systems tend to initiate multiple
connections at once for a variety of reasons.
For example, the recent kern/252278 bug report describes happy-eyeball DNS
behaviour sending multiple requests to the DNS server.
The main constraint on the upper bound for the queue length is
memory consumption. Remote actors should not be able to easily exhaust
local memory by sending packets to unresolved ARP/ND entries.
For now, bump the value to 16 packets, to match the Darwin implementation.
The proper approach would be to account for memory consumption
instead of packet count and to limit based on memory.
We should MFC this with a variation of D22447.
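A sketch of a per-entry hold queue with a fixed packet limit. Packets are modelled as ints, the names are hypothetical, and dropping the oldest packet on overflow is an assumption of this sketch, not something stated above:

```c
#include <assert.h>

#define SKETCH_MAXHOLD	16	/* packets kept per unresolved entry */

struct sketch_holdq {
	int		pkts[SKETCH_MAXHOLD];	/* ring buffer of packets */
	unsigned	head;			/* index of the oldest packet */
	unsigned	count;			/* packets currently held */
	unsigned long	dropped;		/* packets discarded so far */
};

static void
sketch_holdq_enqueue(struct sketch_holdq *q, int pkt)
{
	if (q->count == SKETCH_MAXHOLD) {
		/* Queue is full: discard the oldest packet to make room. */
		q->head = (q->head + 1) % SKETCH_MAXHOLD;
		q->count--;
		q->dropped++;
	}
	q->pkts[(q->head + q->count) % SKETCH_MAXHOLD] = pkt;
	q->count++;
}
```

Because the limit counts packets rather than bytes, the worst-case memory held per entry still depends on packet size, which is the concern behind limiting by memory instead.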
Reviewers: #manpages, #network, bz, emaste
Reviewed By: emaste, gbe(doc), jilles(doc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28068
The cgem(4) driver was updated to support 64-bit bus addressing in
facdd1cd2045. However, the committed version determines this in an
un-idiomatic way. Change the compile-time conditional to check
BUS_SPACE_MAXADDR, rather than comparing int and pointer sizes.
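The shape of such a compile-time check might look like the following. The fallback definition exists only so the sketch builds outside the kernel, and SKETCH_CGEM64 is a hypothetical macro name, not necessarily what the driver uses:

```c
#include <assert.h>
#include <stdint.h>

/* In the kernel this comes from <machine/bus.h>; stand-in for the sketch. */
#ifndef BUS_SPACE_MAXADDR
#define BUS_SPACE_MAXADDR	UINT64_MAX
#endif

/* Select 64-bit descriptors when the bus can address beyond 4 GB. */
#if BUS_SPACE_MAXADDR > 0xffffffffUL
#define SKETCH_CGEM64	1
#else
#define SKETCH_CGEM64	0
#endif

static int
sketch_uses_64bit_addressing(void)
{
	return (SKETCH_CGEM64);
}
```

Testing the bus address limit states the actual requirement, whereas comparing int and pointer sizes only approximates it.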
Reported by: jrtc27
- Implement a dtrace_getnanouptime(), matching the existing
dtrace_getnanotime(), to avoid DTrace calling out to a potentially
instrumentable function.
(These should probably both be under KDTRACE_HOOKS. Also, it's not clear
to me that they are correct implementations for the DTrace thread time
functions they are used in; fixes are left for another commit.)
- Don't allow FBT to instrument functions involved in EL1 exception handling
that are involved in FBT trap processing: handle_el1h_sync() and
do_el1h_sync().
- Don't allow FBT to instrument DDB and KDB functions, as that makes it
rather harder to debug FBT problems.
Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel
panics due to recursion in DTrace.
Reliable FBT on FreeBSD/arm64 is reliant on another change from @andrew to
have the aarch64 instrumentor more carefully check that instructions it
replaces are against the stack pointer, which can otherwise lead to memory
corruption. That change remains under review.
MFC after: 2 weeks
Reviewed by: andrew, kp, markj (earlier version), jrtc27 (earlier version)
Differential revision: https://reviews.freebsd.org/D27766
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modification.
This allows an open privcmd instance to restrict the domains it
can act upon.
Sponsored by: Citrix Systems R&D
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modification.
This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.
Sponsored by: Citrix Systems R&D
The interface is mostly the same as the Linux ioctl, so that we don't
need to modify the user-space libraries that make use of it.
The ioctl is just a proxy for the XENMEM_acquire_resource hypercall.
Sponsored by: Citrix Systems R&D
In e86bddea9fe62d5093a1942cf21950b3c5ca62e5 sys/netpfil/pf/pf.h grew a
declaration of pf_get_ruleset_number. Now delete the old declaration
from sys/net/pfvar.h.
Reviewed by: kp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28081