This ratelimits the "unsupported getsockopt level 6 optname 11"
warnings that happen all the time when watching Netflix.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32454
I got a compilation failure in virtio-gpu without this change.
Reviewed By: #linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
This is compatible with Linux, and some driver error paths depend on it.
Reviewed by: bz, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32337
One of the three uses is already guarded; this guards the remaining ones
to support architectures like riscv that do not provide write-combining,
and is needed to build drm-kmod on riscv.
Reviewed by: hselasky, manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31999
The code would incorrectly use curthread instead of the target proc to
resolve vnodes.
Fixes: 8d03b99b9d ("fd: move vnodes out of filedesc into a dedicated structure")
PR: 258729
Noted by: Damjan Jovanovic <damjan.jov@gmail.com>
It is removed from Linux since 4.11.
In FreeBSD it results in several #ifdefs in drm-kmod.
Reviewed by: emaste, hselasky, manu
Differential revision: https://reviews.freebsd.org/D32169
For now, disable backlight if brightness level is set to 0.
In the future we may implement separate knob in backlight(8).
Required by drm-kmod v5.6
Reviewed by: hselasky, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32165
from other vm_objects. This workarounds "Page already inserted" panic
in vm_page_insert routine triggered on attempt to mmap file created
with shmem_file_setup call. After introduction of "GTT mmap
interface v4" a.k.a. MMAP_OFFSET, vm_objects allocated by these calls
may try to own intersected sets of pages that leads to the assertion.
Reviewed by: kib
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32090
Although drm-kmod contains better implementation which is able to
allocate real entries on pseudofs, this feature has never been used.
Starting from drm-kmod v5.6 old implementation began to leak entries
on each drm device close(). Now just drop pseudofs support instead of
fixing it in drm-kmod and provide stub in base.
Reviewed by: hselasky, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32069
from GEM and TTM page fault handlers and move it in to base system. This
code is tightly integrated with LKPI mmap support to belong to drm-kmod.
As this routine requires associated vm_object to be locked, it got
additional _locked suffix.
Reviewed by: hselasky, markj
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32068
This fixes following warnings when shrinkers are invoked first time:
uma_zalloc_debug: zone "lkpicurr" with the following non-sleepable
locks held: exclusive sleep mutex lkpi-shrinker (lkpi-shrinker)
uma_zalloc_debug: zone "lkpimm" with the following non-sleepable locks
held: exclusive sleep mutex lkpi-shrinker (lkpi-shrinker)
Reviewed by: hselasky, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32066
except linux/pci.h to avoid conflicts with Linux version.
This allows to #define resource in drm-kmod globally and strip some #ifdef-s
Reviewed by: hselasky, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31673
get_file_rcu() grabs a file if the file->f_count is not zero.
Required by drm-kmod 5.6
Reviewed by: hselasky, manu (previous version)
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31672
Add a missing "static" for non-{i386,amd64,arm64} which was missed in
c39eefe715. This should ifx the builds.
Sponsored by: The FreeBSD Foundation
MFC after: 7 days
X-MFC with: c39eefe715
Coherent is lower 32bit only by default in Linux and our only default
dma mask is 64bit currently which violates expectations unless
dma_set_coherent_mask() was called explicitly with a different mask.
Implement coherent by creating a second tag, and storing the tags in the
objects and use the tag from the object wherever possible.
This currently does not update the scatterlist or pool (both could be
converted but S/G cannot be MFCed as easily).
There is a 2nd change embedded in the updated logic of
linux_dma_alloc_coherent() to always zero the allocation as
otherwise some drivers get cranky on uninialised garbage.
Sponsored by: The FreeBSD Foundation
MFC after: 7 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D32164
In some places we are using "mask" and others "dma_mask" for the
same thing. Harmonize the various places to "dma_mask" as used in
linux_pci.c. For the declaration remove the argument names to
avoid the entire problem.
This is in preparation for an upcoming change.
No functional changes intended.
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
As reported by multiple people testing iwlwifi, device_release_driver()
can lead to a panic on secondary errors (usually during attach).
Disable device_release_driver() for the short-term to prevent the panic
but leave it in place so it can be re-worked and fixed properly for
the long-term more easily.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.
There is no reason to keep it in FreeBSD.
Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923
freebsd32_sendmsg() and freebsd32_recvmsg() both copyin the message
header twice, once directly and once in freebsd32_copyinmsghdr(). The
iovec length from the former is used when copying in msg_iov, but the
rest of the kernel uses the iovec length from the latter. When
kern_sendit() and kern_recvit() iterate over the iovec to compute the
residual for I/O, they can therefore end up walking past the end of the
copied in iovec, either resulting in a system call error, userspace
memory corruption from uiomove() with invalid iovecs, or a kernel page
fault if the copied-in iovec is followed by an unmapped KVA region.
Reported by: syzbot+7cc64cd0c49605acd421@syzkaller.appspotmail.com
Reviewed by: kib, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32010
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.
Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779
Reimplement bdf0f24bb1 by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
GCC only added support for __has_builtin in GCC 10. However, all
supported versions of GCC and clang include these builtins so just use
them unconditionally.
This fixes the build with GCC 9.
Reviewed by: manu, hselasky, imp
Differential Revision: https://reviews.freebsd.org/D31942
This is one of the pieces required to make modern (ie Focal)
strace(1) work.
Reviewed By: jhb (earlier version)
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D28212
Switch the main syscall table to use CAPENABLED flags rather than
capabilities.conf. This avoid synchronization issues between
syscalls.master and capabilities.conf (e.g. when renaming a syscall
during development).
For now, move capabilities.conf to sys/compat/freebsd32 and use it
there. Use of sys/compat/freebsd32/syscalls.master should be replaced
by makesyscalls.lua enhancements to allow the main one to be used.
This change results in no changes to generated files after running
`make sysent`.
Reviewed by: kevans, emaste
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D31350
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
CLONE_CLEAR_SIGHAND is designed to reset all signal handlers of the child
not set to SIG_IGN to SIG_DFL.
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31481
MFC after: 2 weeks
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.
Differential revision: https://reviews.freebsd.org/D31474
MFC after: 2 weeks
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().
Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks
As no more NetBSD code in futexes exists replace NetBSD copyrights by
standard FreeBSD 2 clause license.
Add Roman Divacky's copyrights as an author of the robust futexes.
Differential revision: https://reviews.freebsd.org/D31347
MFC after: 2 weeks
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28347
Fix a bug that slipped in in 90707c4e44
using the correct field in le32p_replace_bits().
MFC after: 3 days
Reviewed by: hselasky
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31352
FUTEX_LOCK_PI2 was added to support clock selection as FUTEX_LOCK_PI uses a
CLOCK_REALTIME based absolute value since it was implemented, but it does not
require that the FUTEX_CLOCK_REALTIME bit is set, because that was introduced
later.
MFC after: 2 weeks
In the Linux emulation layer linux_tdfind() has a special purpose to
handle glibc specific TID mangling and we should use it instead of tdfind().
MFC after: 2 weeks
Handle some races in handle_futex_death() which can prevents a wakeup of
potential waiters which can cause these waiters to block forever.
Differential Revision: https://reviews.freebsd.org/D31280
MFC after: 2 weeks
Linux futex documentation explicitly states that EINVAL is returned if
the futex is not 4-byte aligned. Check futex alignment as a Linux do
and return EINVAL.
Differential Revision: https://reviews.freebsd.org/D31279
MFC after: 2 weeks
Follow the r349951 (30b3018d), add check to react to stops and requests
to terminate between retries.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31254
MFC after: 2 weeks
According to fetch(9) fueword facility designed to fetch atomically
small amount of data from user space.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31239
MFC after: 2 weeks
To prevent umtx.h polluting by future changes split it on two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel staff.
While here fix umtx_key_match style.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31248
MFC after: 2 weeks
gcc failed as it didn't inlined the builtins and generates calls to
the libgcc, ld can't find libgcc as cross-toolchain libgcc is not installed.
To avoid this add internal vDSO ffs functions without optimized builtins.
Reported by: jhb
MFC after: 2 weeks
Add an implementation of read_poll_timeout() and the atomic variant
which I did at some point last year for rtw88 and now updated based
on feedback.
MFC after: 10 days
Reviewed by: hsealsky
Differential Revision: https://reviews.freebsd.org/D30980
Add fsleep() function now required by rtw88. This seems to be
making a decision depending on time to sleep on how to sleep.
Given our compat framework already is lenient on how long to sleep,
this is a cut down version.
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D31322
Add sys/types.h to dmi.h and do not rely on other files to include
all needed headers in Linux land. I ran into compile problems with
rtw88 otherwise.
MFC after: 3 days
to restore ABI compatibility for pre-10.x binaries.
It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.
PR: 218571
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31220
Use thread_reap_barrier() to ensure that no threads are kept in the
zombies list which could have the linuxkpi task allocated.
Also fix order of initialization and teardown for current task
allocation hooks and resources. Register current task allocator after
zones are initialized. Deregister allocator before cycling over threads
and zeroing task pointer.
Reviewed by: hselasky, markj
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30468
Add sign_extend32() replicating the 64 version. This is needed by
the rtw88 driver.
MFC after: 10 days
Reviewed by: imp, emaste, hselasky
Differential Revision: https://reviews.freebsd.org/D30979
Add the nexthdr definitions for IPv6 which are used by wireless
drivers and were previously placed in an 80211 header file by
accident.
Obtained from: bz_iwlwifi
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D31321
Add the two new functions needed by rtw88 to register the driver and
handle the module bits as well as a version of pci_alloc_irq_vectors()
for what is needed.
Reviewed by: hselasky
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D30981
In the futex_atomic_op() the encoded_op is a user-supplied parameter.
If the user specifies an incorrect value for this parameter paired with a valid
*uaddr parameter the caller will go into the endless loop. To prevent this check
futex_atomic_op() result and break the loop in case of ENOSYS.
MFC after: 2 weeks
For the caller is no need for access checking here, as the caller must take care
of EFAULT handling. Moreover, this check would be superfluous, since EFAULT is
extremily rare, and we prefer the fast path.
MFC after: 2 weeks
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.
PR: 255947
Submitted by: Philippe Michaud-Boudreault
Differential Revision: https://reviews.freebsd.org/D30332
Move flags and rtclock to the struct linux_futex_args. This will be used when
I split linux_futex() into separate futex op functions.
MFC after: 2 weeks
As the sv_shared_page_base now pointed out to the native sharedpage and
the process VA layout has changed as follows:
VDSOPAGE (2 * PAGE_SIZE)
SHAREDPAGE (PAGE_SIZE)
USRSTACK
fixup the vDSO name by calculating the start of page relative to the
native sharedpage.
Differential revision: https://reviews.freebsd.org/D30903
MFC after: 2 weeks
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.
The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.
The first two have been implemented, so add the implementation of system
calls.
System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).
In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.
Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks
This allows other threads to execute, typically during hardware waiting loops.
This also maches how the function works in Linux.
Reviewed by: kib
MFC after: 1 week
Sponsored by: NVIDIA Networking
These ABIs do not use umtx at all, so there is nothing to clean.
Cloudabi references to umtx keys do not require any cleanups anyway.
Requested by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
as a thin wrapper around native version found in sys/seqc.h.
This replaces out-of-base GPLv2-licensed code used by drm-kmod.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D31006
strscpy copies the src string, or as much of it as fits, into the dst
buffer. The dst buffer is always NUL terminated, unless it's zero-sized.
strscpy returns the number of characters copied (not including the
trailing NUL) or -E2BIG if len is 0 or src was truncated.
Currently drm-kmod replaces strscpy with strncpy that is not quite
correct as strncpy does not NUL-terminate truncated strings and returns
different values on exit.
Reviewed by: hselasky, imp, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31005
This allows to remove unimplemented attrs parameter which type differs
between Linux kernel versions and to compile both drm-kmod and ofed
callers unmodified.
Also convert it to 'unsigned long' type to match modern Linuxes.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30932
Linux docs explicitly state that this is not required [1]:
"Important note: The rcu_barrier() function is not, repeat, not,
obligated to wait for a grace period. It is instead only required to
wait for RCU callbacks that have already been posted. Therefore, if
there are no RCU callbacks posted anywhere in the system, rcu_barrier()
is within its rights to return immediately. Even if there are
callbacks posted, rcu_barrier() does not necessarily need to wait for
a grace period."
[1] https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
Reviewed by: emaste, hselasky, manu
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30809
so this list-traversal primitive may safely run concurrently with the
_rcu list-mutation primitives such as list_add_rcu() as long as the
traversal is guarded by rcu_read_lock().
Do it by reusing the "list_for_each_entry_rcu" macro which does the same.
On Linux it implements some additional lockdep stuff which we skip.
Also move the macro to linux/rculist.h where it resides on Linux.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30795
as it is required by i915kms driver from Linux kernel v 5.5.
This is done with asynchronous freeing of requested memory areas from
taskqueue thread. As memory to be freed is reused to store linked list
entry, backing UMA zone item size is rounded up to pointer size.
While here, make struct linux_kmem_cache private to LKPI to reduce amount
of BSD headers included by linux/slab.h and switch RCU code to usage of
LKPI's linux_irq_work_tq taskqueue to avoid injection of current into
system-wide taskqueue_fast thread context.
Submitted by: nc (initial version for drm-kmod)
Reviewed by: manu, nc
Differential revision: https://reviews.freebsd.org/D30760
This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30973
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.
The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables. Some bits are still missing.
This is based on a prototype by chuck@.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30019
To avoid duplication in the vmstat -m output rename the kmalloc type short
description to 'lkpikmalloc' as the Linux emulation layer historically names
its linux malloc type as 'linux'.
Reviewed by: hselasky, kib, emaste
Differential Revision: https://reviews.freebsd.org/D30928
MFC after: 2 weeks
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
Change linuxkpi_request_firmware_nowait() to deferred firmware loading
scheduling a task. This changes behaviour in some cases that we
return from loading the driver before the driver is finished
initialising if the driver does not deal with it (wait).
This brings the behaviour one would expect from when this function is
called and I implemented it to see if it would help a specific case.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: hselasky, imp (earlier version)
Differential Revision: https://reviews.freebsd.org/D30830
Re-add pci_free_irq_vectors() accidentally removed in
d4a4960c65 and now needed by drm-kmod v5.5.
Reported by: wulf
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC with: d4a4960c65
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().
I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.
This is the first stage of Linuxulator's vdso revision.
Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks
For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.
At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.
Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.
Differential Revision: https://reviews.freebsd.org/D30716
MFC after: 2 weeks
Fix a last minute change from d4a4960c65
based on review feedback in where a function now gets called before
it is declared which did not fully get merged back to my commit branch.
Noticed by: CI, jkim
MFC after: 10 days
X-MFC with: d4a4960c65
Sponsored-by: The FreeBSD Foundation
Some code manually calls local_bh_disable() and spin_lock() but
then calls spin_unlock_bh() (or vice versa).
Our code then calls local_bh_disable() again from spin_lock()
which means we have the thread pin count increased twice and that
means we get out of synch and are still pinned when returning to
user space.
Avoid this by adding the explicit local_bh_{enable,disable}() to
the spin_[un]lock_bh() versions.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30711
In sg_pcopy_from_buffer() is an error in that skip can underflow
and lead to bogus page arithmetics which may lead to memory corruption
or more likely panics. Once we found a s/g page to copy into there
is nothing to skip anymore so simply set skip to 0.
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30676
Restructure some code and add support for various "managed" versions
for PCI resource management.
This is beyond of what iwlwifi needs but some was found with other
wireless drivers and it mostly all goes together.
Add one FreeBSD sepcific feature returning the resource rather than
the handle to allow us to use bus_*() functions in drivers directly.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30558
Given we are manually setting up the "device" in PCI in some cases,
we need to initialise the list and lock for device devres here as well
as otherwise we will panic on the uninitialised lock.
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30681
Move request_irq() to an internal function which serves request_irq()
and the newly added request_threaded_irq() and devm_request_threaded_irq().
Likewise factor out parts of free_irq() to also be used with
devm_free_irq(). Add the storage and call to a thread_handler in case
of IRQ_WAKE_THREAD.
This is needed for the iwlwifi driver.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30549
Add dummy functions for dealing with "HotPlug" events which we currently
do not support.
Add pci_dev_get(), pci_find_ext_capability() and pci_pme_capable().
The added pcie_find_root_port() is a bit special as we need to create
another linux pci device; for that make lkpinew_pci_dev() public
which is also helpful for other cases when we want to use the Linux
routines to check for device identifiers only and need a container
for the "bsddev" to use natively. This has proven to avoid basic
checking code for the sake of rewriting it to native field names
elsewhere. Given we cache the newly created "root" we also need to
make sure we clean it up.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30521
dmam_pool_create() is a "managed" version of dma_pool_create() which
will cleanup everything left when the device goes away using the
devres framework. For that add an internal cleanup function to be
called from devres release.
This is used by at least one wireless driver.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30520
Add two new (though untested) functions to linux/device.h which are
dealing with manually managing the device/driver and are used by
at least one wireless driver. We may have to re-fine them in the
future.
Move the devres declarations further up so they can be used earlier
in the file.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D30519
While currently the ifp gets cast to a net_device and then returned
and consumers are expecting an ifp again, allow parallel usage now and
in the future by extending and also passing the ifp directly back in
the netdev_notifier_info. Add a function to return the ifp instead of
the net_device.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Suggested by: hselasky
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30522
Filter out the flags we do support; previously we would print
out the flag value verbatim.
Reviewed By: dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30693
In Linux, these are macros to locks in the kernel for scheduling purposes.
But as with other macros in this header, we aren't doing anything with them
so we are doing `do {} while (0)` for now.
This is needed by the drm-kmod 5.7 update.
Approved by: hselasky (src)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D30710
This fixes a number of AppImages; tested with
scribus-1.5.6.1-linux-x86_64.AppImage.
Reported By: @probonopd
Reviewed By: asomers, emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30606
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.
MFC after: 2 weeks
Add more raditap definitions based on "names" found in actual drivers
and based on documentation from radiotap.org (where avail).
Leave one specific "duplicate" in the LinuxKPI implementation but
otherwise manage it all in net80211.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky, adrian, sam
Differential Revision: https://reviews.freebsd.org/D30641
Add HWEIGHT32() macro needed by iwlwifi and while here add the 8/16/64
variants likewise.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30501
Now that mlx4 and ofed either are operating on ifnet functions
directly or have a private copy of these macros, we can remove them
from linux/netdevice.h.
With this only the #define for net_device to ifnet is left.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30478
This removes all unused bits from linux/netdevice.h and migrates two
inline functions into the mlx4 and ofed code respectively.
This gets the mlx4/ofed (struct ifnet) specific bits down to 7 lines
in netdevice.h.
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
Reviewed by: hselasky, kib
Differential Revision: https://reviews.freebsd.org/D30461
The second 'linuxcommon' line was added by c66f5b079d
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D30409
MFC after: 2 weeks
Introduce net/addrconf.h with an implementation to
addrconf_addr_solict_mult() used by WiFi drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30416
Add DECLARE_EWMA() which expands to a per-name EWMA implementation
as used by multiple wireless drivers.
Sposnored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky, cperciva, dwmalone
Differential Revision: https://reviews.freebsd.org/D30415
Add linux/bsearch.h which only includes libkern.h as the sort(9)
functions seem to be compatible.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30417
Add a few more le<n>_{tp,add}_cpu*() #defines/functions found in
wireless drivers. While here fill most of the combinatorics gaps
and also add the remaining combinations [1].
Suggested by: emaste [1] (for one part)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30418
Add a definition for SMP_CACHE_BYTES and while here include sys/param.h
for CACHE_LINE_SIZE as otherwise code might not compile standalone.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30419
Add linux/cpu.h for cpumask_*() functions found in wireless drivers
and make sure cpu_online_mask is always initialised.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30421
Add linux/devcoredump.h with stub implementation of dev_coredumpv()
and dev_coredumpsg() which only free the passed in SG table as needed
for iwlwifi.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30423
Replace the implementation for ether_addr_equal() with
ether_addr_equal_unaligned() and add a define for ether_addr_equal()
pointing to the now ether_addr_equal_unaligned() implementation.
This way ether_addr_equal_unaligned() cannot be broken by accident [1].
Suggested by: emaste [1]
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30425
Add a dummy struct inet6_dev {}; to net/if_inet6.h. This is currently
not used for anything but in a declaration. Just needs to be there.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30426
Add an implementation for irq_set_affinity_hint() to linux/interrupt.h
and include linux/hardirq.h for synchronize_irq() as needed by
wireless drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30427
Add header files for struct and accessors for IPv4, UDP, and TCP.
Only parts of the fields of the structs have been seen while working
on wireless drivers. The remaining field names are filled up with
the FreeBSD field names for now. If you have insights into their
correct naming in Linux, feel free to adjust.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30428
Include linux/bitops.h for a definition of BITS_PER_LONG so that this
file can be used independently.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30429
This is used by wireless drivers. Use the time_after() macro as
done for the "after_eq" version.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30430
BUILD_BUG_ON() can be used inside functions where the definition to
CTASSERT() (_Static_assert()) seems to not work.
Go back to an old-style CTASSERT() implementation but also add a
variable dclaration to avoid "unsued typedef" errors and dummy-use
the variable to avoid "unusued variable" errors. Given it is all
self-contained in a block and not used outside this should be
optimised away.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30431
Add yet another version of the various module_param_named() use cases.
This one deals with "charp".
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30433
Add more definitions for various PCI uses to linux/pci.h. Almost all
are defined to their FreeBSD counterparts which are described there.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30434
Add a define for rcu_dereference_check() to rcu_dereference_protected()
which ignores the check argument. Our lockdep compat implementation
for use cases found in iwlwifi would return 1 anyway.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30436
Add linux/stringify.h as directly included by drivers. Remove the
definitions from compiler.h and include the new header in places
where the stringify macros are already used without linuxkpi.
I have adjusted the Copyright of the new file according to the commit
originaly adding the macros (99e690772a).
Sposnored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30440
Add a placeholder struct for guid_t which is needed by ACPI consumers
in at least one wireless driver.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30439
the thread destructor is invoked. Catch that window by waiting for all
task_struct allocations to be returned before freeing the UMA zone in the
LinuxKPI. Else UMA may fail to release the zone due to concurrent access
and panic:
panic() - Bad link element prev->next != elm
zone_release()
bucket_drain()
bucket_free()
zone_dtor()
zone_free_item()
uma_zdestroy()
linux_current_uninit()
This failure can be triggered by loading and unloading the LinuxKPI module
in a loop:
while true
do
kldload linuxkpi
kldunload linuxkpi
done
Discussed with: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Remove all the 'entry' and 'return' probes; they clutter up the source
and are redundant to FBT.
Reviewed By: dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30040
It writes the core of live stopped process to the file descriptor
provided as an argument.
Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
This is needed by the drm-kmod 5.5 update and is similar in logic to the
existing wait_event_killable macro.
Reviewed by: hselasky, manu
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29987
Only allocate struct_mm after we checked that other threads do not carry
useful mm_struct. If they don't, drop process lock, allocate, and recheck.
Note that for M_NOWAIT allocations we could avoid dropping process lock,
but I do not think that this increased complexity is useful.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Create and use zones for task and mm. Reserve items in zones based on the
estimation of the max number of interrupts in the system. Use M_USE_RESERVE
to allow to take reserved items when allocation occurs from the interrupt
thread context.
Of course, this would only work first time we allocate the task for
interrupt thread. If interrupt is deallocated and allocated anew,
creating a new thread, it might be that zone is depleted. It still
should be good enough for practical uses.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
This is required for the current Arch Linux binaries to work.
PR: 254112
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29218
Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins. It's also necessary for glibc 2.33, as found in recent
Arch snapshots.
PR: 254112
Reviewed By: kib
Sponsored by: The FreeBSD Foundation, EPSRC
Differential Revision: https://reviews.freebsd.org/D28192
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.
Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'. This matches what
I've observed under Ubuntu Focal VM.
Reviewed By: chuck, dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29609
We were passing a LinuxKPI struct device * to a pci(4) function that
expects a device_t.
Reviewed by: manu, hselasky, bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29675
They have their own lifetime managed by the containing objects.
Premature and unexpected free causes corruption.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
the variables hold pointers to a linux_cdev, not to a FreeBSD cdev.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
A lot of firmware files have a "-" in the name. That "-" is a problem
when dealing with shell variables or loader (e.g., auto-loading .ko).
It may thus often be convenient to generate firmware kernel object files
with s/-/_/g in the name. In order to automatically find them from
drivers using LinuxKPI also substitue the '-' for a '_' like we do
for '/' and '.' already.
Reviewed-by: hselasky, manu (ok)
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29514
The two functions in linux/inetdevice.h are highly FreeBSD/ifnet
specific. This is a result of struct net_device being mapped to
struct ifnet.
The only known consumer of these functions are two files in the
ofed/infiniband code.
As a first step of cleaning up copy linux/inetdevice.h to
rdma/ib_addr_freebsd.h. (It stayed a separate file to preserve
copyright and license of the original file; otherwise it could be
merged into ib_addr.h where more EPOCH/vnet/.. are already used).
Slightly rename the function to not conflict with LinuxKPI
in the future.
Remove the three last, now unneeded includes of inetdevice.h and
zap linux/inetdevice.h to an empty header file with only the forward
include to netdevice.h remaining.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, kib
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29434
Introduce struct netdev_notifier_info as a container to pass
net_device to the callback functions.
Adjust netdev_notifier_info_to_dev() to return the net_device field.
Add explicit casts from ifp to ni->dev even though currently
struct net_device is defined to struct ifnet. This is needed in
preparation for untangling this and improving the net_device compat
code.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29365
Add a net_ratelimit() compat implementation based on ppsratecheck().
Add a sysctl to allow tuning of the number of messages.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29399
We are not aware of any out-of-tree consumers anymore
which would need KPI support for before Linux version 5.
Update the two in-tree consumers to use the new KPI.
This allows us to remove the extra version check and
will also give access to {lower,upper}_32_bits() unconditionally.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky, rlibby, rstone
MFC-after: 2 weeks
X-MFC: to 13 only
Differential Revision: https://reviews.freebsd.org/D29391
Add stubs for struct lockdep_map and three accessor functions
used by iwlwifi.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, emaste
Differential Revision: https://reviews.freebsd.org/D29398
brcm80211 include pci_ids.h directly while historically we were tracking
IDs in pci.h. Move the current set of IDs from pci.h to pci_ids.h and
while here add IDs for Realtek and Broadcom as well as a network class
as needed by their wireless drivers.
We still include pci_ids.h from pci.h so this should not change anything.
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29400
Add various protocol IDs found in various wireless drivers.
Also add ETH_FRAME_LEN and struct ethhdr.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29397
Add ERFKILL and EBADE found in iwlwifi and brcmfmac wireless drivers.
While here add a comment above the block of error numbers above 500 to
document expectations.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, emaste
Differential Revision: https://reviews.freebsd.org/D29396
ieee80211_node.h uses LIST_HEAD() which LinuxKPI redefines and this
can lead to problems (see comment there). Make sure the net80211
header file is handled correctly by adding it to the list of files
to include before re-defining the macro.
Also add header files needed as dependencies.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: philip, hselasky
Differential Revision: https://reviews.freebsd.org/D29336
Add support for crc32_le() as a wrapper around crc32_raw().
Sponsored-by: The FreeBSD Foundation
Obtained-from: bz_iwlwifi
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29187
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.
PR: kern/254120
Reported By: Alex S <iwtcex@gmail.com>
Reviewed By: kib
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29217
This looks like a no-op, but it prevents udevadm(8) with failing
loudly, which in turn unbreaks installation of libfprint-2-2, which
in Focal is a dependency for make-4.2.1-1.2.
One might wonder why installing a build utility involves messing
with device handling...
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29133
KCSAN complains about racy accesses in the locking code. Those races are
fine since they are inside a TD_SET_RUNNING() loop that expects the value
to be changed by another CPU.
Use relaxed atomic stores/loads to indicate that this variable can be
written/read by multiple CPUs at the same time. This will also prevent
the compiler from doing unexpected re-ordering.
Reported by: GENERIC-KCSAN
Test Plan: KCSAN no longer complains, kernel still runs fine.
Reviewed By: markj, mjg (earlier version)
Differential Revision: https://reviews.freebsd.org/D28569
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO. When destroying the VDSO object, we failed to
destroy the mapping and free KVA. Fix this.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28696
Current epoll implementation stores udata fields of epoll_event
structure in special dynamically-sized table rather than in udata field
of backing kevent structure because of 2 reasons:
1. Kevent's udata size is smaller than epoll's on 32-bit archs.
2. Kevent's udata can be clobbered on execution EPOLL_CTL_ADD as kqueue
modifies existing event while epoll returns error in this case.
After r320043 has introduced four new 64bit user data members (ext[]),
we can store epoll udata in one of them and drop aforementioned table.
According to kqueue_register() source code ext members are not updated
when existing kevent is modified that fixes p.2.
As a side effect the patch fixes PR/252582.
Reviewed by: trasz
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28169
It returns "unconfined", like Linux without SELinux would.
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28164
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.
PR: 248065
Reported By: Alex S <iwtcex@gmail.com>
Tested By: Alex S <iwtcex@gmail.com>
Reviewed By: emaste
MFC After: 3 days
Sponsored By: The FreeBSD Foundation
Consider the following scenario:
1. A delayed_work struct in the WORK_ST_TIMER state.
2. Thread A calls mod_delayed_work()
3. Thread B (a callout thread) simultaneously calls
linux_delayed_work_timer_fn()
The following sequence of events is possible:
A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on(). This is a no-op because the
state is WORK_ST_TASK.
As a result, the delayed_work struct will never be invoked. This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.
Fix this by introducing locking into the cancel path (which
corresponds with the lock held while the callout runs). This will
prevent the callout from changing the state of the task until the
cancel is complete, preventing the race.
Differential Revision: https://reviews.freebsd.org/D28420
Reviewed by: hselasky
MFC after: 2 months
The lock around callout_drain() is unnecessary and may cause
deadlock when one closes a timer descriptor during timer execution.
Reviewed By: delphij
Submitted By: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28148
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2). That's to say, once we do
a read or call timerfd_settime(), timer fd's expiration count should
be zero. Some Linux applications create timerfd and add it to epoll
with LT mode, when event comes, they do timerfd_settime instead
of read to stop event source from trigger. On FreeBSD,
timerfd_settime(2) didn't set the count to zero, which caused high
CPU utilization.
Submitted by: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28231
In a6c2507d1b support for LinuxKPI
firmware loading was added. Record the dependency on firmware(9)
as otherwise (if built as module) linuxkpi will no longer load.
Reported-by: tijl
MFC after: 1 day
X-MFC-with: a6c2507d1b
Sponsored-by: The FreeBSD Foundation
This code implements a version of the devres framework found
working for various iwlwifi use cases and also providing functions
for ttm_page_alloc_dma.c from DRM.
Part of the framework replicates the consumed KPI, while others
are internal helper functions.
In addition the simple devm_k*malloc() consumers were implemented
and kvasprintf() was enhanced to also work for the devm_kasprintf()
case.
Addmittingly lkpi_devm_kmalloc_release() could be avoided but for
the overall understanding of the code and possible memory tracing
it may still be helpful.
Further devsres consumer are implemented for iwlwifi but will follow
later as the main reason for this change is to sort out overlap with
DRM.
Sponsored-by: The FreeBSD Foundation
Obtained-from: bz_iwlwifi
MFC After: 3 days
Reviewed-by: hselasky, manu
Differential Revision: https://reviews.freebsd.org/D28189
In pci_domain_nr() directly return the domain which got set in
lkpifill_pci_dev() in all cases. This was missed between D27550
and 105a37cac7 .
In order to implement pci_dev_put() harmonize further code
(which was started in the aforementioned commit) and add kobj
related bits (through the now common lkpifill_pci_dev() code)
to the DRM specific calls without adding the DRM allocated
pci devices to the pci_devices list.
Add a release for the lkpinew_pci_dev() (DRM) case so freeing
will work.
This allows the DRM created devices to use the normal kobj/refcount
logic and work with, e.g., pci_dev_put().
(For a slightly more detailed code walk see the review).
Sponsored-by: The FreeBSD Foundation
Obtained-from: bz_iwlwifi (partially)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28188
The upcoming in-kernel implementations for LinuxKPI based on work on
iwlwifi (and other wireless drivers) conflicts in a few places with
the drm-kmod graphics work outside the base system.
In order to transition smoothly extract the conflicting bits.
This included "unaligned" accessor functions, sg_pcopy_from_buffer(),
IS_*() macros (to be further restricted in the future), power management
bits (possibly no longer conflicting with DRM), and other minor changes.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: kib, hselasky, manu, bdragon (looked at earlier versions)
Differential Revision: https://reviews.freebsd.org/D26598
Implement linux firmware KPI compat code.
This includes: request_firmware() request_firmware_nowait(),
request_firmware_direct(), firmware_request_nowarn(),
and release_firmware().
Given we will try to map requested names from natively ported
or full-linuxkpi-using drivers to a firmware(9) auto-loading
name format (.ko file name and image name matching),
we quieten firmware(9) and print success or failure (unless
the _nowarn() version was called) in the linuxkpi implementation.
At the moment we try up-to 4 different naming combinations,
with path stripped, original name, and requested name with '/'
or '.' replaced.
We do not currently defer loading in the "nowait" case.
Sponsored-by: The FreeBSD Foundation
Sponsored-by: Rubicon Communications, LLC ("Netgate")
(firmware(9) nowarn update from D27413)
MFC after: 3 days
Reviewed by: kib, manu (looked at older versions)
Differential Revision: https://reviews.freebsd.org/D27414
Replace all uses of kern_mmap with kern_mmap_req move the old kern_mmap.
Reand rename kern_mmap_req to kern_mmap .
The helper saved some code churn initially, but having multiple
interfaces is sub-optimal.
Obtained from: CheriBSD
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28292
nids(4) was a clever idea in the early 2000's when the market was
flooded with 10/100 NICs with Windows-only drivers, but that hasn't been
the case for ages and the driver has had no meaningful maintenance in
ages. It only supports Windows-XP era drivers.
Also remove:
- ndis support from wpa_supplicant
- ndiscvt(8)
Reviewed By: emaste, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D27609
Use the number of items scanned to control the duration of the shrink
loop. Otherwise, if a consumer like TTM is not able to free the number
of items requested for some reason, the shrinker keeps looping forever.
Reviewed by: manu
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28224
Summary:
Linux's irq_work queue was created for asynchronous execution of code from contexts where spin_lock's are not available like "hardware interrupt context". FreeBSD's fast taskqueues was created for the same purposes.
Drm-kmod 5.4 uses irq_work_queue() at least in one place to schedule execution of task/work from the critical section that triggers following INVARIANTS-induced panic:
```
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) linuxkpi_short_wq @ /usr/src/sys/kern/subr_taskqueue.c:281
cpuid = 6
time = 1605048416
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006b538c90
vpanic() at vpanic+0x182/frame 0xfffffe006b538ce0
panic() at panic+0x43/frame 0xfffffe006b538d40
witness_checkorder() at witness_checkorder+0xf3e/frame 0xfffffe006b538f00
__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe006b538f50
taskqueue_enqueue() at taskqueue_enqueue+0x42/frame 0xfffffe006b538f70
linux_queue_work_on() at linux_queue_work_on+0xe9/frame 0xfffffe006b538fb0
irq_work_queue() at irq_work_queue+0x21/frame 0xfffffe006b538fd0
semaphore_notify() at semaphore_notify+0xb2/frame 0xfffffe006b539020
__i915_sw_fence_notify() at __i915_sw_fence_notify+0x2e/frame 0xfffffe006b539050
__i915_sw_fence_complete() at __i915_sw_fence_complete+0x63/frame 0xfffffe006b539080
i915_sw_fence_complete() at i915_sw_fence_complete+0x8e/frame 0xfffffe006b5390c0
dma_i915_sw_fence_wake() at dma_i915_sw_fence_wake+0x4f/frame 0xfffffe006b539100
dma_fence_signal_locked() at dma_fence_signal_locked+0x105/frame 0xfffffe006b539180
dma_fence_signal() at dma_fence_signal+0x72/frame 0xfffffe006b5391c0
dma_fence_is_signaled() at dma_fence_is_signaled+0x80/frame 0xfffffe006b539200
dma_resv_add_shared_fence() at dma_resv_add_shared_fence+0xb3/frame 0xfffffe006b539270
i915_vma_move_to_active() at i915_vma_move_to_active+0x18a/frame 0xfffffe006b5392b0
eb_move_to_gpu() at eb_move_to_gpu+0x3ad/frame 0xfffffe006b539320
eb_submit() at eb_submit+0x15/frame 0xfffffe006b539350
i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0x7d4/frame 0xfffffe006b539570
i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x1c1/frame 0xfffffe006b539600
drm_ioctl_kernel() at drm_ioctl_kernel+0xd9/frame 0xfffffe006b539670
drm_ioctl() at drm_ioctl+0x5cd/frame 0xfffffe006b539820
linux_file_ioctl() at linux_file_ioctl+0x323/frame 0xfffffe006b539880
kern_ioctl() at kern_ioctl+0x1f4/frame 0xfffffe006b5398f0
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe006b5399c0
amd64_syscall() at amd64_syscall+0x121/frame 0xfffffe006b539af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe006b539af0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800a6f09a, rsp = 0x7fffffffe588, rbp = 0x7fffffffe640 ---
KDB: enter: panic
```
Here, the dma_resv_add_shared_fence() performs a critical_enter() and following call of schedule_work() from semaphore_notify() triggers 'acquiring blockable sleep lock with spinlock or critical section held' panic.
Switching irq_work implementation to fast taskqueue fixes the panic for me.
Other report with the similar bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247166
Reviewed By: hselasky
Differential Revision: https://reviews.freebsd.org/D27171
This is required for Qt5, as found in Ubuntu Focal. The library contains
the minimum kernel version encoded in an ELF note; this makes rtld ignore
it altogether, with a confusing error message. Without it, things fail
like this:
$ konsole: error while loading shared libraries: libQt5Core.so.5: cannot
open shared object file: No such file or directory
For reference, the Qt kernel version requirements can be found at:
https://github.com/qt/qtbase/blob/dev/src/corelib/global/minimum-linux_p.h
Sponsored by: The FreeBSD Foundation
Reviewed By: emaste
Differential Revision: https://reviews.freebsd.org/D28105
With newer AMD GPUs (>=Navi,Renoir) there is FPU context usage in the
amdgpu driver.
The `kernel_fpu_begin/end` implementations in drm did not even allow nested
begin-end blocks.
Submitted by: Greg V
Reviewed By: manu, hselasky
Differential Revision: https://reviews.freebsd.org/D28061
A driver can register a shrinker that will be called when the kernel
wants to free some memory.
Add support for that in linuxkpi and call the registered shrinkers
when the lowmem event is triggered.
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D27728
-pci_get_class : This function search for a matching pci device based on
the class/subclass and returns a newly created pci_dev.
- pci_{save,restore}_state : This is analogous to ours with the same name
- pci_is_root_bus : Return true if this is the root bus
- pci_get_domain_bus_and_slot : This function search for a matching pci
device based on domain, bus and slot/function concat into a single
unsigned int (devfn) and returns a newly created pci_dev
- pci_bus_{read,write}_config* : Read/Write to the config space.
While here add some helper function to alloc and fill the pci_dev struct.
Reviewed by: hselasky, bz (older version)
Differential Revision: https://reviews.freebsd.org/D27550
Stop trying to manually calculate RID, which cannot be done correctly
by PCI_DEVFN(). Use PCI_GET_RID() method instead.
Do not use pci_find_dbsf() to go from the linux pci_dev to freebsd
device_t. First, device is readily available as dev.bsddev. Second,
using pci_find_dbsf() fails for ARI-enabled functions with large
function numbers, because PCI_SLOT()/PCI_FUNC() are for non-ARI.
Reviewed by: bz, hselasky, manu
Tested by: manu (drm)
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27960
The originally chosen numbers interfere with downstream projects'
syscalls. Move them to the end of the syscall table instead.
Reported by: jrtc27
Reviewed by: brooks
MFC-With: 022ca2fc7f
Differential Revision: 022ca2fc7f
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.
It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.
Reviewed by: jhb, kib, bcr
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27743
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668
Also centralize and unify checks to enable ASLR stack gap in a new
helper exec_stackgap().
PR: 239873
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- Use [u]intptr_t casts to convert pointers to integers.
- Change IS_ERR* to return bool instead of long.
Reviewed by: manu
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27577
Allow setting the alternate interface number to fail when there is only
one alternate setting present, to comply with the USB specification.
Refactor how iface->num_altsetting is computed.
Bump the __FreeBSD_version due to change of core USB structure.
PR: 251856
MFC after: 1 week
Submitted by: Ma, Horse <Shichun.Ma@dell.com>
Sponsored by: Mellanox Technologies // NVIDIA Networking
Possibly fixes the wrong flags being passed to the kernel
allocators in linux_dma_alloc_coherent() and linux_dma_pool_alloc().
Reviewed by: hps
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27508
in the LinuxKPI. Linux defines min() to be a macro, while in FreeBSD
min() is a static inline function clamping its arguments to
"unsigned int".
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
linux_common.c to linux_util.c so they become available on i386.
linux_common.c defines the linux_common kernel module but this module does
not exist on i386 and linux_common.c is not included in the linux module.
linux_util.c is included in the linux_common module on amd64 and the linux
module on i386.
Remove linux_common.c from files.i386 again. It was added recently in
r367433 when the DTrace provider definitions were moved.
The V4L feature declarations were moved to linux_common in r283423.
struct timex is not 32-bit safe, it uses longs for members.
Provide translation.
Reviewed by: brooks, cy
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27471
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
Providing these in freebsd32.h facilitates local testing/measuring of the
structs rather than forcing one to locally recreate them. Sanity checking
offsets/sizes remains in kern_umtx.c where these are typically used.
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
This is used by some Linux programs using filehandles (r367773) to locate
the mountpoint for a given fsid.
Differential Revision: https://reviews.freebsd.org/D27136
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed- size of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27222
LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be
compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our
side to avoid patching of vendor code.
This fixes drm-devel-kmod build on powerpc64(le).
Reported by: pkubaj
It includes:
ACPI_HANDLE() implementation.
AC and VIDEO ACPI events notification support.
Replacement of hand-rolled GPLed _DSM method evaluation helpers
with in-base ones.
Submitted by: wulf
Differential Revision: https://reviews.freebsd.org/D26603
from a Linux binary. Should come handy for AppImages.
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26959
- map those IPv4 / IPv6 socket options which exist in FreeBSD
+ most of them visually verified to have the same type/layout of arguments
+ not tested with linux programs to behave as intended
- be more human readable for known options which are not handled
- be more verbose for unhandled socket message flags we know about
- print the jail ID in linux_msg if run in a jail
- add possibility to print debug message about known missing parts only once
- add multiple levels of sysctl linux.debug:
1: print debug messages, tell about unimplemented stuff (only once)
2: like 1, but also print messages about implemented but not tested
stuff (only once)
3+: like 2, but no rate limiting of messages
- increase default linux debug level from 1 to 3
We are a lot more verbose in as we need to be (e.g. some of the IP socket
options which are the same, and share the same memory layout, and are
believed to work). The reason is that we have no good testsuite to test those
linux-bits. The LTP or other test suites like the python one, are not fully
up to the task we need. As such the excessive messages about emulated but not
tested socket options.
IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.
Discussed with: trasz
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124
Add a pseudofs node flag 'PFS_AUTODRAIN', which automatically emits sbuf
contents to the caller when the sbuf buffer fills. This is only
permissible if the corresponding PFS node fill function can sleep
whenever it appends to the sbuf.
linprocfs' /proc/self/maps node happens to meet this requirement.
Streaming out the file as it is composed avoids truncating the output
and also avoids preallocating a very large buffer.
Reviewed by: markj; earlier version: emaste, kib, trasz
Differential Revision: https://reviews.freebsd.org/D27047
This will be used by fuse(4).
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26974
Add some missing netlink_family definitions and produce vaguely
human-readable error messages for those definitions, like we used to do for
just ROUTE and KOBJECT_UEVENTS.
Additionally, if we know it's a netfilter socket but didn't find it in the
table, fall back to printing that instead of the generic handler ("socket
domain 16, ...").
No change to the emulator correctness, just mildly improved diagnostics for
gaps.
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.
TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced. There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.
Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D27015
This is required by some major linux applications, such as Chrome and
Firefox. (As well as Electron-using applications, which are essentially
a bundled version of Chrome.)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27012
On Linux, sqlite probes for underlying F2FS filesystems that support
certain kinds of atomic update with this ioctl. The expected result on
non-F2FS filesystem (i.e., all FreeBSD filesystems) is any error value.
Minimally implement the ioctl and avoid the warning message.
(This shows up in Linux Chrome, which embeds sqlite.)
Reviewed by: emaste, trasz
Differential Revision: https://reviews.freebsd.org/D27050
more readable. While here, add linux_check_errtbl() function to make
sure we don't leave holes.
No objections: emaste (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26972
r326145 corrected do_execve() to return EJUSTRETURN upon success so that
important registers are not clobbered. This had the side effect of tapping
out 'failures' for all *execve(2) audit records, which is less than useful
for auditing purposes.
Audit exec returns earlier, where we can know for sure that EJUSTRETURN
translates to success. Note that this unsets TDP_AUDITREC as we commit the
audit record, so the usual audit in the syscall return path will do nothing.
PR: 249179
Reported by: Eirik Oeverby <ltning-freebsd anduin net>
Reviewed by: csjp, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26922