Similar to the change for sendmsg(), create a pointer size independent
implementation of recvmsg() and let cloudabi32 and cloudabi64 call into
it. In case userspace requests one or more file descriptors, call
kern_recvit() in such a way that we get the control message headers in
an mbuf. Iterate over all of the headers and copy the file descriptors
to userspace.
Reduce the potential amount of code duplication between cloudabi32 and
cloudabi64 by creating a cloudabi_sock_recv() utility function. The
cloudabi32 and cloudabi64 modules will then only contain code to convert
the iovecs to the native pointer size.
In cloudabi_sock_recv(), we can now construct an SCM_RIGHTS cmsghdr in
an mbuf and pass that on to kern_sendit().
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Reviewed by: kib, ngie, jilles
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10020
nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep()
has not updated rmt, so sys_nanosleep() updates the user-space rmtp
by copying garbage from its stack frame. This is not only a kernel
memory disclosure, it's also not POSIX-compliant. Fix it to update
rmtp only on EINTR.
Reviewed by: jilles (via D10020), dchagin
MFC after: 3 days
Security: possibly
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10044
functions in the LinuxKPI. Add a usage atomic to the task_struct
structure to facilitate refcounting the task structure when returned
from get_pid_task(). The get_task_struct() and put_task_struct()
function is used to manage atomic refcounting. After this change the
task_struct should only be freed through put_task_struct().
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
some associated helper functions in the LinuxKPI. Let the existing
linux_alloc_current() function allocate and initialize the new
structure and let linux_free_current() drop the refcount on the memory
mapping structure. When the mm_struct's refcount reaches zero, the
structure is freed.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
pairwise to support the FreeBSD way of pushing and popping the page
fault flags. Ensure this by requiring every occurrence of pagefault
disable function call to have a corresponding pagefault enable call.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Support is implemented by mapping Linux's "struct net" into FreeBSD's
"struct vnet". Currently only vnet0 is supported by ibcore.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When locking a mutex and deadlock is detected the first mutex lock
call that sees the deadlock will return -EDEADLK .
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add support for using mutexes during KDB and shutdown. This is also
required for doing mode-switching during panic for drm-next.
Add new mutex functions mutex_init_witness() and mutex_destroy()
allowing LinuxKPI mutexes to be tracked by witness.
Declare mutex_is_locked() and mutex_is_owned() like inline functions
to get cleaner warnings. These functions are used inside WARN_ON()
statements which might look a bit odd if these functions get fully
expanded.
Give mutexes better debug names through the mutex_name() macro when
WITNESS_ALL is defined. The mutex_name() macro can prefix parts of the
filename and line number before the mutex name.
MFC after: 1 week
Sponsored by: Mellanox Technologies
kthread_add() will assert it is called too soon. This fixes a startup
issue when COMPAT_LINUXKPI is in enabled the kernel configuration
file.
Reported by: Michael Butler <imb@protected-networks.net>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Put large functions into linux_slab.c instead of declaring them static
inline.
Add support for more memory allocation wrappers like kmalloc_array()
and __vmalloc().
Make sure either the M_WAITOK or the M_NOWAIT flag is set and mask
away unused memory allocation flags before calling FreeBSD's malloc()
routine.
Move kmalloc_node() definition to slab.h where it belongs.
Implement support for the SLAB_DESTROY_BY_RCU feature when creating a
kmem_cache which basically means kmem_cache memory is freed using
call_rcu().
MFC after: 1 week
Sponsored by: Mellanox Technologies
LinuxKPI. When the type of the argument is constant the temporary
variable cannot be assigned after the barrier. Instead assign the
temporary variable by initialization.
MFC after: 1 week
Sponsored by: Mellanox Technologies
related struct definitions out into the MI path.
Invert the native ipc structs to the Linux ipc structs convesion logic.
Since 64-bit variant of ipc structs has more precision convert native ipc
structs to the 64-bit Linux ipc structs and then truncate 64-bit values
into the non 64-bit if needed. Unlike Linux, return EOVERFLOW if the
values do not fit.
Fix SYSV IPC for 64-bit Linuxulator which never sets IPC_64 bit.
MFC after: 1 month
This avoids creating own per-CPU threads and also ensures the tasklet
execution happens on the same CPU core invoking the tasklet.
MFC after: 1 week
Sponsored by: Mellanox Technologies
This change makes the workqueue implementation behave more like in
Linux, both functionality wise and structure wise.
All workqueue code has been moved to linux_work.c
Add an atomic based statemachine to the work_struct to ensure proper
operation. Prior to this change struct_work was directly mapped to a
FreeBSD task. When a taskqueue has multiple threads the same task may
end up being executed on more than one worker thread simultaneously.
This might cause problems with code coming from Linux, which expects
serial behaviour, similar to Linux tasklets.
Move all global workqueue function names into the linux_xxx domain to
avoid symbol name clashes in the future.
Implement a few more workqueue related functions and macros.
Create two multithreaded taskqueues for the LinuxKPI during module
load, one for time-consuming callbacks and one for non-time consuming
callbacks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
the syscalls that are not implemented in Linux kernel itself.
Cleanup DUMMY() macros.
Reviewed by: dchagin, trasz
Approved by: dchagin
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9804
WITNESS_ALL is defined. The lock name is based on the filename and
line number where the initialisation happens.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Linuxulator is x86 only.
The only notable differences in algnment for an LP64 64-bit system
when compared to a 32-bit system is an eight or large byte types
alignment.
MFC after: 1 month
- Optimise the RCU implementation to not allocate and free
ck_epoch_records during runtime. Instead allocate two sets of
ck_epoch_records per CPU for general purpose use. The first set is
only used for reader locks and the second set is only used for
synchronization and barriers and is protected with a regular mutex to
prevent simultaneous issues.
- Move the task structure away from the rcu_head structure and into
the per-CPU structures. This allows the size of the rcu_head structure
to be reduced down to the size of two pointers.
- Fix a bug where the linux_rcu_barrier() function only waited for one
per-CPU epoch record to be completed instead of all.
- Use a critical section or a mutex to protect ck_epoch_begin() and
ck_epoch_end() depending on RCU or SRCU type. All the ck_epoch_xxx()
functions, except ck_epoch_register(), ck_epoch_unregister() and
ck_epoch_recycle() are not re-entrant and needs a critical section or
a mutex to operate in the LinuxKPI, after inspecting the CK
implementation of the above mentioned functions. The simultaneous
issues arise from per-CPU epoch records being shared between multiple
threads depending on the amount of taskswitching and how many threads
are involved with the RCU and SRCU operations.
- Properly free all epoch records by using safe list traversal at
LinuxKPI module unload. It turns out the ck_epoch_recycle() always
have the records on an internal list and use a flag in the epoch
record to track allocated and free entries. This would lead to use
after free during module unload.
- Remove redundant synchronize_rcu() call from the
linux_compat_uninit() function. Let the linux_rcu_runtime_uninit()
function do the final rcu_barrier() instead.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The clang compiler will optimise these functions down to three AMD64
instructions if the bit argument is a constant during compilation.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When allocating unmapped pages, take advantage of the direct map on
AMD64 to get the virtual address corresponding to a page. Else all
pages allocated must be mapped because sometimes the virtual address
of a page is requested.
Move all page allocation and deallocation code into an own C-file.
Add support for GFP_DMA32, GFP_KERNEL, GFP_ATOMIC and __GFP_ZERO
allocation flags.
Make a clear separation between mapped and unmapped allocations.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The i915kms driver in Linux 4.9 reimplement parts of the scatter list
functions with regards to performance. In other words there is not so
much room for changing structure layouts and functionality if the
i915kms should be built AS-IS. This patch aligns the scatter list
support to what is expected by the i915kms driver. Remove some
comments not needed while at it.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
would be useless anyway - there is no point in pretending to have block
devices; our "block" devices are in fact character ones, and can only
be accessed as such.
Discussed with: dchagin
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
1) Add better spinlock debug names when WITNESS_ALL is defined.
2) Make sure that the calling thread gets bound to the current CPU
while a spinlock is locked. Some Linux kernel code depends on that the
CPU ID doesn't change while a spinlock is locked.
3) Add support for using LinuxKPI spinlocks during a panic().
MFC after: 1 week
Sponsored by: Mellanox Technologies
Tasklets are implemented using a taskqueue and a small statemachine on
top. The additional statemachine is required to ensure all LinuxKPI
tasklets get serialized. FreeBSD taskqueues do not guarantee
serialisation of its tasks, except when there is only one worker
thread configured.
MFC after: 1 week
Sponsored by: Mellanox Technologies
A set of helper functions have been added to manage the life of the
LinuxKPI task struct. When an external system call or task is invoked,
a check is made to create the task struct by demand. A thread
destructor callback is registered to free the task struct when a
thread exits to avoid memory leaks.
This change lays the ground for emulating the Linux kernel more
closely which is a dependency by the code using the LinuxKPI APIs.
Add new dedicated td_lkpi_task field has been added to struct thread
instead of abusing td_retval[1].
Fix some header file inclusions to make LINT kernel build properly
after this change.
Bump the __FreeBSD_version to force a rebuild of all kernel modules.
MFC after: 1 week
Sponsored by: Mellanox Technologies
parameter to mmap(2), even if MAP_FIXED is not explicitly specified.
Android ART is one example. Implement bug compatibility for this case
in linuxulator.
Reviewed by: dchagin@
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9373
Convert linux_recv(), linux_send() and linux_accept() system call arguments
to the register_t type too.
PR: 217161
MFC after: 3 days
xMFC with: r313284,r313285,r313684
getdents64() with wrapper over kern_getdirentries().
The patch was originally written by emaste@ and then adapted by trasz@
and me.
Note:
1. I divided linux_getdents() and linux_readdir() as in case when the
getdents() called with count = 1 (readdir() case) it can overwrite
user stack (by writing to user buffer pointer more than 1 byte).
2. Linux returns EINVAL in case when user supplied buffer is not enough
to contain fetched dirent.
3. Linux returns ENOTDIR in case when fd points to not a directory.
Reviewed by: trasz@
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D2210
Rename kern_vm_* functions to kern_*. Move the prototypes to
syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as
needed, to avoid headers pollution.
Requested by: alc, jhb
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9535