- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
structures.
The structure name in the record payload is preceded by a size_t
containing the size of the individual structures. Use this to
replace the previous code that dumped the kevent arrays dumped for
kevent(). kdump is now able to decode the kevent structures rather
than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are
not marked with separate 'read' and 'write' annotations in kdump
output. Instead, the first array is the 'changes' array, and the
second array (only present if kevent doesn't fail with an error) is
the 'events' array. For kevent(), empty arrays are denoted by an
entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent:
sysdecode_kevent_filter, sysdecode_kevent_flags, and
sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent
structures to <sys/event.h> so that they can be shared with userland.
The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent()
system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12470
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).
Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.
Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.
To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.
Reviewed by: kib, jhibbits
Differential Revision: https://reviews.freebsd.org/D13180
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
in the LinuxKPI returns NULL. This happens when the VM area's private
data handle already exists and could cause a so-called NULL pointer
dereferencing issue prior to this fix.
Found by: greg@unrelenting.technology
MFC after: 1 week
Sponsored by: Mellanox Technologies
LinuxKPI task struct. Change type of "state" variable from "int" to
"atomic_t" to simplify code and avoid unneccessary casting.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Compared to the previous version, v0.16, there are a couple of minor
changes:
- CLOUDABI_AT_PID: Process identifiers for CloudABI processes.
Initially, BSD process identifiers weren't exposed inside the runtime,
due to them being pretty much useless inside of a cluster computing
environment. When jobs are scheduled across systems, the BSD process
number doesn't act as an identifier. Even on individual systems they
may recycle relatively quickly.
With this change, the kernel will now generate a UUIDv4 when executing
a process. These UUIDs can be obtained within the process using
program_getpid(). Right now, FreeBSD will not attempt to store this
value. This should of course happen at some point in time, so that it
may be printed by administration tools.
- Removal of some unused structure members for polling.
With the polling framework being simplified/redesigned, it turns out
some of the structure fields were not used by the C library. We can
remove these to keep things nice and tidy.
Obtained from: https://github.com/NuxiNL/cloudabi
This pointer is checked during the linux_dev_open() callback and does
not need to be NULL checked again. It should always be set for
character devices belonging to the "linuxcdevsw" and technically
there is no need to NULL check this pointer at all.
Suggested by: kib @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.
We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.
Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.
Obtained from: https://github.com/NuxiNL/cloudabi
Create a config file for PCI devices that exposes their configuration
space. Only fields needed by libdrm are filled in (vendor, device,
revision, subvendor and subdevice).
Link /sys/class/drm/card%d/device to the PCI device directory.
After some in-progress work is committed, this would otherwise be the only
instance of #if(n)def NO_SWAPPING in the tree. Moreover, the requisite
opt_vm.h include was missing, so the PHOLD/PRELE calls were always being
compiled in anyway.
MFC after: 1 week
then td->td_sel is NULL and this will result in a segfault inside selrecord().
This happens when only using kqueue() to poll for read and write events.
If select() and kqueue() is mixed there won't be a segfault.
Reported by: Johannes Lundberg
MFC after: 1 week
Sponsored by: Mellanox Technologies
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list. Not all functions need the second
argument, some don't even need the first one. The second argument
lives in next cache line, so not dereferencing it is a performance
gain. This was discovered in sendfile(2), which will be covered by
next commits.
The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.
Reviewed by: gallatin, kbowling
Differential Revision: https://reviews.freebsd.org/D12615
gets drained before invoking the work function. Else the timer
mutex may still be in use which can lead to use-after-free situations,
because the work function might free the work structure before returning.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Bump the FreeBSD version to force recompilation of external
kernel modules due to structure change.
PR: 222504
Submitted by: Greg V <greg@unrelenting.technology>
MFC after: 1 week
Sponsored by: Mellanox Technologies
1. Swap the order of device_get_ivars with device_get_devclass and devclass
name validation. This bug was introduced in r323692.
2. Error check device_get_children and free the returned list. This bug was
introduced in the original linsysfs commit.
Reported by: Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12407
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.
This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work. The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.
PR: 222375
Submitted by: Greg V <greg AT unrelenting.technology>
have a scope ID. Change size of the searched scope ID to the full
16-bits. There can typically be more than 255 interfaces.
Suggested by: ae @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Workaround problem that ifa_ifwithaddr() also matches the scope ID of
the IPv6 address when searching for a maching IPv6 address. For now
simply try all valid scope IDs until a match is found.
MFC after: 1 week
Sponsored by: Mellanox Technologies
use of the linux_poll_wakeup() function from unsafe contexts, which
can lead to use-after-free issues.
Instead of calling linux_poll_wakeup() directly use the wake_up()
family of functions in the LinuxKPI to do this.
Bump the FreeBSD version to force recompilation of external kernel modules.
MFC after: 1 week
Sponsored by: Mellanox Technologies
and convert any messages of types SCM_BINTIME, SCM_TIMESTAMP,
SCM_REALTIME and SCM_MONOTONIC from 64-bit to its 32-bit
representation. Otherwise we either run out of user-supplied
buffer to copy those out resulting in the MSG_CTRUNC or simply
return values that the userland 32-bit code is not going
to parse correctly. This fixes at least two regression tests
failing to function properly in 32-bit compat mode:
tools/regression/sockets/udp_pingpong
tools/regression/sockets/unix_cmsg
PR: kern/222039
MFC after: 30 days
Now that CloudABI's sockets API has been changed to be addressless and
only connected socket instances are used (e.g., socket pairs), they have
become fairly similar to pipes. The only differences on CloudABI is that
socket pairs additionally support shutdown(), send() and recv().
To simplify the ABI, we've therefore decided to remove pipes as a
separate file descriptor type and just let pipe() return a socket pair
of type SOCK_STREAM. S_ISFIFO() and S_ISSOCK() are now defined
identically.
While I am here, declare struct md_ioctl32 as packed which allows
us to stop playing tricks with sizeof(md_ioctl32)+y as well as
simplifies md_pad handling. Both were necessary because of different
alignment preferences on amd64 vs i386.
MFC after: 4 weeks
Now that all of the packaged software has been adjusted to either use
Flower (https://github.com/NuxiNL/flower) for making incoming/outgoing
network connections or can have connections injected, there is no longer
need to keep accept() around. It is now a lot easier to write networked
services that are address family independent, dual-stack, testable, etc.
Remove all of the bits related to accept(), but also to
getsockopt(SO_ACCEPTCONN).
With Flower (CloudABI's network connection daemon) becoming more
complete, there is no longer any need for creating any unconnected
sockets. Socket pairs in combination with file descriptor passing is all
that is necessary, as that is what is used by Flower to pass network
connections from the public internet to listening processes.
Remove all of the kernel bits that were used to implement socket(),
listen(), bindat() and connectat(). In principle, accept() and
SO_ACCEPTCONN may also be removed, but there are still some consumers
left.
Obtained from: https://github.com/NuxiNL/cloudabi
MFC after: 1 month
Deadlock condition:
The return value of TDQ_LOCKPTR(td) is the same for two threads.
1) The first thread signals a wakeup while keeping the rcu_read_lock().
This invokes sched_add() which in turn will try to lock TDQ_LOCK().
2) The second thread is calling synchronize_rcu() calling mi_switch() over
and over again trying to yield(). This prevents the first thread from running
and releasing the RCU reader lock.
Solution:
Release the thread lock while yielding to allow other threads to acquire the
lock pointed to by TDQ_LOCKPTR(td).
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies