Commit Graph

2847 Commits

Author SHA1 Message Date
John Baldwin
ffb6607984 Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays dumped for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), empty arrays are denoted by an
  entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470
2017-11-25 04:49:12 +00:00
Ed Schouten
814629dd64 Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Gordon Tetlow
edb01d11f8 Properly bzero kldstat structure to prevent kernel information leak.
Submitted by:	kib
Reported by:	TJ Corley
Security:	CVE-2017-1088
2017-11-15 22:30:21 +00:00
Hans Petter Selasky
0c19d06405 Properly handle the case where the linux_cdev_handle_insert() function
in the LinuxKPI returns NULL. This happens when the VM area's private
data handle already exists and could cause a so-called NULL pointer
dereferencing issue prior to this fix.

Found by:	greg@unrelenting.technology
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-13 18:16:26 +00:00
Mateusz Guzik
537d0fb138 Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue 2017-11-11 18:10:09 +00:00
Hans Petter Selasky
ef9257491d Remove release and acquire semantics when accessing the "state" field of the
LinuxKPI task struct. Change type of "state" variable from "int" to
"atomic_t" to simplify code and avoid unneccessary casting.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-11 11:01:50 +00:00
Hans Petter Selasky
e0390735d3 Mask away return codes from del_timer() and del_timer_sync() because
they are not the same like in Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-11 10:46:12 +00:00
Hans Petter Selasky
076f7ce6f6 Remove some not needed comments in the LinuxKPI. Use the Linux source tree
to lookup documentation for the functions implemented in the LinuxKPI
instead.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-10 08:31:40 +00:00
Ed Schouten
7e6155744d Upgrade to CloudABI v0.17.
Compared to the previous version, v0.16, there are a couple of minor
changes:

- CLOUDABI_AT_PID: Process identifiers for CloudABI processes.

  Initially, BSD process identifiers weren't exposed inside the runtime,
  due to them being pretty much useless inside of a cluster computing
  environment. When jobs are scheduled across systems, the BSD process
  number doesn't act as an identifier. Even on individual systems they
  may recycle relatively quickly.

  With this change, the kernel will now generate a UUIDv4 when executing
  a process. These UUIDs can be obtained within the process using
  program_getpid(). Right now, FreeBSD will not attempt to store this
  value. This should of course happen at some point in time, so that it
  may be printed by administration tools.

- Removal of some unused structure members for polling.

  With the polling framework being simplified/redesigned, it turns out
  some of the structure fields were not used by the C library. We can
  remove these to keep things nice and tidy.

Obtained from:	https://github.com/NuxiNL/cloudabi
2017-11-08 14:21:52 +00:00
Hans Petter Selasky
a7a3d0d170 Make the dma_alloc_coherent() function in the LinuxKPI NULL safe with regard
to the "dev" argument.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
MFC after:	1 week
2017-11-08 08:37:05 +00:00
Hans Petter Selasky
8ead3a9933 Remove redundant dev->si_drv1 NULL checks in the LinuxKPI.
This pointer is checked during the linux_dev_open() callback and does
not need to be NULL checked again. It should always be set for
character devices belonging to the "linuxcdevsw" and technically
there is no need to NULL check this pointer at all.

Suggested by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-03 13:43:05 +00:00
Hans Petter Selasky
62d08fae13 Implement ioread16be() in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-01 12:34:18 +00:00
Hans Petter Selasky
b37c654140 Unconditionally include "opt_inet6.h" in the LinuxKPI.
This makes sure the INET6 macro gets properly defined,
also for kernel module builds.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-01 12:21:28 +00:00
David E. O'Brien
61407763bc Update comment to match r177997 & r178036 changes. 2017-10-27 16:36:05 +00:00
Ed Schouten
4e1847781b Import the latest CloudABI definitions, version 0.16.
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.

We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.

Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.

Obtained from:	https://github.com/NuxiNL/cloudabi
2017-10-18 19:22:53 +00:00
Tijl Coosemans
03cea61b3a Add information needed by Linux libdrm 2.4.74 (shipped with CentOS 7.4).
Create a config file for PCI devices that exposes their configuration
space.  Only fields needed by libdrm are filled in (vendor, device,
revision, subvendor and subdevice).

Link /sys/class/drm/card%d/device to the PCI device directory.
2017-10-15 19:28:14 +00:00
Tijl Coosemans
df4c975275 Set DEVNAME to dri/card%d. This works with both in-tree drm and drm-next
and is also the value used on Linux.

Tested by:	Greg V <greg@unrelenting.technology>
2017-10-15 19:21:15 +00:00
Tijl Coosemans
834804f3fe Add special handling for current in-tree drm devices, like r323692 added
for drm-next.
2017-10-15 16:08:22 +00:00
Tijl Coosemans
f3792e07f6 Use sizeof instead of strlen on string constants. The compiler doesn't
optimise the strlen calls away with -ffreestanding.
2017-10-15 16:03:45 +00:00
Mark Johnston
9db0f8e76f Make the PHOLD in linux_wait_event_common() unconditional.
After some in-progress work is committed, this would otherwise be the only
instance of #if(n)def NO_SWAPPING in the tree. Moreover, the requisite
opt_vm.h include was missing, so the PHOLD/PRELE calls were always being
compiled in anyway.

MFC after:	1 week
2017-10-13 19:27:33 +00:00
Hans Petter Selasky
627ac5b4e3 Don't call selrecord() outside the select system call in the LinuxKPI, because
then td->td_sel is NULL and this will result in a segfault inside selrecord().
This happens when only using kqueue() to poll for read and write events.
If select() and kqueue() is mixed there won't be a segfault.

Reported by:	Johannes Lundberg
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-13 14:14:46 +00:00
Ed Maste
cf62459c35 regen freebsd32_sysent.c after r324564 (freebsd32_posix_fallocate) 2017-10-12 18:31:28 +00:00
Ed Maste
bfb763bd24 allow posix_fallocate in 32-bit compat capability mode
Reported by:	kib
MFC after:	2 weeks
MFC with:	r324560
Sponsored by:	The FreeBSD Foundation
2017-10-12 18:30:54 +00:00
Gleb Smirnoff
e8fd18f306 Shorten list of arguments to mbuf external storage freeing function.
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list.  Not all functions need the second
argument, some don't even need the first one.  The second argument
lives in next cache line, so not dereferencing it is a performance
gain.  This was discovered in sendfile(2), which will be covered by
next commits.

The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.

Reviewed by:	gallatin, kbowling
Differential Revision:	https://reviews.freebsd.org/D12615
2017-10-09 20:35:31 +00:00
Mark Johnston
bf4e2e5be1 Add get_random_{int,long} to the LinuxKPI.
Fix some whitespace bugs while here.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12588
2017-10-04 17:29:08 +00:00
Hans Petter Selasky
87a567f181 Make sure the timer belonging to the delayed work in the LinuxKPI
gets drained before invoking the work function. Else the timer
mutex may still be in use which can lead to use-after-free situations,
because the work function might free the work structure before returning.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-04 13:13:38 +00:00
Pedro F. Giffuni
2c75d7b08d Small style(9) issue: spaces vs TAB. 2017-09-24 20:57:03 +00:00
Hans Petter Selasky
40f53a7cdc Add support for 32-bit compatibility IOCTLs in the LinuxKPI.
Bump the FreeBSD version to force recompilation of external
kernel modules due to structure change.

PR:		222504
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-09-22 08:12:08 +00:00
Ryan Libby
2e6418c0c5 linsysfs: quiet gcc -Wformat after r323692
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
2017-09-18 19:09:40 +00:00
Conrad Meyer
4d367f2501 linsysfs(5): Fix two unrelated issues
1. Swap the order of device_get_ivars with device_get_devclass and devclass
   name validation.  This bug was introduced in r323692.

2. Error check device_get_children and free the returned list.  This bug was
   introduced in the original linsysfs commit.

Reported by:	Oleg V. Nauman <oleg AT theweb.org.ua>, hselasky (1); hselasky (2)
Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12407
2017-09-18 17:14:13 +00:00
Hans Petter Selasky
62bae5d421 The LinuxKPI atomics do not have acquire nor release semantics unless
specified. Fix code to use READ_ONCE() and WRITE_ONCE() where appropriate.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:37:14 +00:00
Hans Petter Selasky
1f7c7e1bec Only wire pages in the LinuxKPI instead of holding and wiring them.
This prevents the page daemon from regularly scanning the held pages.

Suggested by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:23:59 +00:00
Hans Petter Selasky
c05238a681 Add support for shared memory functions to the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-18 13:17:23 +00:00
Conrad Meyer
2d347b2ef8 linsysfs(5): Add support for recent libdrm
Expose more information about PCI devices (and GPUs in particular) via
linsysfs to libdrm.

This allows unmodified modern 64-bit Linux libdrm to work, which allows
modern Linux Mesa to work.  The submitter reports that he tested the change
with an Ubuntu 16.04 chroot + amdgpu from graphics/drm-next-kmod.

PR:		222375
Submitted by:	Greg V <greg AT unrelenting.technology>
2017-09-17 23:40:16 +00:00
Hans Petter Selasky
6263f8b78d Only search the scope ID in ip6_find_dev() for IPv6 addresses which
have a scope ID. Change size of the searched scope ID to the full
16-bits. There can typically be more than 255 interfaces.

Suggested by:		ae @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 12:50:12 +00:00
Hans Petter Selasky
f4cf3177a2 Resolve IPv6 scope ID issues when using ip6_find_dev() in the LinuxKPI.
Workaround problem that ifa_ifwithaddr() also matches the scope ID of
the IPv6 address when searching for a maching IPv6 address. For now
simply try all valid scope IDs until a match is found.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 07:21:27 +00:00
Hans Petter Selasky
6dec7efa83 Properly implement poll_wait() in the LinuxKPI. This prevents direct
use of the linux_poll_wakeup() function from unsafe contexts, which
can lead to use-after-free issues.

Instead of calling linux_poll_wakeup() directly use the wake_up()
family of functions in the LinuxKPI to do this.

Bump the FreeBSD version to force recompilation of external kernel modules.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 06:29:29 +00:00
Hans Petter Selasky
5b1cfc99cf Add more sanity checks to linux_fget() in the LinuxKPI. This prevents
returning pointers to file descriptors which were not created by the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 06:04:05 +00:00
Maxim Sobolev
c24e7f3fd9 Correct bintime32 declaration: uint32_t sec -> time32_t sec.
Submitted by:	jhb
MFC after:	1 month
2017-09-08 18:32:13 +00:00
Maxim Sobolev
afbd12c110 In the recvmsg32() system call iterate over returned structure(s)
and convert any messages of types SCM_BINTIME, SCM_TIMESTAMP,
SCM_REALTIME and SCM_MONOTONIC from 64-bit to its 32-bit
representation. Otherwise we either run out of user-supplied
buffer to copy those out resulting in the MSG_CTRUNC or simply
return values that the userland 32-bit code is not going
to parse correctly. This fixes at least two regression tests
failing to function properly in 32-bit compat mode:

    tools/regression/sockets/udp_pingpong
    tools/regression/sockets/unix_cmsg

PR:             kern/222039
MFC after:	30 days
2017-09-07 04:29:57 +00:00
Ed Schouten
733ba7f881 Merge pipes and socket pairs.
Now that CloudABI's sockets API has been changed to be addressless and
only connected socket instances are used (e.g., socket pairs), they have
become fairly similar to pipes. The only differences on CloudABI is that
socket pairs additionally support shutdown(), send() and recv().

To simplify the ABI, we've therefore decided to remove pipes as a
separate file descriptor type and just let pipe() return a socket pair
of type SOCK_STREAM. S_ISFIFO() and S_ISSOCK() are now defined
identically.
2017-09-05 07:46:45 +00:00
Maxim Sobolev
f76de5dd51 Add proper support for the md_label into md(4) ioctl compat layer.
While I am here, declare struct md_ioctl32 as packed which allows
us to stop playing tricks with sizeof(md_ioctl32)+y as well as
simplifies md_pad handling. Both were necessary because of different
alignment preferences on amd64 vs i386.

MFC after:	4 weeks
2017-08-30 15:07:10 +00:00
Ed Schouten
b53b978a6c Complete the CloudABI networking refactoring.
Now that all of the packaged software has been adjusted to either use
Flower (https://github.com/NuxiNL/flower) for making incoming/outgoing
network connections or can have connections injected, there is no longer
need to keep accept() around. It is now a lot easier to write networked
services that are address family independent, dual-stack, testable, etc.

Remove all of the bits related to accept(), but also to
getsockopt(SO_ACCEPTCONN).
2017-08-30 07:30:06 +00:00
Ed Schouten
8212ad9a99 Sync CloudABI compatibility against the latest upstream version (v0.13).
With Flower (CloudABI's network connection daemon) becoming more
complete, there is no longer any need for creating any unconnected
sockets. Socket pairs in combination with file descriptor passing is all
that is necessary, as that is what is used by Flower to pass network
connections from the public internet to listening processes.

Remove all of the kernel bits that were used to implement socket(),
listen(), bindat() and connectat(). In principle, accept() and
SO_ACCEPTCONN may also be removed, but there are still some consumers
left.

Obtained from:	https://github.com/NuxiNL/cloudabi
MFC after:	1 month
2017-08-25 11:01:39 +00:00
Mark Johnston
c72fadf8f0 Set the bus number field when attaching a PCI device.
MFC after:	1 week
2017-08-23 16:50:10 +00:00
Mark Johnston
7e1a02baa5 Add some miscellaneous definitions to support the DRM drivers.
MFC after:	1 week
2017-08-22 17:13:28 +00:00
Hans Petter Selasky
714ed5b27b Fix for deadlock situation in the LinuxKPI's RCU synchronize API.
Deadlock condition:
The return value of TDQ_LOCKPTR(td) is the same for two threads.

1) The first thread signals a wakeup while keeping the rcu_read_lock().
This invokes sched_add() which in turn will try to lock TDQ_LOCK().

2) The second thread is calling synchronize_rcu() calling mi_switch() over
and over again trying to yield(). This prevents the first thread from running
and releasing the RCU reader lock.

Solution:
Release the thread lock while yielding to allow other threads to acquire the
lock pointed to by TDQ_LOCKPTR(td).

Found by:	KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-21 11:51:40 +00:00
Mark Johnston
00d47e02ac Define prefetch() only if it hasn't already been defined.
MFC after:	1 week
2017-08-20 01:42:01 +00:00