Commit Graph

2873 Commits

Author SHA1 Message Date
Konstantin Belousov
67dcd64ab8 linuxkpi: Do not leak pages on put.
When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire).  Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.

Reported and tested by:	Slava Shwartsman
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-02-13 15:44:35 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Hans Petter Selasky
5b7cc89266 Fix implementation of ktime_add_ns() and ktime_sub_ns() in the LinuxKPI to
actually return the computed result instead of the input value.

This is a regression issue after r289572.

Found by:	gcc6
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-02-07 12:12:06 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Ed Maste
132f90c660 Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator.  Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 17:29:12 +00:00
Brooks Davis
0fd25723bc Add kern.ipc.{msqids,semsegs,sema} sysctls for FreeBSD32.
Stop leaking kernel pointers though theses sysctls and make sure that the
padding in the structures is zeroed on allocation to avoid other leaks.

Reviewed by:	gordon, kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D13459
2018-02-02 18:03:12 +00:00
Hans Petter Selasky
f71d0b0da7 Fix some recent regressions after r328436 in the LinuxKPI:
1) The OPW() function macro should have the same return type like the
function it executes.
2) The DEVFS I/O-limit should be enforced for all character device reads
and writes.
3) The character device file handle should be passable, same as for
DEVFS based file handles.

Reported by:	jbeich @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-01 19:57:21 +00:00
Hans Petter Selasky
3f3735db30 Make sure the LinuxKPI's internal ERESTARTSYS error code gets translated
into ERESTART for mmap and page fault calls aswell.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-01 17:32:45 +00:00
Hans Petter Selasky
cb57d1dd30 Properly implement the cond_resched() function macro in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-31 13:40:36 +00:00
Bryan Drewery
595109196a Don't use an .OBJDIR for 'make sysent'.
Reported by:	emaste, jhb
Sponsored by:	Dell EMC
2018-01-29 19:14:15 +00:00
Hans Petter Selasky
e23ae408c0 Decouple Linux files from the belonging character device right after open
in the LinuxKPI. This is done by calling finit() just before returning a magic
value of ENXIO in the "linux_dev_fdopen" function.

The Linux file structure should mimic the BSD file structure as much as
possible. This patch decouples the Linux file structure from the belonging
character device right after the "linux_dev_fdopen" function has returned.
This fixes an issue which allows a Linux file handle to exist after a
character device has been destroyed and removed from the directory index
of /dev. Only when the reference count of the BSD file handle reaches zero,
the Linux file handle is destroyed. This fixes use-after-free issues related
to accessing the Linux file structure after the character device has been
destroyed.

While at it add a missing NULL check for non-present file operation.
Calling a NULL pointer will result in a segmentation fault.

Reviewed by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-26 10:49:02 +00:00
Justin Hibbits
51bd6f9618 Minimal change to build linuxkpi on architectures with physical addresses larger
than virtual

Summary:
Some architectures have physical/bus addresses that are much larger
than virtual addresses.  This change just quiets a warning, as DMAP is not used
on those architectures, and on 64-bit platforms uintptr_t is the same size as
vm_paddr_t and void *.

Reviewed By:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14043
2018-01-26 00:56:09 +00:00
Hans Petter Selasky
f885420d3c Properly implement the "id" callback argument in the "idr_for_each" function
in the LinuxKPI. The old implementation assumed only one IDR layer was present.
Take additional IDR layers into account when computing the "id" value.

MFC after:	1 week
Found by:	Karthik Palanichamy <karthikp@chelsio.com>
Tested by:	Karthik Palanichamy <karthikp@chelsio.com>
Sponsored by:	Mellanox Technologies
2018-01-24 13:37:07 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Nathan Whitehorn
ad6b97e7ca Define PHYS_TO_DMAP() and DMAP_TO_PHYS() as panics on the architectures
(i386 and arm) that never implement them. This allows the removal of
#ifdef PHYS_TO_DMAP on code otherwise protected by a runtime check on
PMAP_HAS_DMAP. It also fixes the build on ARM and i386 after I forgot an
#ifdef in r328168.

Reported by:	Milan Obuch
Pointy hat to:	me
2018-01-19 22:17:13 +00:00
Nathan Whitehorn
9a8196ce19 Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine if the architecture has a direct map;
if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or
1, the compiler can remove the conditional logic.

As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had
similar things but spelled differently. 32-bit MIPS has a partial direct-map
that maps poorly to this concept and is unchanged.

Reviewed by:		kib
Suggestions from:	marius, alc, kib
Runtime tested on:	amd64, powerpc64, powerpc, mips64
2018-01-19 17:46:31 +00:00
Pedro F. Giffuni
0993a7daaf ndis: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:11:38 +00:00
Jeff Roberson
af80820a57 Regenerate auto-generated files 2018-01-12 23:06:35 +00:00
Jeff Roberson
3f289c3fcf Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Cleanup some header polution created by having seq.h in proc.h.

Reviewed by:	markj, kib
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13403
2018-01-12 22:48:23 +00:00
Pedro F. Giffuni
ed595433c6 linuxkpi: Simplify kmalloc_array.
kmalloc_array seems what we call mallocarray(9).
2018-01-10 20:50:06 +00:00
Ed Schouten
4b6b56b32b Use mallocarray(9) in CloudABI kernel code where possible.
Submitted by:	pfg@
2018-01-07 22:38:45 +00:00
Kristof Provost
e70c77ca17 linuxkpi: Implement kcalloc() based on mallocarray()
This means we now get integer overflow protection, which Linux code
might expect as it is also provided by kcalloc() in Linux.
2018-01-07 13:39:12 +00:00
Ed Schouten
8e04fe5af3 Allow timed waits with relative timeouts on locks and condvars.
Even though pthreads doesn't support this, there are various alternative
APIs that use this. For example, uv_cond_timedwait() accepts a relative
timeout. So does Rust's std::sync::Condvar::wait_timeout().

Though I personally think that relative timeouts are bad (due to
imprecision for repeated operations), it does seem that people want
this. Extend the existing futex functions to keep track of whether an
absolute timeout is used in a boolean flag.

MFC after:	1 month
2018-01-04 21:57:37 +00:00
Stephen Hurd
96fc97c81f Update Matthew Macy contact info
Email address has changed, uses consistent name (Matthew, not Matt)

Reported by:	Matthew Macy <mmacy@mattmacy.io>
Differential Revision:	https://reviews.freebsd.org/D13537
2017-12-19 17:59:00 +00:00
Brooks Davis
5cd667e65f Disable vim syntax highlighting.
Vim's default pick doesn't understand that ';' is a comment character
and the result looks horrible.

Reviewed by:	emaste
2017-11-28 18:23:17 +00:00
Pedro F. Giffuni
7f2d13d607 sys/compat: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:13:23 +00:00
John Baldwin
ffb6607984 Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays dumped for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), empty arrays are denoted by an
  entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470
2017-11-25 04:49:12 +00:00
Ed Schouten
814629dd64 Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Gordon Tetlow
edb01d11f8 Properly bzero kldstat structure to prevent kernel information leak.
Submitted by:	kib
Reported by:	TJ Corley
Security:	CVE-2017-1088
2017-11-15 22:30:21 +00:00
Hans Petter Selasky
0c19d06405 Properly handle the case where the linux_cdev_handle_insert() function
in the LinuxKPI returns NULL. This happens when the VM area's private
data handle already exists and could cause a so-called NULL pointer
dereferencing issue prior to this fix.

Found by:	greg@unrelenting.technology
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-13 18:16:26 +00:00
Mateusz Guzik
537d0fb138 Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue 2017-11-11 18:10:09 +00:00
Hans Petter Selasky
ef9257491d Remove release and acquire semantics when accessing the "state" field of the
LinuxKPI task struct. Change type of "state" variable from "int" to
"atomic_t" to simplify code and avoid unneccessary casting.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-11 11:01:50 +00:00
Hans Petter Selasky
e0390735d3 Mask away return codes from del_timer() and del_timer_sync() because
they are not the same like in Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-11 10:46:12 +00:00
Hans Petter Selasky
076f7ce6f6 Remove some not needed comments in the LinuxKPI. Use the Linux source tree
to lookup documentation for the functions implemented in the LinuxKPI
instead.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-10 08:31:40 +00:00
Ed Schouten
7e6155744d Upgrade to CloudABI v0.17.
Compared to the previous version, v0.16, there are a couple of minor
changes:

- CLOUDABI_AT_PID: Process identifiers for CloudABI processes.

  Initially, BSD process identifiers weren't exposed inside the runtime,
  due to them being pretty much useless inside of a cluster computing
  environment. When jobs are scheduled across systems, the BSD process
  number doesn't act as an identifier. Even on individual systems they
  may recycle relatively quickly.

  With this change, the kernel will now generate a UUIDv4 when executing
  a process. These UUIDs can be obtained within the process using
  program_getpid(). Right now, FreeBSD will not attempt to store this
  value. This should of course happen at some point in time, so that it
  may be printed by administration tools.

- Removal of some unused structure members for polling.

  With the polling framework being simplified/redesigned, it turns out
  some of the structure fields were not used by the C library. We can
  remove these to keep things nice and tidy.

Obtained from:	https://github.com/NuxiNL/cloudabi
2017-11-08 14:21:52 +00:00
Hans Petter Selasky
a7a3d0d170 Make the dma_alloc_coherent() function in the LinuxKPI NULL safe with regard
to the "dev" argument.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
MFC after:	1 week
2017-11-08 08:37:05 +00:00
Hans Petter Selasky
8ead3a9933 Remove redundant dev->si_drv1 NULL checks in the LinuxKPI.
This pointer is checked during the linux_dev_open() callback and does
not need to be NULL checked again. It should always be set for
character devices belonging to the "linuxcdevsw" and technically
there is no need to NULL check this pointer at all.

Suggested by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-03 13:43:05 +00:00
Hans Petter Selasky
62d08fae13 Implement ioread16be() in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-01 12:34:18 +00:00
Hans Petter Selasky
b37c654140 Unconditionally include "opt_inet6.h" in the LinuxKPI.
This makes sure the INET6 macro gets properly defined,
also for kernel module builds.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-01 12:21:28 +00:00
David E. O'Brien
61407763bc Update comment to match r177997 & r178036 changes. 2017-10-27 16:36:05 +00:00
Ed Schouten
4e1847781b Import the latest CloudABI definitions, version 0.16.
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.

We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.

Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.

Obtained from:	https://github.com/NuxiNL/cloudabi
2017-10-18 19:22:53 +00:00
Tijl Coosemans
03cea61b3a Add information needed by Linux libdrm 2.4.74 (shipped with CentOS 7.4).
Create a config file for PCI devices that exposes their configuration
space.  Only fields needed by libdrm are filled in (vendor, device,
revision, subvendor and subdevice).

Link /sys/class/drm/card%d/device to the PCI device directory.
2017-10-15 19:28:14 +00:00
Tijl Coosemans
df4c975275 Set DEVNAME to dri/card%d. This works with both in-tree drm and drm-next
and is also the value used on Linux.

Tested by:	Greg V <greg@unrelenting.technology>
2017-10-15 19:21:15 +00:00
Tijl Coosemans
834804f3fe Add special handling for current in-tree drm devices, like r323692 added
for drm-next.
2017-10-15 16:08:22 +00:00
Tijl Coosemans
f3792e07f6 Use sizeof instead of strlen on string constants. The compiler doesn't
optimise the strlen calls away with -ffreestanding.
2017-10-15 16:03:45 +00:00
Mark Johnston
9db0f8e76f Make the PHOLD in linux_wait_event_common() unconditional.
After some in-progress work is committed, this would otherwise be the only
instance of #if(n)def NO_SWAPPING in the tree. Moreover, the requisite
opt_vm.h include was missing, so the PHOLD/PRELE calls were always being
compiled in anyway.

MFC after:	1 week
2017-10-13 19:27:33 +00:00
Hans Petter Selasky
627ac5b4e3 Don't call selrecord() outside the select system call in the LinuxKPI, because
then td->td_sel is NULL and this will result in a segfault inside selrecord().
This happens when only using kqueue() to poll for read and write events.
If select() and kqueue() is mixed there won't be a segfault.

Reported by:	Johannes Lundberg
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-13 14:14:46 +00:00
Ed Maste
cf62459c35 regen freebsd32_sysent.c after r324564 (freebsd32_posix_fallocate) 2017-10-12 18:31:28 +00:00