Commit Graph

3527 Commits

Author SHA1 Message Date
Vladimir Kondratyev
896a4c279d Linux epoll: Don't deregister file descriptor after EPOLLONESHOT is fired
Linux epoll does not remove descriptor after one-shot event has been triggered.
Set EV_DISPATCH kqueue flag rather then EV_ONESHOT to get the same behavior.

Required by Linux Steam client.

PR:		240590
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	emaste, imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22513
2019-11-24 20:41:47 +00:00
Mateusz Guzik
4de1818baf linux: avoid overhead of P_CONTROLT checks if possible
Sponsored by:	The FreeBSD Foundation
2019-11-20 12:06:29 +00:00
Kyle Evans
4cc12fb848 sysent: regenerate after r354835
The lua-based makesyscalls produces slightly different output than its
makesyscalls.sh predecessor, all whitespace differences more closely
matching the source syscalls.master.
2019-11-18 23:31:12 +00:00
Kyle Evans
f22a592111 Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D21894
2019-11-18 23:28:23 +00:00
John Baldwin
03b0d68c72 Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
David Bright
2d5603fe65 Jail and capability mode for shm_rename; add audit support for shm_rename
Co-mingling two things here:

  * Addressing some feedback from Konstantin and Kyle re: jail,
    capability mode, and a few other things
  * Adding audit support as promised.

The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	kib
Relnotes:	Yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22083
2019-11-18 13:31:16 +00:00
Edward Tomasz Napierala
dfe91e5e34 Make linux(4) open(2)/openat(2) return ELOOP instead of EMLINK,
when being passed O_NOFOLLOW.  This fixes LTP testcase openat02:5.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22384
2019-11-18 10:19:16 +00:00
John Baldwin
e353233118 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
Edward Tomasz Napierala
299cb52a80 Support O_CLOEXEC in linux(4) open(2) and openat(2).
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21966
2019-11-15 16:21:46 +00:00
Brooks Davis
96c914ee97 Tidy syscall declerations.
Pointer arguments should be of the form "<type> *..." and not "<type>* ...".

No functional change.

Reviewed by:	kevans
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22373
2019-11-14 17:11:52 +00:00
Olivier Houchard
2b2cde807c linprocfs: Make sure to report -1 as tty when we have no controlling tty.
When reporting a process' stats, we can't just provide the tty as an
unsigned long, as if we have no controlling tty, the tty would be NODEV, or
-1. Instaed, just special-case NODEV.

Submitted by:	Juraj Lutter <otis@sk.FreeBSD.org>
MFC after:	1 week
2019-11-11 00:21:05 +00:00
Ed Maste
01b9ee4c50 linux_renameat2: improve flag checks
In the cases where Linux returns an error (e.g. passing in an undefined
flag) there's no need for us to emit a message.  (The target of this
message is a developer working on the linuxulatorm, not the author of
presumably broken Linux software).

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21606
2019-11-07 15:51:44 +00:00
Edward Tomasz Napierala
044ab55e41 Make linux(4) create /dev/shm. Linux applications often expect
a tmpfs to be mounted there, and because they like to verify it's
actually a mountpoint, a symlink won't do.

Reviewed by:	dchagin (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20333
2019-11-06 20:53:33 +00:00
Hans Petter Selasky
76354fa40f Enable device class group attributes in the LinuxKPI.
Bump the __FreeBSD_version to force recompilation of
external kernel modules due to structure change.

Differential Revision:	https://reviews.freebsd.org/D21564
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-04 14:19:09 +00:00
Ryan Stone
92a15f946b Add missing M_NOWAIT flag
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic.  All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.

Also fix a related case where NOWAIT/WAITOK was not
specified.  In this case it's not clear whether sleeping
is allowed so be conservative and assume not.  There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.

Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
2019-10-23 17:20:20 +00:00
Yuri Pankov
a161fba992 linux: futex_mtx should follow futex_list
Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.

PR:		240989
Reported by:	Alex S <iwtcex@gmail.com>
2019-10-18 12:25:33 +00:00
Yuri Pankov
b9d3556a34 linux: provide just one instance of futex_list
Move futex_list definition to linux.c which is included once
in linux.ko (i386) and in linux_common.ko (amd64 and aarch64)
allowing 32/64 bit linux programs to access the same futexes
in the latter case.

PR:		240989
Reviewed by:	dchagin
Differential Revision:	https://reviews.freebsd.org/D22073
2019-10-18 10:28:08 +00:00
Hans Petter Selasky
427c402de2 Fix missing epochification of the LinuxKPI after r353292.
Sponsored by:	Mellanox Technologies
2019-10-15 11:14:14 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Doug Moore
2288078c5e Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
Brooks Davis
544e6c96b7 Regen after r347228 and r352693.
No functional change.
2019-09-30 21:00:19 +00:00
Pawel Biernacki
ea2609a490 linux_renameat2: don't add extra \n on error.
linux_msg() already adds \n at the end of all messages.

Reported by:	emaste, kib (mentor), mjg (mentor)
Reviewed by:	kib (mentor), mjg (mentor)
Differential Revision:	https://reviews.freebsd.org/D21852
2019-09-30 19:05:14 +00:00
David Bright
c4571256af sysent: regenerate after r352747.
Sponsored by:	Dell EMC Isilon
2019-09-26 15:41:10 +00:00
David Bright
9afb12bab4 Add an shm_rename syscall
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.

truss support is included; audit support will come later.

This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	jilles (earlier revision)
Reviewed by:	brueffer (manpages, earlier revision)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21423
2019-09-26 15:32:28 +00:00
Kyle Evans
618d66a56f compat/freebsd32: restore style after r352705 (no functional change)
The escaped newlines haven't been necessary since r339624, but this file has
not been reformatted. Restore the style.
2019-09-25 18:48:05 +00:00
Kyle Evans
a9ac5e1424 sysent: regenerate after r352705
This also implements it, fixes kdump, and removes no longer needed bits from
lib/libc/sys/shm_open.c for the interim.
2019-09-25 18:09:19 +00:00
Kyle Evans
234879a7e3 Mark shm_open(2) as COMPAT12, succeeded by shm_open2
Implementation and regenerated files will follow.
2019-09-25 18:06:48 +00:00
Kyle Evans
460211e730 sysent: regenerate after r352700 2019-09-25 17:59:58 +00:00
Kyle Evans
20f7057685 Add a shm_open2 syscall to support upcoming memfd_create
shm_open2 allows a little more flexibility than the original shm_open.
shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
shmflag argument that can be expanded later. Currently the only shmflag is
to allow file sealing on the returned fd.

shm_open and memfd_create will both be implemented in libc to use this new
syscall.

__FreeBSD_version is bumped to indicate the presence.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21393
2019-09-25 17:59:15 +00:00
Kyle Evans
0cd95859c8 [2/3] Add an initial seal argument to kern_shm_open()
Now that flags may be set on posixshm, add an argument to kern_shm_open()
for the initial seals. To maintain past behavior where callers of
shm_open(2) are guaranteed to not have any seals applied to the fd they're
given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special
flag could be opened later for shm_open(2) to indicate that sealing should
be allowed.

We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if
F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a
shmfd that already existed. A note's been added about the assumptions we've
made here as a hint towards anyone wanting to allow other seals to be
applied at creation.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21392
2019-09-25 17:35:03 +00:00
Kyle Evans
d19f028e33 sysent: regenerate after r352693 2019-09-25 17:30:28 +00:00
Kyle Evans
85c5f3cb57 Add COMPAT12 support to makesyscalls.sh
Reviewed by:	kib, imp, brooks (all without syscalls.master edits)
Differential Revision:	https://reviews.freebsd.org/D21366
2019-09-25 17:29:45 +00:00
Tijl Coosemans
55258ab0ff Create a "drm" subdirectory for drm devices in linsysfs. Recent versions of
linux libdrm check for the existence of this directory:

https://cgit.freedesktop.org/mesa/drm/commit/?id=f8392583418aef5e27bfed9989aeb601e20cc96d

MFC after:	2 weeks
2019-09-23 12:27:55 +00:00
Ed Maste
2eb6ef203a linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
Hans Petter Selasky
4c8ba7d94f Use true and false when dealing with bool type in the LinuxKPI.
No functional change.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:24:47 +00:00
Hans Petter Selasky
16732c193c Fix synchronous work drain issue in the LinuxKPI.
A work callback may restart itself. Loop in the drain function to see if the
work has been rescheduled and stop the subsequent reschedules, if any.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:20:13 +00:00
Hans Petter Selasky
6575da5eef Fix broken DECLARE_TASKLET() macro after r347852.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 07:53:49 +00:00
Jeff Roberson
c75757481f Replace redundant code with a few new vm_page_grab facilities:
- VM_ALLOC_NOCREAT will grab without creating a page.
 - vm_page_grab_valid() will grab and page in if necessary.
 - vm_page_busy_acquire() automates some busy acquire loops.

Discussed with:	alc, kib, markj
Tested by:	pho (part of larger branch)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21546
2019-09-10 19:08:01 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Johannes Lundberg
f6668e9f56 LinuxKPI: Improve sysfs support.
- Add functions for creating and merging sysfs groups.
- Add sysfs_streq function to compare strings ignoring newline from the
  sysctl userland call.
- Add a call to sysfs_create_groups in device_add.
- Remove duplicate header include.
- Bump __FreeBSD_version.

Reviewed by:	hselasky
Approved by:	imp (mentor), hselasky
MFC after:	4 days
Differential Revision:	D21542
2019-09-06 15:43:53 +00:00
Edward Tomasz Napierala
e55366be83 Fix /proc/mounts for autofs(5) mounts.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-09-04 18:00:54 +00:00
Konstantin Belousov
fe69291ff4 Add procctl(PROC_STACKGAP_CTL)
It allows a process to request that stack gap was not applied to its
stacks, retroactively.  Also it is possible to control the gaps in the
process after exec.

PR:	239894
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21352
2019-09-03 18:56:25 +00:00
Edward Tomasz Napierala
bb3c7a5440 Make linprocfs(4) report Tgid, Linux ltrace(1) needs it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-09-03 16:33:02 +00:00
Mateusz Guzik
d05b53e0ba Add sysctlbyname system call
Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.

Fallback is provided in case newer libc happens to be running on an older
kernel.

Submitted by:	Pawel Biernacki
Reported by:	kib, brooks
Differential Revision:	https://reviews.freebsd.org/D17282
2019-09-03 04:16:30 +00:00
Edward Tomasz Napierala
1d3a302b4a Bump Linux version to 3.2.0. Without it, binaries linked against
glibc 2.24 and up (eg Ubuntu 19.04) fail with "FATAL: kernel too old".

This alone is not enough to make newer binaries actually work;
fix/hack/workaround is pending review at https://reviews.freebsd.org/D20687.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20757
2019-09-02 18:10:35 +00:00
Edward Tomasz Napierala
7a8cbc5288 Relax compat.linux.osrelease checks. This way one can do eg
'compat.linux.osrelease=3.10.0-957.12.1.el7.x86_64', which
corresponds to CentOS 7.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20685
2019-09-02 16:57:42 +00:00
Johannes Lundberg
458ba18d43 LinuxKPI: Add sysfs create/remove functions that handles multiple files in one call.
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D21475
2019-09-02 14:51:59 +00:00
Hans Petter Selasky
4d83500fda Use DEVICE memory instead of UNCACHEABLE on aarch64 in ioremap() in the LinuxKPI.
This fixes system hangs on reading device registers on aarch64.

Tested with:	Marvell MACCHIATObin (Armada8k) + mlx4en, amdgpu
Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D20789
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-02 08:34:45 +00:00
Konstantin Belousov
bb9e2184f0 Change locking requirements for VOP_UNSET_TEXT().
Require the vnode to be locked for the VOP_UNSET_TEXT() call.  This
will be used by the following bug fix for a tmpfs issue.

Tested by:	sbruno, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 20:24:52 +00:00
Conrad Meyer
419fe172c2 Linuxkpi: Prevent easy generated ctor name conflicts with prefix
Sponsored by:	Dell EMC Isilon
2019-08-17 03:00:58 +00:00
Hans Petter Selasky
4f109faadc Implement pci_enable_msi() and pci_disable_msi() in the LinuxKPI.
This patch makes the DRM graphics driver in ports usable on aarch64.

Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D21008
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-08-14 09:36:25 +00:00
John Baldwin
e28024fbad Fix build with DRM and INVARIANTS enabled.
The DRM drivers use the lockdep assertion macros with spinlock_t locks
which are backed by mutexes, not sx locks.  This causes compile
failures since you can't use sx_assert with a mutex.  Instead, change
the lockdep macros to use lock_class methods.  This works by assuming
that each LinuxKPI locking primitive embeds a FreeBSD lock as its
first structure and uses a cast to get to the underlying 'struct
lock_object'.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20992
2019-08-13 21:15:59 +00:00
Konstantin Belousov
62375ca79c compat/linux: Remove obsoleted and somewhat confusing comments related to COMPAT_43.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21200
2019-08-11 19:17:29 +00:00
Justin Hibbits
4eaa2fde6f Fix 32-bit build again, post r350570.
Missed this part with my testing as well.  Pass the right type to
BUS_TRANSLATE_RESOURCE().
2019-08-04 20:00:39 +00:00
Justin Hibbits
4b238da67b Fix 32-bit build post-r350570
The error message prints a rman_res_t, which is an uintmax_t.  Explicitly
cast, just for future-proofing, and use the correct format.
2019-08-04 19:55:43 +00:00
Justin Hibbits
937a05ba81 Add necessary bits for Linux KPI to work correctly on powerpc
PowerPC, and possibly other architectures, use different address ranges for
PCI space vs physical address space, which is only mapped at resource
activation time, when the BAR gets written.  The DRM kernel modules do not
activate the rman resources, soas not to waste KVA, instead only mapping
parts of the PCI memory at a time.  This introduces a
BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI
driver, to perform this necessary translation without activating the
resource.

In addition to system KPI changes, LinuxKPI is updated to handle a
big-endian host, by adding proper endian swaps to the I/O functions.

Submitted by:	mmacy
Reported by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D21096
2019-08-04 19:28:10 +00:00
Konstantin Belousov
fc83c5a7d0 Make randomized stack gap between strings and pointers to argv/envs.
This effectively makes the stack base on the csu _start entry
randomized.

The gap is enabled if ASLR is for the ABI is enabled, and then
kern.elf{64,32}.aslr.stack_gap specify the max percentage of the
initial stack size that can be wasted for gap.  Setting it to zero
disables the gap, and max is capped at 50%.

Only amd64 for now.

Reviewed by:	cem, markj
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21081
2019-07-31 20:23:10 +00:00
Konstantin Belousov
48d35b8f45 Regen. 2019-07-31 19:20:39 +00:00
Konstantin Belousov
4dd892181d freebsd32 shims for copy_file_range(2).
Reviewed by:	brooks, rmacklem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21092
2019-07-31 19:20:05 +00:00
Kyle Evans
b5a7ac997f kern_shm_open: push O_CLOEXEC into caller control
The motivation for this change is to allow wrappers around shm to be written
that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets
it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which
is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1().
Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from
the context.

sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX
requirements, and a comment has been dropped in to kern_fd_open() to explain
the situation and add a pointer to where O_CLOEXEC setting is maintained for
shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally
sets O_CLOEXEC to match previous behavior.

This also has the side-effect of making flags correctly reflect the
O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a
glance-over leads me to believe that it didn't really matter.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21119
2019-07-31 15:16:51 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Andriy Gapon
c66f5b079d linuxcommon: add module version
MFC after:	2 weeks
2019-07-10 13:47:10 +00:00
Tijl Coosemans
e2fba140a8 Let linuxulator mprotect mask unsupported bits before calling kern_mprotect.
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument.  Linux rtld uses PROT_GROWSDOWN and PROT_GROWS_UP when marking the
stack executable.  Mask these bits like kern_mprotect used to do.  For other
unsupported bits EINVAL is returned like Linux does.

Reviewed by:	trasz, brooks
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20864
2019-07-10 08:19:33 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Ed Maste
b97ebbbf72 Update Linux compat version to 2.6.36
New system calls between 2.6.32 and 2.6.26 are already implemented.

This should be mostly NFC as far as contemporary Linux applications are
concerned though, as Linux kernel 3.2 is the oldest supported by a
number of popular distros today; work is in progress by others to enable
support for those applications.

Discussed with:	trasz
MFC after:	1 month
2019-07-04 20:42:08 +00:00
Edward Tomasz Napierala
0fabd7b5cc Return ENOTSUP for Linux FS_IOC_FIEMAP ioctl.
Linux man(1) calls it for no good reason; this avoids the console spam
(eg '(man): ioctl fd=4, cmd=0x660b ('f',11) is not implemented').

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20690
2019-07-04 20:16:04 +00:00
Edward Tomasz Napierala
2478d444d1 Fix linuxulator prlimit64(2) with pid == 0. This makes 'ulimit -a'
return something reasonable, and helps linux binaries which attempt
to close all the files, eg apt(8).

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20692
2019-07-04 19:40:01 +00:00
Hans Petter Selasky
8996977a89 Remove dead code added after r348743 in the LinuxKPI. The
LINUXKPI_VERSION macro is not defined for any compiled LinuxKPI code
which basically means __GFP_NOTWIRED is never checked when allocating
pages. This should work fine with the existing external DRM code as
long as the page wiring and unwiring is balanced.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-07-03 09:48:20 +00:00
Mark Johnston
fc795c25d4 Remove the CDIOCREADSUBCHANNEL_SYSSPACE ioctl.
This was added for emulation of Linux's CDROMSUBCHNL, but allows
users with read access to a cd(4) device to overwrite kernel memory
provided that the driver detects some media present.

Reimplement CDROMSUBCHNL by bouncing the data from CDIOCREADSUBCHANNEL
through the linux_cdrom_subchnl structure passed from userspace.

admbugs:	768
Reported by:	Alex Fortune
Security:	CVE-2019-5602
Security:	FreeBSD-SA-19:11.cd_ioctl
2019-07-03 00:10:01 +00:00
Konstantin Belousov
5dc7e31a09 Control implicit PROT_MAX() using procctl(2) and the FreeBSD note
feature bit.

In particular, allocate the bit to opt-out the image from implicit
PROTMAX enablement.  Provide procctl(2) verbs to set and query
implicit PROTMAX handling.  The knobs mimic the same per-image flag
and per-process controls for ASLR.

Reviewed by:	emaste, markj (previous version)
Discussed with:	brooks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D20795
2019-07-02 19:07:17 +00:00
Johannes Lundberg
6425fed7e6 LinuxKPI: Additions to rcu list.
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20719
2019-06-21 18:48:07 +00:00
Johannes Lundberg
62260f68b4 LinuxKPI: Add atomic_long_sub macro.
Reviewed by:	imp (mentor), hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20718
2019-06-21 16:43:16 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Mark Johnston
1ef5e651fd Make the linuxkpi's alloc_pages() consistently return wired pages.
Previously it did this only on platforms without a direct map.  This
also more closely matches Linux's semantics.

Since some DRM v5.0 code assumes the old behaviour, use a
LINUXKPI_VERSION guard to preserve that until the out-of-tree module
is updated.

Reviewed by:	hselasky, kib (earlier versions), johalun
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20502
2019-06-06 16:09:19 +00:00
Brooks Davis
4af6033324 makesyscalls.sh: always use absolute path for syscalls.conf
syscalls.conf is included using "." which per the Open Group:

 If file does not contain a <slash>, the shell shall use the search
 path specified by PATH to find the directory containing file.

POSIX shells don't fall back to the current working directory.

Submitted by:	Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	bdrewery
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20476
2019-05-30 20:56:23 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Dmitry Chagin
1410bfe142 Linux does not support MSG_OOB for unix(4) or non-stream oriented socket,
return EOPNOTSUPP as a Linux does.

Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20409
2019-05-30 14:21:51 +00:00
Dmitry Chagin
8128cfc59e Do not leak sa in linux_recvmsg() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:08:19 +00:00
Dmitry Chagin
57cb29a73e Do not use uninitialised sa.
Reported by:	tijl@
MFC after:	1 week
2019-05-21 18:05:57 +00:00
Dmitry Chagin
dcd6241868 Do not leak sa in linux_recvfrom() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:03:58 +00:00
Conrad Meyer
e12be3218a Include eventhandler.h in more compilation units
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility.  Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.

(These CUs did not show up as missing eventhandler.h in tinderbox.)

X-MFC-With:	r347984
2019-05-21 01:18:43 +00:00
Johannes Lundberg
03f1cf9f32 LinuxKPI: Finalize move of lindebugfs from ports to base.
The source file was moved to base earlier and also improved upon,
but never compiled in. This patch will:
- Make a module in sys/modules
- Make lindebugfs depend on linuxkpi (for seq_file)
- Check if read/write functions are set before calling, DRM drivers
  don't always set both of them.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-19 15:44:21 +00:00
Edward Tomasz Napierala
d49fb289c8 Implement PTRACE_O_TRACESYSGOOD. This makes Linux strace(1) work.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20200
2019-05-19 12:58:44 +00:00
Dmitry Chagin
aa28871254 Linux send() call returns EAGAIN instead of ENOTCONN in case when the
socket is non-blocking and connect() is not finished yet.

Initial patch developed by Steven Hartland in 2008 and adopted by me.

PR:		129169
Reported by:	smh@
MFC after:	2 weeks
2019-05-19 09:23:20 +00:00
Johannes Lundberg
6a65ca35dd LinuxKPI: Finalize import of seq_file.
seq_file.h and linux_seq_file.c was imported form ports earlier but
linux_seq_file.c was never compiled in with the module. With this
commit base seq_file will replace ports seq_file and it required a
few modifications to not break functionality and build.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-16 21:17:18 +00:00
Johannes Lundberg
6da2681fbc LinuxKPI: Add in_task macro.
This patch is part of D19565

Reviewed by:	hps, bwidawsk
Approved by:	imp (mentor), hps
Obtained from:	bwidawsk
MFC after:	1 week
2019-05-16 21:07:37 +00:00
Johannes Lundberg
39881afcba LinuxKPI: Fix build on powerpc/sparc.
Use cmpset instead of testandset in tasklet lock code.

Reviewed by:	hps
Approved by:	imp (mentor), hps
Obtained from:	hps
MFC after:	1 week
2019-05-16 19:32:11 +00:00
Johannes Lundberg
480995dcf0 LinuxKPI: Updates to tasklets for Linux 5.0.
DRM drivers expect tasklets to have a counter for enable/disable calls.
Also, add a few more tasklet locking functions.

This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-16 18:03:08 +00:00
Johannes Lundberg
07e0a3ca50 LinuxKPI: Add group_leader member to struct task_struct.
Assign self as group leader at creation to act as the only member of a
new process group.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-16 17:53:36 +00:00
Johannes Lundberg
47e2723ad7 LinuxKPI: Update access_ok macro for v5.0.
Check LINUXKPI_VERSION macro for backwards compatibility.
It's recommended to update any drivers that depend on the older KPI
so we can deprecate < 5.0 code as we update to newer Linux version.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-16 17:44:17 +00:00
Tycho Nightingale
b961c0f244 Allow loading the same DMA address multiple times without any prior
unload for the LinuxKPI.

Reviewed by:	kib, zeising
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20181
2019-05-16 17:41:16 +00:00
Johannes Lundberg
c4e0746e7d LinuxKPI: Add helper macros IS_ALIGNED and DIV_ROUND_DOWN_ULL.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 17:57:06 +00:00
Johannes Lundberg
0bb30b3a19 LinuxKPI: Move {lower|upper}_32_bits macros from port to base.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 17:48:11 +00:00
Johannes Lundberg
8264104401 LinuxKPI: Include asm/atomic-long.h from atomic.h.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 17:44:25 +00:00
Johannes Lundberg
d109700cf0 LinuxKPI: Add get_random_u32 function.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 17:32:00 +00:00
Johannes Lundberg
3137d2d4ec LinuxKPI: Update user_access_begin for Linux v5.0.
Check the new LINUXKPI_VERSION macro for backwards compatibility.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 17:04:12 +00:00
Johannes Lundberg
a4a9f2267e LinuxKPI: Expand ktime functionality.
Also, make ktime_get_raw call getnanouptime instead of getnanotime
to match (the correct) ktime_get_raw_ns.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-15 16:59:04 +00:00
Johannes Lundberg
65ff7a3192 LinuxKPI: Add prepare to pm_ops and bump FreeBSD version.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-14 23:50:46 +00:00
Johannes Lundberg
1462308d8b LinuxKPI: Add vm_fault_t type.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-14 23:32:02 +00:00
Johannes Lundberg
395be823fd LinuxKPI: Add context member to ww_mutex and bump FreeBSD version.
This patch is part of https://reviews.freebsd.org/D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
2019-05-14 23:21:20 +00:00
Johannes Lundberg
02927c768a LinuxKPI: Let del_timer return a value to match Linux.
This patch is part of https://reviews.freebsd.org/D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-14 23:12:14 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
caaad8736e Linuxulator getpeername() returns EINVAL in case then namelen less then 0.
MFC after:	2 weeks
2019-05-13 18:14:20 +00:00
Dmitry Chagin
d5368bf3df Our bsd_to_linux_sockaddr() and linux_to_bsd_sockaddr() functions
alter the userspace sockaddr to convert the format between linux and BSD versions.
That's the minimum 3 of copyin/copyout operations for one syscall.

Also some syscall uses linux_sa_put() and linux_getsockaddr() when load
sockaddr to userspace or from userspace accordingly.

To avoid this chaos, especially converting sockaddr in the userspace,
rewrite these 4 functions to convert sockaddr only in kernel and leave
only 2 of this functions.

Also in order to reduce duplication between MD parts of the Linuxulator put
struct sockaddr conversion functions that are MI out into linux_common module.

PR:		232920
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20157
2019-05-13 17:48:16 +00:00
Johannes Lundberg
5098ed5f3b Implement linux_pci_unregister_drm_driver in linuxkpi so that drm drivers
can be unloaded.

This patch is a part of D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-10 23:10:22 +00:00
Hans Petter Selasky
e2eb11e577 Fix memory leak of PCI BUS structure in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-05-09 10:23:42 +00:00
Hans Petter Selasky
eb6f534241 Fix regression issue after r346645 in the LinuxKPI.
Make sure LinuxKPI PCI devices get a default BUSDMA tag.

Found by:	Thomas Laus <lausts@acm.org>
Sponsored by:	Mellanox Technologies
2019-05-09 09:45:19 +00:00
Hans Petter Selasky
423530be04 Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:23:33 +00:00
Ed Maste
0e26cd440f make sysent after r347228
Regenerate to add @generated tag in generated files.
2019-05-07 18:10:21 +00:00
Dmitry Chagin
52a9e429c8 Remove wrong copyright line. Discussed with Carlos Neira.
Reported by:	Rodney W. Grimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-07 05:08:13 +00:00
Dmitry Chagin
4e2f69f1cf Adds sys/class/net devices to linsysfs.
Only two interfaces are created eth0 and lo and they expose
the following properties:
address, addr_len, flags, ifindex, mty, tx_queue_len and type.

Initial patch developed by Carlos Neira in 2017 and finished by me.

PR:		223722
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-06 20:01:13 +00:00
Dmitry Chagin
bbac65c772 Rewrite linux_ifflags() in more readable Linuxulator style.
Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20146
2019-05-06 19:57:51 +00:00
Dmitry Chagin
9c1437ae57 Complete r347052 (https://reviews.freebsd.org/D20137) as it it was not
a final revision.

Fix style issues and change bool-like variables from int to bool.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20141
2019-05-06 19:56:13 +00:00
Hans Petter Selasky
46068e86c3 Use PCIV_INVALID in pci_channel_offline() in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:22:45 +00:00
Hans Petter Selasky
fa23397925 Disabling a PCI device should only disable busmaster in the LinuxKPI.
As Linux comment for this function point:
Signal to the system that the PCI device is not in use by the system
anymore. This only involves disabling PCI bus-mastering, if active.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:17:38 +00:00
Hans Petter Selasky
34cb771e01 Implement print_hex_dump_debug() function macro in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:10:26 +00:00
Hans Petter Selasky
4580f5eadd Allow controlling pr_debug at runtime in the LinuxKPI.
Turning on pr_debug at compile time make it non-optional at runtime.
This often means that the amount of the debugging is unbearable.
Allow developer to turn on pr_debug output only when needed.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		kib@
Sponsored by:		Mellanox Technologies
2019-05-06 16:00:20 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Hans Petter Selasky
442d12d89c Fix regression issue after r346645 in the LinuxKPI.
The S/G list must be mapped AS-IS without any optimisations.
This also implies that sg_dma_len() must be equal to sg->length.
Many Linux drivers assume this and this fixes some DRM issues.

Put the BUS DMA map pointer into the scatter-gather list to
allow multiple mappings on the same physical memory address.

The FreeBSD version has been bumped to force recompilation of
external kernel modules.

Sponsored by:		Mellanox Technologies
2019-05-04 09:47:01 +00:00
Hans Petter Selasky
8ec9f0282a Fix regression issue after r346645 in the LinuxKPI.
Properly handle error case when mapping DMA address fails.

Sponsored by:		Mellanox Technologies
2019-05-04 09:30:03 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Edward Tomasz Napierala
967cbe64b1 Decode more CPU flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20145
2019-05-03 08:27:03 +00:00
Edward Tomasz Napierala
6c8cb13dd8 Fix flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20139
2019-05-02 19:02:16 +00:00
Dmitry Chagin
03ddf624e6 Remove unneeded includes.
MFC after:	2 week
2019-05-02 09:00:36 +00:00
Edward Tomasz Napierala
12f3888a98 Add sys/devices/system/cpu/{possible,present} to linsysfs(5).
That makes Linux lscpu(1) work.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20131
2019-05-02 08:17:29 +00:00
Dmitry Chagin
5d520d7fab Follow the FreeBSD and implement PDEATH_SIG prctl ops in the Linuxulator.
It was first introduced in r163734 and missied by me in r283383.

MFC after:	1 week
2019-04-30 17:18:05 +00:00
Hans Petter Selasky
a6619e8d9c Reduce the number of mutexes after r346645 in the LinuxKPI.
Make function macro wrappers for locking and unlocking to ease readability.

No functional change.

Discussed with:		kib@, tychon@ and zeising@
Sponsored by:		Mellanox Technologies
2019-04-30 10:41:20 +00:00
Hans Petter Selasky
93a203ea65 Make the dma_pool structure private to the LinuxKPI similar to Linux.
No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:38:22 +00:00
Hans Petter Selasky
5a637529ed Store a pointer to the device instead of the PCI device in the DMA pool
implementation in the LinuxKPI. This avoids use of container_of().

No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:26:11 +00:00
Ed Maste
5803d72f7e make sysent after r346273 (readlinkat arg correction)
PR:		197915
Reminded by:	dchagin
2019-04-26 12:55:52 +00:00
Johannes Lundberg
af248a7cee Don't call cdev_init where cdev_alloc is called. cdev_alloc already
handles initialization.

Reported by:	johalun
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19565
2019-04-25 21:54:32 +00:00
Tycho Nightingale
b09626b330 LinuxKPI buildfix for ppc64 after r346645.
Proposed by:	hselasky
Sponsored by:	Dell EMC Isilon
2019-04-25 18:13:55 +00:00
Hans Petter Selasky
d4fedb75ec LinuxKPI buildfix for 32-bit DMA architectures after r346645.
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.

Sponsored by:		Mellanox Technologies
2019-04-25 09:13:15 +00:00
Tycho Nightingale
f211d536b6 LinuxKPI should use bus_dma(9) to be compatible with an IOMMU
Reviewed by:	hselasky, kib
Tested by:	greg@unrelenting.technology
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19845
2019-04-24 20:30:45 +00:00
Ed Maste
ff9be73ee3 Enable ioremap for aarch64 in the LinuxKPI
Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

PR:		237055
Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D19987
2019-04-20 15:57:05 +00:00
Ed Maste
e8ee7d9035 correct readlinkat(2) return type
r176215 corrected readlink(2)'s return type and the type of the last
argument.  readlink(2) was introduced in r177788 after being developed
as part of Google Summer of Code 2007; it appears to have inherited the
wrong return type.

Man pages and header files were already ssize_t; update syscalls.master
to match.

PR:		197915
Submitted by:	Henning Petersen <henning.petersen@t-online.de>
MFC after:	2 weeks
2019-04-16 13:26:31 +00:00
Mariusz Zaborski
a489026566 Regen after r345982. 2019-04-06 09:37:10 +00:00
Mariusz Zaborski
a1304030b8 Introduce funlinkat syscall that always us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
Conrad Meyer
a8a16c7128 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
Jason A. Harmening
f0645b3a06 freebsd32: fix padding of computed control message length for recvmsg()
Each control message region must be aligned on a 4-byte boundary on 32-bit
architectures. The 32-bit compat shim for recvmsg() gets the actual layout
right, but doesn't pad the payload length when computing msg_controllen for
the output message header. If a control message contains an unaligned
payload, such as the 1-byte TTL field in the example attached to PR 236737,
this can produce control message payload boundaries that extend beyond
the boundary reported by msg_controllen.

PR:	236737
Reported by:	Yuval Pavel Zholkover <paulzhol@gmail.com>
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19768
2019-03-30 23:43:58 +00:00
Dmitry Chagin
803fff9065 Whitespace cleanup (annoying).
MFC after:	1 month
2019-03-24 15:08:30 +00:00
Dmitry Chagin
f730d606d5 Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Konstantin Belousov
fd8d844f76 amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting.  PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.

Requested by:	jhb
Reviewed by:	jhb, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:44:33 +00:00
Hans Petter Selasky
582141f69d Revert r345102 until the DRM next port issues are resolved.
Requested by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-14 09:18:54 +00:00
Hans Petter Selasky
7d595f6b79 Resolve duplicate symbol name conflict after r345095, when building LINT.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-13 19:53:20 +00:00
Hans Petter Selasky
998f22eb4b Implement sg_virt() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:31:33 +00:00
Hans Petter Selasky
56b16627d4 Define SG_CHAIN and SG_END in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:30:40 +00:00
Hans Petter Selasky
c469882529 Implement pr_info_ratelimited() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:26:24 +00:00
Hans Petter Selasky
856b815d27 Define some RCU debug macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:24:30 +00:00
Hans Petter Selasky
0ac26c0e30 Honor SYSCTL function return values when creating sysfs nodes in the LinuxKPI.
Return proper error code upon failure.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:21:19 +00:00
Hans Petter Selasky
925245aaa9 Implement more malloc function macros in the LinuxKPI.
Fix arguments for currently unused kvmalloc().

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:17:52 +00:00
Hans Petter Selasky
ab62989ae9 Implement more PCI speed related functions and macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:15:36 +00:00
Hans Petter Selasky
75f7460b34 Implement IS_ALIGNED() and DIV_ROUND_DOWN_ULL() function macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:04:06 +00:00
Hans Petter Selasky
8734a56285 Implement si_meminfo() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:01:55 +00:00
Hans Petter Selasky
5746b1cd46 Implement task_euid() and get_task_state() function macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:55:41 +00:00
Hans Petter Selasky
c7486758eb Implement get_task_comm() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:53:29 +00:00
Hans Petter Selasky
638fa5a36f Implement current_exiting() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:51:33 +00:00
Hans Petter Selasky
845a91ce0b Implement list_for_each_entry_from_reverse() and
list_bulk_move_tail() in the LinuxKPI.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:47:17 +00:00
Hans Petter Selasky
4f081c4fc6 Implement dma_map_page_attrs() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:44:06 +00:00
Hans Petter Selasky
839b4bf24d Implement ida_free() and ida_alloc_max() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:02:47 +00:00
Hans Petter Selasky
8b2a8a4954 Implement DEFINE_STATIC_SRCU() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:00:25 +00:00
Hans Petter Selasky
884aaac660 Implement BITS_PER_TYPE() function macro in the LinuxKPI.
Fix some style while at it.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:55:58 +00:00
Hans Petter Selasky
4c09177ab7 Properly define the DMA attribute values in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:51:08 +00:00
Hans Petter Selasky
7f36930024 Implement dev_err_once() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:46:05 +00:00
Hans Petter Selasky
8fdb5febfc Implement dma_set_mask_and_coherent() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:42:31 +00:00
Warner Losh
329f0aa952 Kill tz_minuteswest and tz_dsttime.
Research Unix, 7th Edition introduced TIMEZONE and DSTFLAG
compile-time constants in sys/param.h to communicate these values for
the machine. 4.2BSD moved from the compile-time to run-time and
introduced these variables and used for localtime() to return the
right offset from UTC (sometimes referred to as GMT, for this purpose
is the same). 4.4BSD migrated to using the tzdata code/database and
these variables were basically unused.

FreeBSD removed the real need for these with adjkerntz in
1995. However, some RTC clocks continued to use these variables,
though they were largely unused otherwise.  Later, phk centeralized
most of the uses in utc_offset, but left it using both tz_minuteswest
and adjkerntz.

POSIX (IEEE Std 1003.1-2017) states in the gettimeofday specification
"If tzp is not a null pointer, the behavior is unspecified" so there's
no standards reason to retain it anymore. In fact, gettimeofday has
been marked as obsolecent, meaning it could be removed from a future
release of the standard. It is the only interface defined in POSIX
that references these two values. All other references come from the
tzdata database via tzset().

These were used to more faithfully implement early unix ABIs which
have been removed from FreeBSD.  NetBSD has completely eliminated
these variables years ago. Linux has migrated to tzdata as well,
though these variables technically still exist for compatibility
with unspecified older programs.

So, there's no real reason to have them these days. They are a
historical vestige that's no longer used in any meaningful way.

Reviewed By: jhb@, brooks@
Differential Revision: https://reviews.freebsd.org/D19550
2019-03-12 04:49:47 +00:00
Edward Tomasz Napierala
1699546def Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00
Bjoern A. Zeeb
4974f2b172 Add ushort and ulong to linux/types.h.
When porting code once written for Linux we find not only uints but also ushort and ulong.
Provide central typedefs as part of the linuxkpi for those as well.

Reviewed by:	hselasky, emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19405
2019-03-01 14:33:20 +00:00
Matt Macy
3f6cab079c import linux debugfs support
Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19258
2019-02-23 20:56:41 +00:00
Matt Macy
2ce1771c12 linux/fs: simplify interop and correct definition of loff_t
- offsets can be negative, loff_t needs to be signed, it also simplifies
  interop with the rest of the code base to use off_t than the actual linux
  definition "long long"
- don't rely on the defining "file" to "linux_file" in interface definitions
  as that causes heartache with includes

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19274
2019-02-23 20:45:45 +00:00
Matt Macy
983ed4f9f1 lkpi: allow late binding of linux_alloc_current
Some consumers may be loosely coupled with the lkpi.
This allows them to call linux_alloc_current without
having a static dependency.

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19257
2019-02-22 23:15:32 +00:00
Marius Strobl
f855ec814d Make taskqgroup_attach{,_cpu}(9) work across architectures
So far, intr_{g,s}etaffinity(9) take a single int for identifying
a device interrupt. This approach doesn't work on all architectures
supported, as a single int isn't sufficient to globally specify a
device interrupt. In particular, with multiple interrupt controllers
in one system as found on e. g. arm and arm64 machines, an interrupt
number as returned by rman_get_start(9) may be only unique relative
to the bus and, thus, interrupt controller, a certain device hangs
off from.
In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to
the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}()
not work across architectures. Yet in turn, iflib(4) as gtaskqueue
consumer so far doesn't fit architectures where interrupt numbers
aren't globally unique.
However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as
employed by the gtaskqueue implementation to bind an interrupt to a
particular CPU, using bus_bind_intr(9) instead is equivalent from
a functional point of view, with bus_bind_intr(9) taking the device
and interrupt resource arguments required for uniquely specifying a
device interrupt.
Thus, change the gtaskqueue implementation to employ bus_bind_intr(9)
instead and intr_{g,s}etaffinity(9) to take the device and interrupt
resource arguments required respectively. This change also moves
struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps
struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL
as userland likes to include <sys/_task.h> or indirectly drags it
in - for better or worse also with _KERNEL defined -, which with
device_t and struct resource dependencies otherwise is no longer
as easily possible now.
The userland inclusion problem probably can be improved a bit by
introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the
existing _WANT_PRISON etc., which is orthogonal to this change,
though, and likely needs an exp-run.

While at it:
- Change the gt_cpu member in the grouptask structure to be of type
  int as used elswhere for specifying CPUs (an int16_t may be too
  narrow sooner or later),
- move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to
  the gtaskqueue implementation as it's only used and needed there,
- change the GTASK_INIT macro to use "gtask" rather than "task" as
  argument given that it actually operates on a struct gtask rather
  than a struct task, and
- let subr_gtaskqueue.c consistently use __func__ to print functions
  names.

Reported by:	mmel
Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D19139
2019-02-12 21:23:59 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Konstantin Belousov
a7f67facdf Normalize the declaration of i386_read_exec variable.
It is currently re-declared in sys/sysent.h which is a wrong place for
MD variable.  Which causes redeclaration error with gcc when
sys/sysent.h and machine/md_var.h are included both.

Remove it from sys/sysent.h and instead include machine/md_var.h when
needed, under #ifdef for both i386 and amd64.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:51:51 +00:00
Andriy Voskoboinyk
a99bdc110b Fix compilation with 'option NDISAPI + device ndis' and
without 'device pccard' in the kernel config file.

PR:		171532
Reported by:	Robert Bonomi <bonomi@host128.r-bonomi.com>
MFC after:	1 week
2019-01-30 11:40:12 +00:00
Hans Petter Selasky
232028b34e Add full support for PCI_ANY_ID when matching PCI IDs in the LinuxKPI.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-25 20:13:28 +00:00
Oleksandr Tymoshenko
1715256316 [ndis] Fix unregistered use of FPU by NDIS in kernel on amd64
amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use
of FPU in kernel" panic.

Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave.  To reduce
amount of allocations/deallocations done via
fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements.

Based on the patch by Paul B Mahol

PR:		165622
Submitted by:	Vlad Movchan <vladislav.movchan@gmail.com>
MFC after:	1 month
2019-01-22 03:53:42 +00:00
Ed Maste
347a8ed1bf linuxulator: fix stack memory disclosure in linux_sigaltstack
Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before
callit it, but linux_waitid did not.  Instead of zeroing in the called
function to address linux_waitid (as in commit 2e6ebe70), just do it in
linux_waitid.

admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	Andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 17:12:16 +00:00
Ed Maste
9866e7bbae linuxulator: fix stack memory disclosure in linux_ioctl_termio
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:21:03 +00:00
Ed Maste
4308a37410 linuxulator: fix stack memory disclosure in linux_ioctl_v4l
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:19:02 +00:00
Kirk McKusick
88640c0e8b Create new EINTEGRITY error with message "Integrity check failed".
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
2019-01-17 06:35:45 +00:00
Gleb Smirnoff
396694153f Fix compilation failures on different arches that have vm_machdep.c not
aware of counter_u64_t by including counter.h into uma_int.h. I'm not
happy about this inclusion, but it fixes compilation ASAP.
2019-01-15 19:33:47 +00:00
Gleb Smirnoff
2efcc8cbca Make uz_allocs, uz_frees and uz_fails counter(9). This removes some
atomic updates and reduces amount of data protected by zone lock.

During startup point these fields to EARLY_COUNTER. After startup
allocate them for all early zones.

Tested by:	pho
2019-01-15 18:24:34 +00:00
Olivier Houchard
21fb66241a Regenerate sysent files after having modified syscalls.master. 2019-01-13 00:38:55 +00:00
Olivier Houchard
2ca357528f amd64 is the only arch that doesn't require padding for 32bits syscalls, so
instead of listing every arch thar requires it, just exclude amd64.
2019-01-13 00:37:31 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Mark Johnston
bb376a990c Specify the correct option level when emulating SO_PEERCRED.
Our equivalent to SO_PEERCRED, LOCAL_PEERCRED, is implemented at
socket option level 0, not SOL_SOCKET.

PR:		234722
Submitted by:	Dániel Bakai <bakaidl@gmail.com>
MFC after:	2 weeks
2019-01-08 17:21:59 +00:00
Conrad Meyer
85f2a00b34 linuxkpi: Remove extraneous NULL check on M_WAITOK allocation
The check was not introduced in r342628, but the subsequent unchecked access to
refs was added then, prompting a Coverity warning about "Null pointer
dereferences (FORWARD_NULL)."  The warning is bogus due to M_WAITOK, but so is
the NULL check that hints it, so just remove it.

CID:		1398588
Reported by:	Coverity
2019-01-01 19:56:49 +00:00
Konstantin Belousov
9362b6a394 Fix 32bit gcc builds after r342625.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-12-30 16:39:26 +00:00
Konstantin Belousov
f823a36e83 Fix linux_destroy_dev() behaviour when there are still files open from
the destroying cdev.

Currently linux_destroy_dev() waits for the reference count on the
linux cdev to drain, and each open file hold the reference.
Practically it means that linux_destroy_dev() is blocked until all
userspace processes that have the cdev open, exit.  FreeBSD devfs does
not have such problem, because device refcount only prevents freeing
of the cdev memory, and separate 'active methods' counter blocks
destroy_dev() until all threads leave the cdevsw methods.  After that,
attempts to enter cdevsw methods are refused with an error.

Implement somewhat similar mechanism for LinuxKPI cdevs.  Demote cdev
refcount to only mean a hold on the linux cdev memory.  Add sirefs
count to track both number of threads inside the cdev methods, and for
single-bit indicator that cdev is being destroyed.  In the later case,
the call is redirected to the dummy cdev.

Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:46:45 +00:00
Konstantin Belousov
e5a3393a15 Implement zap_vma_ptes() for managed device objects.
Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:38:07 +00:00
Konstantin Belousov
069598b941 Use IDX_TO_OFF().
Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:28:31 +00:00
Mateusz Guzik
628888f0e0 Remove iBCS2, part2: general kernel
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
2018-12-19 21:57:58 +00:00
Brooks Davis
10f7b12c13 const poison the new pointer of __sysctl.
Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18444
2018-12-18 12:44:38 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Hans Petter Selasky
ca487c1888 Remove no longer needed ifdefs in the LinuxKPI, after r341787.
Differential Revision:	https://reviews.freebsd.org/D18450
Reviewed by:		kib@
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-12-10 13:41:33 +00:00
Konstantin Belousov
fd52edaf70 Regen. 2018-12-07 15:19:00 +00:00
Konstantin Belousov
d1fd400a80 Add new file handle system calls.
Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359
2018-12-07 15:17:29 +00:00
Hans Petter Selasky
52da588961 Remove redundant declaration after r341517.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:56:44 +00:00
Hans Petter Selasky
6da0d28e6a Fix some build of LinuxKPI on some platforms after r341518.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:53:34 +00:00
Slava Shwartsman
31c3f64819 mlx5: Fix driver version location
Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:10 +00:00
Slava Shwartsman
a9c20af23d ibcore: ip6_dev_find() needs to know the scope ID.
Else the wrong network device can be returned for link-local addresses.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:43 +00:00
Slava Shwartsman
452d59e130 linuxkpi: Really check if PCI is offline
Currently we always return false if for PCI offline query.
Try to read PCI config, if the return value if 0xffff probably the
PCI is offline.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:45 +00:00
Slava Shwartsman
92cbd83001 linuxkpi: properly implement netif_carrier_ok().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:15 +00:00
Slava Shwartsman
9c7b53cc65 linuxkpi: Fix for use-after-free when tearing down character devices.
Make sure we hold a reference on the character device for every opened file
to prevent the character device to be freed prematurely.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:16:39 +00:00
Slava Shwartsman
be34cfc587 linuxkpi: implement idr_is_empty() and ida_is_empty().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:15:57 +00:00
Konstantin Belousov
f186340011 Improve procstat reporting for the linux cdev file descriptors.
If there is a vnode attached to the linux file, use it to fill
kinfo_file.  Otherwise, report a new KF_TYPE_DEV file type, without
supplying any type-specific information.

KF_TYPE_DEV is supposed to be used by most devfs-specific file types.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-03 23:39:45 +00:00
Brooks Davis
f373437a01 Add helper functions to copy strings into struct image_args.
Given a zeroed struct image_args with an allocated buf member,
exec_args_add_fname() must be called to install a file name (or NULL).
Then zero or more calls to exec_args_add_env() followed by zero or
more calls to exec_args_add_env(). exec_args_adjust_args() may be
called after args and/or env to allow an interpreter to be prepended to
the argument list.

To allow code reuse when adding arg and env variables, begin_envv
should be accessed with the accessor exec_args_get_begin_envv()
which handles the case when no environment entries have been added.

Use these functions to simplify exec_copyin_args() and
freebsd32_exec_copyin_args().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15468
2018-11-29 21:00:56 +00:00
Mark Johnston
792843c38f Pass malloc flags directly through kevent(2) subroutines.
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318
2018-11-24 17:06:01 +00:00
Ben Widawsky
f82dd310bb linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
Ben Widawsky
5a46107832 linuxkpi: Remove duplicated text
Somehow this got botched while moving from git -> svn
2018-11-20 23:05:09 +00:00
Ben Widawsky
c3f4f28c63 linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
Tijl Coosemans
7df0e7beb7 Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument.  Stop using the macro as well as LINUX_CMSG_FIRSTHDR.  Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR:		217901
MFC after:	2 days
X-MFC-With:	r340631
2018-11-20 14:18:57 +00:00
Tijl Coosemans
e3b385fc95 Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin.  For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly.  One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR:		217901
Reviewed by:	kib
MFC after:	3 days
2018-11-19 15:31:54 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Hans Petter Selasky
0df8bab666 Define asm macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:23:45 +00:00
Hans Petter Selasky
1799873e3a Implement ktime_get_ts64() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:19:16 +00:00
Brooks Davis
5b1df30051 Use the main capabilities.conf for freebsd32.
Allow the location of capabilities.conf to be configured.

Also allow a per-abi syscall prefix to be configured with the
abi_func_prefix syscalls.conf variable and check syscalls against
entries in capabilities.conf with and without the prefix amended.

Take advantage of these two features to allow use shared capabilities.conf
between the default syscall vector and the freebsd32 compatability
layer.  We've been inconsistent about keeping the two in sync as
evidenced by the bugs fixed in r340294.  This eliminates that problem
going forward.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17932
2018-11-14 00:46:02 +00:00
Brooks Davis
4b499c75f9 Regen after r340302: Fix freebsd32 mknod(at).
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:02:07 +00:00
Brooks Davis
9a38df59e9 Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument.  64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations.  They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips.  Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.

The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.

Use uint32_t consistently as the type of the old dev_t.  This matches
the old definition.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:01:16 +00:00
Brooks Davis
1632f36305 Regen after r340294: Fix a number of bugs in freebsd32's capabilities.conf.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:06:25 +00:00
Brooks Davis
d457c0b61b Fix a number of bugs in freebsd32's capabilities.conf.
Bugs range from failure to update after changing syscall implementaion
names to using the wrong name.  Somewhat confusingly, the name in
capabilities.conf is exactly the string that appears in syscalls.master,
not the name with a COMPAT* prefix which is the actual function name.

Found while making a change to use the default capabilities.conf.

Fixes:	r335177, r336980, r340272, r340274, others
Reviewed by:	kib, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:03:01 +00:00
Brooks Davis
e1e300b671 Regen after r340274: Make freebsd32_utmx_op follow the freebsd32_foo
convention.
2018-11-09 00:46:50 +00:00
Brooks Davis
b34f4419fb Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:46:10 +00:00
Brooks Davis
86e06fa55b Regen after 340272: Make __sysctl follow the freebsd32_foo convention
Sponsored by:	DARPA, AFRL
2018-11-09 00:22:45 +00:00
Brooks Davis
4074751718 Make __sysctl follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:21:58 +00:00
Brooks Davis
5577e44bf4 Regen after r340221: allow pointer return types.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:56:07 +00:00
Brooks Davis
e56ec0e519 makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word.  This
allows it to be a pointer without using a typedef.

Update the return types of break, mmap, and shmat to be void * as
declared.  This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:55:04 +00:00
Brooks Davis
938e8dcf60 Regen after r340199: Use declared types for caddr_t arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:47:29 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Mariusz Zaborski
0b39d7e377 Remove ppoll. freebsd32 doesn't define a ppoll syscall.
Reported by:	jhb
2018-11-06 18:26:40 +00:00
Mariusz Zaborski
279e464dd5 Regenerate after r340195. 2018-11-06 18:06:52 +00:00
Mariusz Zaborski
4a1f3ed354 capsicum: Add ppoll and freebsd32_ppoll to compat32.
PR:		232495
Pointed out by: brooks
MFC after:	2 weeks
2018-11-06 18:05:46 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Brooks Davis
4e8c73eb20 Regen after r340080: Add const to input-only char * arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:56:19 +00:00
Brooks Davis
12e69f96a2 Add const to input-only char * arguments.
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declerations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:50:22 +00:00
Brooks Davis
f7e5ce325f Regent after r340034: Use mode_t when the documented signature does.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:10:53 +00:00
Brooks Davis
2105ac07d7 Use mode_t when the documented signature does.
This is more clear and produces better results when generating function
stubs from syscalls.master.

Reviewed by:	kib, emaste
Obtained from:	CheribSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:06:50 +00:00
Ben Widawsky
3d40cdf014 linuxkpi: Add GFP flags needed for ttm drivers
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Requested by:	bwidawsk
MFC after:	3 days
Approved by:	emaste (mentor)
2018-11-01 15:30:01 +00:00
Hans Petter Selasky
e35079db73 Implement the dump_stack() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:42:56 +00:00
Hans Petter Selasky
bf05cd05ac Implement __KERNEL_DIV_ROUND_UP() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:32:52 +00:00
Hans Petter Selasky
8ad9551d36 Implement dma_pool_zalloc() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-29 19:02:36 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Brooks Davis
22c0c9a481 Remove __restrict qualifiers from syscalls.master.
The restruct qualifier is intended to aid code generation in the
compiler, but the only access to storage through these pointers is via
structs using copyin/copyout and the like which can not be written in C
or C++ and thus the compiler gains nothing from the qualifiers.

As such, the qualifiers add no value in current usage.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17574
2018-10-22 21:50:43 +00:00
Tijl Coosemans
642909fdfb Define linuxkpi readq for 64-bit architectures. It is used by drm-kmod.
Currently the compiler picks up the definition in machine/cpufunc.h.

Add compiler memory barriers to read* and write*.  The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.

Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.

Qualify the addr parameter in write* as volatile.

Like Linux, define macros with the same name as the inline functions.

Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.

Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian

Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.

Reviewed by:	hselasky
MFC after:	3 days
2018-10-22 20:55:35 +00:00
Kyle Evans
29bf3a7ba8 Correct COMPAT* macro names in syscalls.master
Both ^/sys/compat/freebsd32/syscalls.master and ^/sys/kern/syscalls.master
cited "COMPAT[n] #ifdef" instead of "COMPAT_FREEBSD[n] #ifdef" in places.

Approved by:	re (glebius)
2018-10-15 21:35:57 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Brooks Davis
9bc603bd20 Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.
A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by:	re (gjb)
2018-10-04 23:55:03 +00:00