Commit Graph

3704 Commits

Author SHA1 Message Date
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Mateusz Guzik
a125ed50a6 linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
2020-08-18 22:04:22 +00:00
Mark Johnston
a7044c60a5 Fix handling of ancillary data on non-AF_UNIX Linux sockets.
After r340674, the "continue" would restart the loop without having
updated clen, resulting in an infinite loop.  Restore the old behaviour
of simply ignoring all control messages on such sockets, since we
currently only implement handling for AF_UNIX-specific messages.

Reported by:	syzkaller
Reviewed by:	tijl
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26093
2020-08-18 14:17:14 +00:00
Mark Johnston
d9565182fd Remove "emulation" of clone(CLONE_PARENT | CLONE_THREAD).
On Linux this is supposed to result in EINVAL.

Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-17 21:30:49 +00:00
Mark Johnston
74a796e0fc Fix a lock leak when emulating futex(FUTEX_WAIT_BITSET).
Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-17 21:30:15 +00:00
Mark Johnston
30dcce2709 Skip Linux madvise(MADV_DONTNEED) on unmanaged objects.
vm_object_madvise() is a no-op for unmanaged objects, but we should also
limit the scope of mappings on which pmap_remove() is called.  In
particular, with the WIP largepage shm objects patch the kernel must
remove mappings of such objects along superpage boundaries, and without
this check Linux madvise(MADV_DONTNEED) could violate that requirement.

Reviewed by:	alc, kib
MFC with:	r362631
Sponsored by:	Juniper Networks, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D26084
2020-08-17 17:14:56 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Emmanuel Vadot
0e123c13fe linuxkpi: Add a few wait_bit functions
The Linux function does a lot more than that, as multiple waitqueues could be fetched
from a static table based on the hash of the argument, but since in DRM it's only used
in one place just add a single variable.
We will probably need to change that in the future, but it's OK with DRM even with current
Linux.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26054
2020-08-14 08:48:17 +00:00
Mark Johnston
9eb0cd08ae linprocfs: Fix some inaccuracies in meminfo.
- Fill out MemFree correctly.  Delete an ancient comment suggesting that
  we don't want to advertise the true quantity of free memory.
- Populate the Buffers field by reading vfs.bufspace.
- The page cache consists of all pages in page queues, not just the
  inactive queue.

PR:		248463
Reported and tested by:	danfe
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-12 16:08:44 +00:00
Mark Johnston
aa5dbc8953 Remove sys/compat/netbsd.
It contained only a header used by ncv(4), which was mainly used on pc98
systems.  Both ncv(4) and pc98 support have long been removed.
2020-08-11 16:40:09 +00:00
Hans Petter Selasky
74d3a63559 Use atomic_clear_rel_long() to implement clear_bit_unlock() in the LinuxKPI
after r363842.

Suggested by:	alc@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-08-11 12:41:40 +00:00
Hans Petter Selasky
6ae240797f Need to clone the task struct fields related to RCU as well in the
LinuxKPI after r359727. This fixes a minor regression issue. Else the
priority tracking won't work properly when both sleepable and
non-sleepable RCU is in use on the same thread.

Bump the __FreeBSD_version to force recompilation of external kernel
modules.

PR:		242272
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-08-11 12:17:46 +00:00
Mateusz Guzik
51ea7bea91 vfs: add VOP_STAT
The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910
2020-08-07 23:06:40 +00:00
Hans Petter Selasky
6b839ff47b Implement radix_tree_store() in the LinuxKPI for use with the coming
extensible arrays implementation.

While at it add some more comments explaining the current
radix_tree_insert() function and make sure to clean the root node when
the radix tree reaches the maximum height. This can happen if the
index passed is too big when the tree is empty.

The radix_tree_store() function is basically a copy of the
radix_tree_insert() function with some added functionality.

The radix_tree_store() function is local to FreeBSD and does not yet
exist in Linux.

Reviewed by:		kib
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-07 16:15:44 +00:00
Mark Johnston
1b1428dcc8 Fix a TOCTOU vulnerability in freebsd32_copyin_control().
PR:		248257
Reported by:	m00nbsd working with Trend Micro Zero Day Initiative
Reviewed by:	kib
Security:	SA-20:23.sendmsg
Security:	CVE-2020-7460
Security:	ZDI-CAN-11543
2020-08-05 17:06:14 +00:00
Emmanuel Vadot
dfb4ecb38b linuxkpi: Add time_after32 and time_before32
This compares two 32-bit times.

Sponsored by: The FreeBSD Foundation
Reviewed by:	kib, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25700
2020-08-04 15:27:32 +00:00
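The 32-bit time comparison named in dfb4ecb38b can be sketched as follows. This is a minimal illustration of wraparound-safe comparison, not the LinuxKPI source; the sketch_ function names are hypothetical, and the cast assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating wraparound-safe comparison of two
 * 32-bit timestamps.  "a is after b" holds when the wrapped difference
 * (b - a), reinterpreted as a signed 32-bit value, is negative.
 */
static inline bool
sketch_time_after32(uint32_t a, uint32_t b)
{
	return ((int32_t)(uint32_t)(b - a) < 0);
}

static inline bool
sketch_time_before32(uint32_t a, uint32_t b)
{
	/* "a is before b" is the same as "b is after a". */
	return (sketch_time_after32(b, a));
}
```

Comparing the signed difference rather than the raw values keeps the result correct when the counter wraps, as long as the two timestamps are less than 2^31 ticks apart.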
Emmanuel Vadot
334680ab07 linuxkpi: Add clear_bit_unlock
This calls clear_bit and adds a memory barrier.

Sponsored by: The FreeBSD Foundation

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25943
2020-08-04 15:25:22 +00:00
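A rough sketch of the "clear a bit, with a barrier" idea from 334680ab07, written with portable C11 atomics rather than the kernel's atomic(9) primitives. The sketch_ names and the SKETCH_BITS_PER_LONG macro are assumptions made for illustration only.

```c
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in: clear bit `nr` in an array of atomic words
 * using release ordering, so that stores made while the bit was "held"
 * become visible before another thread can observe the bit as clear.
 */
static inline void
sketch_clear_bit_unlock(unsigned long nr, atomic_ulong *addr)
{
	atomic_ulong *word = &addr[nr / SKETCH_BITS_PER_LONG];
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

	atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```

Release ordering is the usual choice for an unlock-style operation; a matching acquire on the locking side completes the pairing.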
Emmanuel Vadot
38ba9c8bac Re-apply r363564.
We now have linux/sizes.h in the tree.
2020-08-04 14:53:41 +00:00
Emmanuel Vadot
2d946b2e12 linuxkpi: Add nested variant of mutex_lock_interruptible
We don't do anything with the _nested variant so just call mutex_lock_interruptible

Sponsored by: The FreeBSD Foundation
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25944
2020-08-04 14:45:22 +00:00
Emmanuel Vadot
7237a74f3b linuxkpi: Add kref_put_lock
Same as kref_put but in addition to calling the rel function it will
acquire the lock first.

Sponsored by: The FreeBSD Foundation
Reviewed by:	hselasky, emaste
Differential Revision:	https://reviews.freebsd.org/D25942
2020-08-04 14:44:16 +00:00
Emmanuel Vadot
16fdd8b7ad linuxkpi: Add linux/sizes.h
This file contains some defines for common sizes.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky, emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25941
2020-08-04 14:42:38 +00:00
Emmanuel Vadot
85d787b2fe Fix r363565
lockdep.h needs sys/lock.h for LOCK_CLASS
2020-07-26 18:33:29 +00:00
Emmanuel Vadot
cdb6eebe08 Revert r363564
linux/sizes.h doesn't exist in base ... sorry.
2020-07-26 17:21:24 +00:00
Emmanuel Vadot
0e4e9e8f34 linuxkpi: Add taint* defines
This isn't used by us, but allows us to port drivers more easily.

Reviewed by:	hselasky
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25703
2020-07-26 16:31:49 +00:00
Emmanuel Vadot
f12af2b387 linuxkpi: Include hardirq.h in preempt.h and lockdep.h in hardirq.h
Linux does the same, this avoids ifdef or extra includes in ported drivers.

Reviewed by:	emaste, hselasky
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25702
2020-07-26 16:30:59 +00:00
Emmanuel Vadot
820272c408 linuxkpi: Include linux/sizes.h in dma-mapping.h
Linux does the same, this avoids ifdef or extra includes in ported drivers.

Reviewed by:	emaste, hselasky
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25701
2020-07-26 16:30:01 +00:00
Mark Johnston
94140f4781 usb(4): Stop checking for failures from malloc(M_WAITOK).
Handle the fact that parts of usb(4) can be compiled into the boot
loader, where M_WAITOK does not guarantee a successful allocation.

PR:		240545
Submitted by:	Andrew Reiter <arr@watson.org> (original version)
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25706
2020-07-22 14:32:47 +00:00
Edward Tomasz Napierala
aa75412146 Make linux(4) support the BLKPBSZGET ioctl. Oracle uses it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25694
2020-07-19 12:25:03 +00:00
Edward Tomasz Napierala
d5c5b4b382 Make linux fallocate(2) return EOPNOTSUPP, not ENOSYS, on unsupported mode,
as documented in the man page.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-07-18 12:21:08 +00:00
Edward Tomasz Napierala
eb6ae7576d Bump the default linux version from 3.2.0 to 3.10.0, which corresponds
to RHEL 7.  Required for DB2.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25656
2020-07-18 11:37:30 +00:00
Edward Tomasz Napierala
8d1d017175 Add a trivial linux(4) splice(2) implementation, which simply
returns EINVAL.  Fixes grep (grep-3.1-2build1).

PR:		kern/218699
Reported by:	avos
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25636
2020-07-18 11:28:40 +00:00
Edward Tomasz Napierala
978ffef22f Add missing SysV IPC stats to linprocfs(4). Fixes 'ipcs -l',
and also helps Oracle.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25669
2020-07-18 10:56:04 +00:00
Edward Tomasz Napierala
7ce051e799 Fix bogomips calculation. Previously it was off by half. This was
verified under VMWare Fusion, comparing to what's reported under CentOS,
and by comparing numbers reported by linuxulator on T420 with a googled
up Linux cpuinfo (https://lkml.org/lkml/2011/11/29/116).

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20693
2020-07-18 10:53:56 +00:00
Edward Tomasz Napierala
8ba7dddd6f Fix two typos in flag names in /proc/cpuinfo.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25695
2020-07-18 10:49:17 +00:00
Vladimir Kondratyev
34c2f79d83 linuxkpi: Ignore NULL pointers passed to string parameter of kstr(n)dup
That follows Linux and fixes related drm-kmod-5.3 panic.

Reviewed by:	imp, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25657
2020-07-14 21:56:59 +00:00
Alexander Leidinger
e37db348c1 Fix r363125 (Implement CLOCK_MONOTONIC_RAW (linux >= 2.6.28)),
by really using the MONOTONIC version and not the REALTIME version.

Noticed by:	myfreeweb at github
2020-07-12 14:57:29 +00:00
Alexander Leidinger
8c2602f30f Implement CLOCK_MONOTONIC_RAW (linux >= 2.6.28).
It is documented as a raw hardware-based clock not subject to NTP or
incremental adjustments. With this "not as precise as CLOCK_MONOTONIC"
description in mind, map it to our CLOCK_MONOTONIC_FAST (the same
mapping as for the linux CLOCK_MONOTONIC_COARSE).

This is needed for the webcomponent of steam (chromium) and some
other steam component or game.

The linux-steam-utils port contains a LD_PRELOAD based fix for this.
There this is mapped to CLOCK_MONOTONIC.
As an untrained ear/eye (= the majority of people) is normally not
noticing a difference of jitter in the 10-20 ms range, especially
if you don't pay attention like for example in a browser session
while watching a video stream, the mapping to CLOCK_MONOTONIC_FAST
seems more appropriate than to CLOCK_MONOTONIC.
2020-07-12 09:51:09 +00:00
Edward Tomasz Napierala
ce28fd95e0 Make linprocfs(5) report correct tty number in /proc/<PID>/stat.
Fixes sudo (sudo-1.8.21p2-3ubuntu1.2); previously would fail
with "sudo: no tty present and no askpass program specified".

Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25588
2020-07-11 13:11:54 +00:00
Edward Tomasz Napierala
17f701a3fb Make linux stat(2) return the same st_dev for every devfs instance.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.

PR:		kern/240767
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25559
2020-07-11 13:08:16 +00:00
Edward Tomasz Napierala
09c4e43d18 Don't emit warnings on MADV_HUGEPAGE; Firefox uses it a lot.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-07-10 21:41:09 +00:00
Hans Petter Selasky
127d8cfafb Implement the bitmap_subset() function in the LinuxKPI. This function
checks if the bitmap pointed to by the first argument is a subset of
the bitmap pointed to by the second argument. The function returns one
on success and zero on failure.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-10 12:06:18 +00:00
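The subset test described in 127d8cfafb can be sketched like this. It is an illustration under stated assumptions (bitmaps stored as arrays of unsigned long, bit 0 in the least significant position), not the LinuxKPI code; all sketch_ names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Return 1 if every bit set in `a` is also set in `b` within the first
 * `nbits` bits, else 0.  A bit that is set in `a` but clear in `b`
 * shows up as a nonzero (a & ~b).
 */
static int
sketch_bitmap_subset(const unsigned long *a, const unsigned long *b,
    size_t nbits)
{
	size_t full = nbits / SKETCH_BITS_PER_LONG;
	size_t tail = nbits % SKETCH_BITS_PER_LONG;
	size_t i;

	for (i = 0; i < full; i++) {
		if ((a[i] & ~b[i]) != 0)
			return (0);
	}
	if (tail != 0) {
		/* Only the low `tail` bits of the last word count. */
		unsigned long mask = (1UL << tail) - 1;

		if ((a[full] & ~b[full] & mask) != 0)
			return (0);
	}
	return (1);
}
```

Masking the final partial word keeps stray bits beyond nbits from affecting the result.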
Hans Petter Selasky
d2890eeea1 Implement the array_size() function in the LinuxKPI. This function
basically multiplies its two arguments and returns SIZE_MAX if the
result overflows the size_t type.  Else the product of the two
arguments is returned.

Bump the FreeBSD_version to mitigate issues with existing
implementation of array_size() in drm-devel-kmod.

Discussed with:		manu@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-10 11:27:54 +00:00
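The saturating multiply that d2890eeea1 describes for array_size() can be sketched as below. The function name sketch_array_size is hypothetical, and the division-based overflow check is one common way to do this; the actual LinuxKPI implementation may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply two sizes, returning SIZE_MAX if the product would not fit
 * in a size_t.  The check a > SIZE_MAX / b detects overflow without
 * performing the overflowing multiplication.
 */
static inline size_t
sketch_array_size(size_t a, size_t b)
{
	if (b != 0 && a > SIZE_MAX / b)
		return (SIZE_MAX);
	return (a * b);
}
```

Returning SIZE_MAX on overflow means a following allocation request fails cleanly instead of silently allocating a too-small buffer.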
Kyle Evans
423a033ba7 memfd_create: turn on SHM_GROW_ON_WRITE
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:45:16 +00:00
Mark Johnston
866a5d1298 Regenerate.
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:49 +00:00
Hans Petter Selasky
588fbadffb Fix include file order in io.h in the LinuxKPI.
Make sure sys/types.h is included before machine/vm.h.

PR:		247775
Submitted by:	pkubaj@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-07-05 19:38:36 +00:00
Edward Tomasz Napierala
4d2b7be54a Fix Linux recvmsg(2) when msg_namelen returned is 0. Previously
it would fail with EINVAL, breaking some of the Python regression
tests.

While here, cap the user-controlled message length.

Note that the code doesn't seem to be copying out the new length
in either (success or failure) case. This will be addressed separately.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25392
2020-07-05 10:57:28 +00:00
Edward Tomasz Napierala
7fcc9f7ee5 Add /proc/sys/kernel/tainted to linprocfs(5). Helps LTP.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25556
2020-07-04 11:26:03 +00:00
Edward Tomasz Napierala
8b99a63fd8 Make linprocfs(5) create /proc/bus/pci/devices/, and linsysfs(5)
create /sys/class/power_supply/.  This silences some warnings
from biology/linux-foldingathome.

Reported by:	0mp
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25557
2020-07-04 11:22:35 +00:00
Mateusz Guzik
d6d9ddd41f linux: fix ioctl performance for termios
TCGETS et al are frequently issued by Linux binaries while the previous code
avoidably ping-pongs a global sx lock and serializes on Giant.

Note that even with the fix the common case will serialize on a per-tty lock.
2020-07-04 06:25:41 +00:00
Konstantin Belousov
f334f212d9 linuxkpi: improvements for linux_pid_task() and linux_get_pid_task().
Unify functions bodies.
Do not call tdfind() if pid is passed, and do not call pfind() if tid
is supplied.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25534
2020-07-02 10:42:58 +00:00
Edward Tomasz Napierala
6d76adbb6d Rework linux accept(2). This makes the code flow easier to follow,
and fixes a bug where calling accept(2) could result in closing fd 0.

Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy.  Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.

It fixes Redis, among other things.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25461
2020-07-01 10:37:08 +00:00
Hans Petter Selasky
9a4e535b39 The "pid" field in the LinuxKPI task struct is typically set to the thread ID
and not the process ID. Make sure the linux_task_exiting() function uses tdfind()
to lookup the BSD procedure structure pointer by the "pid" field, and only
fallback to pfind() when no match is found! This makes linux_task_exiting()
in line with the rest of the code.

Differential Revision: https://reviews.freebsd.org/D25509
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-07-01 08:23:57 +00:00
Edward Tomasz Napierala
c2da36fecd Make linprocfs(5) create the /proc/<PID>/task/ directories.
This is to silence some Chromium assertions.

PR:		kern/240991
Analyzed by:	Alex S <iwtcex@gmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25256
2020-06-30 16:24:28 +00:00
Edward Tomasz Napierala
9bc42c18cb Make linux(4) ignore SA_INTERRUPT. The zsh(1) binary from Bionic uses it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25499
2020-06-30 16:18:09 +00:00
Hans Petter Selasky
d326a6c7c1 Document the is_signed(), type_max() and type_min() function macros in the
LinuxKPI. Try to make the function argument more readable.

Suggested by:	several
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-30 08:41:33 +00:00
Kyle Evans
97ce5033a8 linux: reposition the comment for bsd_to_linux_bits/linux_to_bsd_bits
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both.  Move the comment up (or the definition
down, depending on your perspective on life) accordingly.

Reported by:	rpokala
2020-06-29 17:47:00 +00:00
Hans Petter Selasky
d0eed838e3 Implement is_signed(), type_max() and type_min() function macros in the
LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-29 13:08:40 +00:00
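One way macros such as is_signed(), type_max() and type_min() from d0eed838e3 can be built is sketched below. The SKETCH_ names are hypothetical, the definitions are modelled on the well-known approach of deriving limits from the type's width, and they assume 8-bit bytes and integer types without padding bits; they are not copied from the LinuxKPI headers.

```c
#include <stdint.h>

/* A type is signed if -1 converted to it compares less than 1. */
#define SKETCH_IS_SIGNED(t)	(((t)-1) < (t)1)

/* Half of the maximum plus one: the highest value bit of the type. */
#define SKETCH_HALF_MAX(t) \
	((t)1 << (8 * sizeof(t) - 1 - SKETCH_IS_SIGNED(t)))

/* Build the maximum as (half - 1) + half to avoid signed overflow. */
#define SKETCH_TYPE_MAX(t) \
	((t)((SKETCH_HALF_MAX(t) - 1) + SKETCH_HALF_MAX(t)))

/* Minimum is -max - 1 for signed types and wraps to 0 for unsigned. */
#define SKETCH_TYPE_MIN(t) \
	((t)((t)-SKETCH_TYPE_MAX(t) - (t)1))
```

For example, SKETCH_TYPE_MAX(int8_t) evaluates to 63 + 64 = 127 and SKETCH_TYPE_MIN(int8_t) to -127 - 1 = -128.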
Kyle Evans
5403f186a7 linuxolator: implement memfd_create syscall
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.

The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.

Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:

- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
  (?)

Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.

PR:		240874
Reviewed by:	kib, trasz
Differential Revision:	https://reviews.freebsd.org/D21845
2020-06-29 03:09:14 +00:00
Mark Johnston
3507b8d467 Remove some redundant assignments and computations.
Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25400
2020-06-28 21:34:38 +00:00
Edward Tomasz Napierala
4fe5361cbe Make linux(4) support SO_PROTOCOL. Running Python test suite
with python3.8 from Focal triggers those.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25491
2020-06-28 18:56:32 +00:00
Edward Tomasz Napierala
d5629eb216 Make linux(4) warn about unsupported SA_ flags.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25453
2020-06-27 15:50:35 +00:00
Mark Johnston
f4134e3d87 Implement an approximation of Linux MADV_DONTNEED semantics.
Linux MADV_DONTNEED is not advisory: it has side effects for anonymous
memory, and some system software depends on that.  In particular,
MADV_DONTNEED causes anonymous pages to be discarded.  If the mapping is
a private mapping of a named object then subsequent faults are to
repopulate the range from that object, otherwise pages will be
zero-filled.  For mappings of non-anonymous objects, Linux MADV_DONTNEED
can be implemented in the same way as our MADV_DONTNEED.

This implementation differs from Linux semantics in its handling of
private mappings, inherited through fork(), of non-anonymous objects.
After applying MADV_DONTNEED, subsequent faults will repopulate the
mapping from the parent object rather than the root of the shadow chain.

PR:		230160
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25330
2020-06-25 20:30:30 +00:00
Doug Moore
158c55a584 In r362552, RB_SET_PARENT is defined, and used in parens in
RB_CLEAR_NODE.  But it is not an expression, and ought not to be
enclosed in parens.  Remove them.

Approved by:	markj
Differential Revision:	https://reviews.freebsd.org/D25421
2020-06-23 22:47:54 +00:00
Doug Moore
4d56980017 Define RB_SET_PARENT to do all assignments to rb parent
pointers. Define RB_SWAP_CHILD to replace the child of a parent with
its twin, and use it in 4 places. Use RB_SET in rb_link_node to remove
the only linuxkpi reference to color, and then drop color- and
parent-related definitions that are defined and used only in rbtree.h.

This is intended to be entirely cosmetic, with no impact on program
behavior, and leave RB_PARENT and RB_SET_PARENT as the only ways to
read and write rb parent pointers.

Reviewed by:	markj, kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25264
2020-06-23 20:02:55 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
Edward Tomasz Napierala
52c81be11a Add linux_madvise(2) instead of having Linux apps call the native
FreeBSD madvise(2) directly.  While some of the flag values match,
most don't.

PR:		kern/230160
Reported by:	markj
Reviewed by:	markj
Discussed with:	brooks, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25272
2020-06-20 18:29:22 +00:00
Edward Tomasz Napierala
4afe4fae1b Add warnings for unsupported Linux clockids.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25322
2020-06-19 19:33:06 +00:00
Mark Johnston
0f1e6ec591 Add a helper function for validating VA ranges.
Functions which take untrusted user ranges must validate against the
bounds of the map, and also check for wraparound.  Instead of having the
same logic duplicated in a number of places, add a function to check.

Reviewed by:	dougm, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25328
2020-06-19 03:32:04 +00:00
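The checks that 0f1e6ec591 describes (bounds of the map plus wraparound) can be sketched as a single predicate. The struct and function names here are hypothetical stand-ins, not the kernel's vm_map types, and the sketch assumes the caller has already computed end = start + len.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an address map with fixed bounds. */
struct sketch_map {
	uintptr_t min;	/* lowest valid address */
	uintptr_t max;	/* one past the highest valid address */
};

/*
 * Validate an untrusted [start, end) range.  If start + len wrapped
 * around, end is smaller than start and the first test rejects it.
 */
static bool
sketch_range_valid(const struct sketch_map *map, uintptr_t start,
    uintptr_t end)
{
	return (end >= start && start >= map->min && end <= map->max);
}
```

Keeping the three comparisons in one helper avoids callers forgetting the wraparound test, which is the duplication the commit message mentions.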
Konstantin Belousov
8a15ac8378 Fix execution of linux binary from multithreaded non-Linux process.
If multithreaded non-Linux process execs Linux binary, then non-Linux
threads different from the one that is execing are cleared by
single-threading at boundary, and then terminating them in
post_execve(). Since at that time the process is already switched to
linux ABI, linuxolator is involved in the thread clearing on boundary,
but cannot find the emul data.

Handle it by pre-creating emuldata for all threads in the execing process.

Also remove code in the linux_proc_exec() handler that cleared emul data
for other threads when execing from multithreaded Linux process. It is
excessive.

PR:	247020
Reported by:	Martin FIlla <freebsd@sysctl.cz>
Reported by:	Henrique L. Amorim, Independent Security Researcher
Reported by:	Rodrigo Rubira Branco (BSDaemon), Amazon Web Services
Reviewed by:	markj
Tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25293
2020-06-18 20:49:56 +00:00
Edward Tomasz Napierala
3d8dd98381 Make Linux uname(2) return x86_64 to 32-bit apps. This helps Steam.
PR:		kern/240432
Analyzed by:	Alex S <iwtcex@gmail.com>
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25248
2020-06-15 20:12:10 +00:00
Edward Tomasz Napierala
889cd28520 Make linux(4) warn about unsupported CMSG level/type.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25255
2020-06-14 14:38:40 +00:00
Doug Moore
9f1041dc2e Linuxkpi uses the rb-tree structures without using their interfaces,
making them break when the representation changes. Revert changes that
eliminated the color field from rb-trees, leaving everything as it was
before.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D25250
2020-06-13 01:54:09 +00:00
Doug Moore
13dca1937f Revert r362108, as it breaks compilation.
2020-06-12 17:48:12 +00:00
Doug Moore
3159ceca97 The linuxkpi code accesses left/right rb tree pointers without using
RB_LEFT or RB_RIGHT, so they aren't stripping off the color bit
encoded there. Strip off that bit for linuxkpi.

Reported by:	dch
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D25245
2020-06-12 16:51:55 +00:00
Edward Tomasz Napierala
462171d9aa Add compat.linux.debug sysctl, to make it possible to silence down
the debug messages. While here, clean up some variable naming.

Reviewed by:	bcr (manpages), emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25230
2020-06-12 14:37:50 +00:00
Edward Tomasz Napierala
599dadca55 Fix naming clash.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-12 14:31:19 +00:00
Edward Tomasz Napierala
34ff0c0e6a Make linux(4) warn about unsupported fcntls.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25231
2020-06-12 14:25:32 +00:00
Edward Tomasz Napierala
4beacc3b1d Minor code cleanup; no functional changes.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25232
2020-06-12 14:23:10 +00:00
Edward Tomasz Napierala
86e794eb65 Don't use newlines with linux_msg(). No functional changes.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-11 14:57:30 +00:00
Edward Tomasz Napierala
bc8e281082 Replace LINUX_FASYNC with LINUX_O_ASYNC; no functional changes.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25218
2020-06-11 14:09:43 +00:00
Edward Tomasz Napierala
433d61a573 Improve the warnings.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-11 12:35:00 +00:00
Edward Tomasz Napierala
3bc69ad9b3 Make linux(4) handle SO_REUSEPORT.
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25216
2020-06-11 12:25:49 +00:00
Mark Johnston
479f70ef24 Fix a couple of nits in Linux sysinfo(2) emulation.
- Use the same definition of free memory as Linux.
- Rename the totalbig and freebig fields to match the corresponding
  names on Linux.

Discussed with:	alc
MFC after:	1 week
2020-06-10 23:52:50 +00:00
Mark Johnston
27e4374dd4 Add a comment reflecting the commit log for r361945.
Suggested by:	alc
Reviewed by:	alc
MFC with:	r361945
2020-06-10 23:52:39 +00:00
Edward Tomasz Napierala
8c5059e9ea Make linux(4) set the openfiles soft resource limit to 1024 for Linux
applications, which often depend on this being the case.  There's a new
sysctl, compat.linux.default_openfiles, to control this behaviour.

Reviewed by:	kevans, emaste, bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25177
2020-06-10 18:50:46 +00:00
Edward Tomasz Napierala
c31a6a6612 Support SO_SNDBUFFORCE/SO_RCVBUFFORCE by aliasing them to the
standard SO_SNDBUF/SO_RCVBUF.  Mostly cosmetics, to get rid
of the warning during 'apt upgrade'.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25173
2020-06-10 18:43:43 +00:00
Doug Moore
36ba4b393f To reduce the size of an rb_node, drop the color field. Set the least
significant bit in the pointer to the node from its parent to indicate
that the node is red. Have the tree rotation macros leave the
old-parent/new-child node red and the new-parent/old-child node black.

This change makes RB_LEFT and RB_RIGHT no longer assignable, and
RB_COLOR no longer defined. Any code that modifies the tree or
examines a node color would have to be modified after this change.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25105
2020-06-09 20:19:11 +00:00
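The representation described in 36ba4b393f, where a node's color lives in the least significant bit of the parent's pointer to it, can be illustrated with a small pointer-tagging sketch. The types and helper names are hypothetical and much simpler than the tree(3) macros; the sketch relies on nodes being at least 2-byte aligned so bit 0 of a real pointer is always zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical node: child links carry the child's color in bit 0. */
struct sketch_node {
	uintptr_t left;
	uintptr_t right;
	int key;
};

#define SKETCH_RED	((uintptr_t)1)

/* Build a link to `child`, tagging it red if requested. */
static inline uintptr_t
sketch_link(struct sketch_node *child, bool red)
{
	return ((uintptr_t)child | (red ? SKETCH_RED : 0));
}

/* Strip the color bit to recover the real child pointer. */
static inline struct sketch_node *
sketch_child(uintptr_t link)
{
	return ((struct sketch_node *)(link & ~SKETCH_RED));
}

static inline bool
sketch_is_red(uintptr_t link)
{
	return ((link & SKETCH_RED) != 0);
}
```

Code that reads the raw link field without stripping the tag would dereference a misaligned address, which is why direct field access stops being safe under such a scheme.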
John Baldwin
58b552dcec Refactor ptrace() ABI compatibility.
Add a freebsd32_ptrace() and move as many freebsd32 shims as possible
to freebsd32_ptrace().  Aside from register sets, freebsd32 passes
pointers to native structures to kern_ptrace() and converts to/from
native/32-bit structure formats in freebsd32_ptrace() outside of
kern_ptrace().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25195
2020-06-09 16:43:23 +00:00
Mark Johnston
3e5fae34fc Stop computing a "sharedram" value when emulating Linux sysinfo(2).
The previous code was computing an incorrect value in a very expensive
manner.  "sharedram" is supposed to be the amount of memory used by
named swap objects, which on FreeBSD basically corresponds to memory
usage by shared memory objects (including, for example, GEM objects) and
tmpfs.  We currently have no cheap way to count such pages.  The
previous code tried to determine the number of copy-on-write pages
shared between processes.

Just replace the computed value with 0.  illumos reportedly does the
same thing.  Linux itself did not populate this field until a 2014
commit, "mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces".

Reported by:	mjg
MFC after:	1 week
2020-06-08 22:29:52 +00:00
Hans Petter Selasky
c51613866f Ensure pci_channel_offline() actually queries the PCI register space,
and not only the software cache of that register.  Else
pci_channel_offline() won't detect that the PCI device is gone when
using the LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-05 08:12:08 +00:00
Hans Petter Selasky
d053391cd7 Implement __is_constexpr() function macro in the LinuxKPI.
Bump the FreeBSD version.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-02 12:23:04 +00:00
Hans Petter Selasky
ef5f8c18b5 Implement struct_size() function macro in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-02 10:19:45 +00:00
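For readers unfamiliar with struct_size(), the following is a minimal sketch of what such a macro computes: the size of a structure plus n elements of its trailing flexible array member. STRUCT_SIZE_SKETCH, flex_size and struct msg_sketch are hypothetical names. Saturating to SIZE_MAX on overflow follows the Linux macro; whether the LinuxKPI version in ef5f8c18b5 does the same is not stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* base + n * elem, saturating at SIZE_MAX instead of wrapping around. */
static inline size_t
flex_size(size_t base, size_t elem, size_t n)
{
	if (elem != 0 && n > (SIZE_MAX - base) / elem)
		return (SIZE_MAX);
	return (base + n * elem);
}

/* Hypothetical stand-in for struct_size(p, member, n). */
#define	STRUCT_SIZE_SKETCH(p, member, n)				\
	flex_size(sizeof(*(p)), sizeof((p)->member[0]), (n))

/* Example structure with a flexible array member. */
struct msg_sketch {
	uint32_t len;
	uint16_t data[];
};
```

Both operands of the macro appear only inside sizeof, so the pointer is never dereferenced and may even be NULL at the point of use.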
Hans Petter Selasky
c185f13b92 Implement BUILD_BUG_ON_ZERO() in the LinuxKPI.
Tested using gcc and clang.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-02 09:45:43 +00:00
Rick Macklem
c01cd3f558 Update the files created from the new syscalls.master from r361599.
Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:23:02 +00:00
Rick Macklem
d9021e389a Add a syscall for the nfs-over-tls daemons to use.
The nfs-over-tls daemons need a system call to perform operations such as
associate a file descriptor with a krpc socket.
The daemons will not be in head for some time, but it will make it
easier for testers of nfs-over-tls to do testing if the system call
is in head (basically the stub for libc which will be committed soon).

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:06:10 +00:00
Emmanuel Vadot
8e52f22b25 linuxkpi: Add kstrtou16
This function converts a char * to a u16.
Simply use strtoul and cast to compare for ERANGE

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24996
2020-05-27 11:42:09 +00:00
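The "strtoul and cast to compare" approach mentioned in 8e52f22b25 can be sketched in userspace roughly as follows. parse_u16_sketch is a hypothetical name and this is not the LinuxKPI source; as a simplification, the sketch rejects any trailing characters.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse s as an unsigned 16-bit value.  Returns 0 on success,
 * -EINVAL for malformed input and -ERANGE if the value does not fit.
 */
static int
parse_u16_sketch(const char *s, unsigned int base, uint16_t *res)
{
	unsigned long v;
	char *end;

	errno = 0;
	v = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return (-ERANGE);
	if (errno != 0 || end == s || *end != '\0')
		return (-EINVAL);
	/* The cast truncates; a mismatch means v is out of range for u16. */
	if (v != (unsigned long)(uint16_t)v)
		return (-ERANGE);
	*res = (uint16_t)v;
	return (0);
}
```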
Emmanuel Vadot
9b703f3082 linuxkpi: Add rcu_swap_protected
This macro swaps an RCU pointer with a normal pointer.
The condition only seems to be used for debug/warning under linux, ignore
for now.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24954
2020-05-27 10:01:30 +00:00
Emmanuel Vadot
8287045dde linuxkpi: Add overflow.h
Only add check_add_overflow and check_mul_overflow as those are the only
two functions needed by DRM v5.3.
Both gcc and clang have builtins to do this check so use them directly
but throw an error if the compiler/code checker doesn't support these builtins.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D25015
2020-05-27 09:31:50 +00:00
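A minimal sketch of the approach 8287045dde describes, assuming a GCC- or clang-compatible compiler: the two checks map directly onto the __builtin_add_overflow and __builtin_mul_overflow builtins, and the build stops if they are unavailable. The *_sketch macro names are hypothetical and the compiler test is a simplification of whatever check the real header performs.

```c
#include <stdbool.h>
#include <stdint.h>

#if !defined(__GNUC__) && !defined(__clang__)
#error "this sketch needs the GCC/clang overflow builtins"
#endif

/*
 * Each macro stores the (possibly wrapped) result in *res and
 * evaluates to true when the exact result does not fit in *res.
 */
#define	check_add_overflow_sketch(a, b, res)	\
	__builtin_add_overflow((a), (b), (res))
#define	check_mul_overflow_sketch(a, b, res)	\
	__builtin_mul_overflow((a), (b), (res))
```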
Emmanuel Vadot
42f0f394f7 linuxkpi: Fix mod_timer and del_timer_sync
mod_timer is supposed to return 1 if the modified timer was pending, which
is exactly what callout_reset does so return the value after checking
that it's a correct one in case the API changes.
del_timer_sync returns int so add a function and handle that.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24983
2020-05-25 12:46:05 +00:00
Emmanuel Vadot
4efd5dd70d linuxkpi: Add refcount.h
Implement some refcount functions needed by drm.
Just use the atomic_t struct and functions from linuxkpi for simplicity.

Sponsored-by: The FreeBSD Foundation

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24985
2020-05-25 12:44:07 +00:00
Emmanuel Vadot
93d70cd3fe linuxkpi: Add __same_type and __must_be_array macros
The __same_type macro simply wraps __builtin_types_compatible_p, which
exists in both GCC and Clang and returns 1 if both types are the same.
The __must_be_array macro returns 1 if the argument is an array.

This is needed for DRM v5.3

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24953
2020-05-25 12:42:55 +00:00
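The two macros from 93d70cd3fe can be approximated as below, assuming a compiler that provides __typeof__ and __builtin_types_compatible_p (GCC and clang both do). SAME_TYPE_SKETCH and IS_ARRAY_SKETCH are hypothetical names; the array test follows the commit message's description (1 for an array) and relies on an array type never being compatible with the type of a pointer to its first element.

```c
/* 1 if the two expressions have compatible types, 0 otherwise. */
#define	SAME_TYPE_SKETCH(a, b)					\
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * 1 if a is an array, 0 if it is a pointer: for an array, a has type
 * T[N] while &(a)[0] has type T *, and those are not compatible.
 */
#define	IS_ARRAY_SKETCH(a)	(!SAME_TYPE_SKETCH((a), &(a)[0]))
```

A typical use is guarding an element-count macro so that it cannot be applied to a plain pointer by mistake.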
Emmanuel Vadot
af300929f9 linuxkpi: Add prandom_u32_max
This is just a wrapper around arc4random_uniform
Needed by DRM v5.3

Sponsored-by: The FreeBSD Foundation
Reviewed by:	cem, hselasky
Differential Revision:	https://reviews.freebsd.org/D24961
2020-05-23 17:52:25 +00:00
Emmanuel Vadot
2491b25c3a linuxkpi: Add rcu_work functions
The rcu_work function helps to queue some work after waiting for a grace
period.
This is needed by DRM drivers.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24942
2020-05-21 20:18:38 +00:00
Emmanuel Vadot
eda697d2ef linuxkpi: Add irq_work.h
Since handlers are called in a thread context we can simply use a workqueue
to emulate those functions.
The DRM code was patched to do that already, having it in linuxkpi allows us
to not patch the upstream code.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24859
2020-05-19 09:04:35 +00:00
Emmanuel Vadot
5e30a739c7 linuxkpi: add pci_dev_present
pci_dev_present shows if a set of pci ids are present in the system.
It just wraps pci_find_device.
Needed by DRMv5.2

Submitted by:	Austin Shafer (ashafer@badland.io)
Differential Revision:	https://reviews.freebsd.org/D24796
2020-05-19 08:44:33 +00:00
Emmanuel Vadot
ff443195bf linuxkpi: Add __init_waitqueue_head
The only difference with init_waitqueue_head is that the name and the
lock class key are provided but we don't use those so use init_waitqueue_head
directly.

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24861
2020-05-19 08:43:17 +00:00
Emmanuel Vadot
355711ea76 linuxkpi: Add offsetofend macro
This calculates the offset of the end of the member in the given struct.
Needed by DRM in Linux v5.3

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24849
2020-05-17 20:14:49 +00:00
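What 355711ea76 describes can be written in standard C as the member's offset plus the member's size. OFFSETOFEND_SKETCH and struct hdr_sketch are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past `member` within `type`. */
#define	OFFSETOFEND_SKETCH(type, member)				\
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Example structure for the usage below. */
struct hdr_sketch {
	uint32_t a;
	uint16_t b;
	uint64_t c;
};
```

A common use is checking that a caller-supplied structure length is large enough to contain a given field before that field is read.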
Emmanuel Vadot
d003cc4318 linuxkpi: Add __mutex_init
Same as mutex_init, the lock_class_key argument seems to be only used for
debug in Linux, simply ignore it for now.
Needed by DRM in Linux v5.3

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24848
2020-05-17 20:12:16 +00:00
Emmanuel Vadot
7708d3d765 linuxkpi: Add atomic_dec_and_mutex_lock
This function decrements the counter and if the result is 0 it acquires
the mutex and returns 1, if not it simply returns 0.
Needed by DRM from Linux v5.3

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24847
2020-05-17 20:09:11 +00:00
Hans Petter Selasky
cf9f2ca3ef Implement synchronize_srcu_expedited() in the LinuxKPI.
Differential Revision:	https://reviews.freebsd.org/D24798
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-16 14:27:50 +00:00
Emmanuel Vadot
cfa985350d linuxkpi: Add EBADRQC to errno.h
This is used in the amdgpu driver from Linux 5.2

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24807
2020-05-13 07:49:12 +00:00
Andriy Gapon
a164a32b4d linuxkpi: print stack trace in WARN_ON macros
Reviewed by:	hselasky, kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D24779
2020-05-13 07:47:56 +00:00
Emmanuel Vadot
3d84874da0 linuxkpi: Really add bitmap_alloc and bitmap_zalloc
This was missing in r360870

Sponsored-by: The FreeBSD Foundation
2020-05-10 13:12:05 +00:00
Emmanuel Vadot
ce03b3013f linuxkpi: Add bitmap_alloc and bitmap_free
This is a simple call to kmalloc_array/kfree, therefore include linux/slab.h as
this is where the kmalloc_array/kfree definitions are.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24794
2020-05-10 13:07:00 +00:00
Emmanuel Vadot
26a578697c linuxkpi: Add bitmap_copy and bitmap_andnot
bitmap_copy simply copies the bitmap, no idea why it exists.
bitmap_andnot is similar to bitmap_and but uses ~src2.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24782
2020-05-09 17:52:50 +00:00
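The and-not operation from 26a578697c amounts to dst = src1 & ~src2, word by word. The sketch below is a userspace approximation with a hypothetical name (bitmap_andnot_sketch); it returns nothing and clears the unused high bits of the last word, both of which are choices made for this sketch and may differ from the LinuxKPI version.

```c
#include <limits.h>
#include <stddef.h>

#define	BITS_PER_LONG_SKETCH	(sizeof(unsigned long) * CHAR_BIT)

/* dst = src1 & ~src2 over the first nbits bits of each bitmap. */
static inline void
bitmap_andnot_sketch(unsigned long *dst, const unsigned long *src1,
    const unsigned long *src2, unsigned int nbits)
{
	size_t words = (nbits + BITS_PER_LONG_SKETCH - 1) /
	    BITS_PER_LONG_SKETCH;
	size_t rem = nbits % BITS_PER_LONG_SKETCH;
	size_t i;

	for (i = 0; i < words; i++)
		dst[i] = src1[i] & ~src2[i];
	/* Keep bits beyond nbits in the final word at zero. */
	if (rem != 0)
		dst[words - 1] &= (1UL << rem) - 1;
}
```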
Emmanuel Vadot
4c27484934 linuxkpi: Add pci_iomap and pci_iounmap
These functions are used to map/unmap an I/O region of a PCI device.
Different resources can be mapped depending on the BAR, so use a
tailq to store them all.

Sponsored-by: The FreeBSD Foundation

Reviewed by:	emaste, hselasky
Differential Revision:	https://reviews.freebsd.org/D24696
2020-05-07 17:00:51 +00:00
Hans Petter Selasky
5e6233ccab Optimise use of sg_page_count() in __sg_page_iter_next() in the LinuxKPI.
No need to compute value twice.

No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 10:10:07 +00:00
Hans Petter Selasky
fe4b041a14 Implement more scatter and gather functions in the LinuxKPI.
Differential Revision:	https://reviews.freebsd.org/D24611
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 09:58:45 +00:00
Hans Petter Selasky
42f8ef4bf5 Fix warning about sleeping with non-sleepable lock when allocating
"current" from linux_cdev_pager_populate() in the LinuxKPI:

Backtrace:
witness_debugger()
witness_warn()
uma_zalloc_arg()
malloc()
linux_alloc_current()
linux_cdev_pager_populate()
vm_fault()
vm_fault_trap()
trap_pfault()
trap()
calltrap()

Suggested by:	avg@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 08:05:01 +00:00
Hans Petter Selasky
b4edb17c82 Implement more PCI-express bandwidth functions in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:32:42 +00:00
Hans Petter Selasky
1bbbe083a1 Implement mutex_lock_killable() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:28:21 +00:00
Hans Petter Selasky
3ff7ec1cc1 Implement DIV64_U64_ROUND_UP() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:25:07 +00:00
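Round-up division of two 64-bit unsigned values, as named in 3ff7ec1cc1, is conventionally computed by adding the divisor minus one before dividing. The function below is a hypothetical userspace sketch (div64_u64_round_up_sketch), not the LinuxKPI macro; like that formula in general, it assumes d is non-zero and that ll + d - 1 does not wrap.

```c
#include <stdint.h>

/* ceil(ll / d) for unsigned 64-bit operands; d must be non-zero. */
static inline uint64_t
div64_u64_round_up_sketch(uint64_t ll, uint64_t d)
{
	return ((ll + d - 1) / d);
}
```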
Hans Petter Selasky
922106bf00 Implement more lockdep macros in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:18:07 +00:00
Hans Petter Selasky
61f7fe6b2d Implement kstrtou64() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:14:45 +00:00
Kyle Evans
2c9c433e17 sysent: re-roll after 360236 (AUE_CLOSERANGE used) 2020-04-24 01:30:33 +00:00
Kyle Evans
3e6b82913d close_range(2): use newly assigned AUE_CLOSERANGE 2020-04-24 01:30:00 +00:00
Hans Petter Selasky
253dbe7487 Factor code in LinuxKPI to allow attach and detach using any BSD device.
This allows non-LinuxKPI based infiniband device drivers to attach
correctly to ibcore.

No functional change intended.

Reviewed by:	np @
Differential Revision:	https://reviews.freebsd.org/D24514
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-22 14:33:25 +00:00
Hans Petter Selasky
b9bf16adfb Implement the atomic fetch add unless functions for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 16:21:37 +00:00
Hans Petter Selasky
fdbfa4f19e Implement aligned LinuxKPI types for u16, u32 and u64.
Makes a difference for 32-bit platforms mostly.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 14:03:05 +00:00
Hans Petter Selasky
07fdea3672 Allow test_bit() in the LinuxKPI to accept a const pointer.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 13:47:15 +00:00
Hans Petter Selasky
47c0672b08 Allow the ERR_CAST() function in the LinuxKPI to take a const void pointer.
No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 13:36:01 +00:00
Mark Johnston
546c117f86 Remove a vestigial reference to kmem_object.
kmem_object has been an alias of kernel_object for a while.

MFC after:	1 week
2020-04-17 19:12:52 +00:00
Brooks Davis
b24e6ac8b7 Convert canary, execpathp, and pagesizes to pointers.
Use AUXARGS_ENTRY_PTR to export these pointers.  This is a followup to
r359987 and r359988.

Reviewed by:	jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24446
2020-04-16 21:53:17 +00:00
Brooks Davis
9df1c38bbc Export argc, argv, envc, envv, and ps_strings in auxargs.
This simplifies discovery of these values, potentially reducing the
number of syscalls we need to make at runtime.  Longer term, we wish to
convert the startup process to pass an auxargs pointer to _start() and
use that rather than walking off the end of envv.  This is cleaner,
more C-friendly, and for systems with strong bounds (e.g. CHERI)
necessary.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24407
2020-04-15 20:23:55 +00:00
Brooks Davis
397df744f9 Make ps_strings in struct image_params into a pointer.
This is a preparatory commit for D24407.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
2020-04-15 20:21:30 +00:00
Brooks Davis
618a20d4f9 Remove bogus use of useracc() in (clock_)nanosleep.
There's no point in pre-checking that we can access the user's rmtp
pointer before we do it in copyout().

While here, improve style(9) compliance.

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24409
2020-04-14 20:53:12 +00:00
Brooks Davis
562894f0dc Centralize compatibility translation macros.
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitions with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatibility
headers to be included in the same files.

Input from:	cem, jhb
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24275
2020-04-14 20:30:48 +00:00
Kyle Evans
e19b97f7a0 sysent: re-roll after r359930 2020-04-14 18:11:26 +00:00
Kyle Evans
7d03e08112 Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range
Include a temporary compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.

Reviewed by:	markj (previous version), kib
Differential Revision:	https://reviews.freebsd.org/D24399
2020-04-14 18:07:42 +00:00
Kyle Evans
3d224fc909 sysent: re-roll after introduction of close_range in r359836 2020-04-12 21:23:51 +00:00
Kyle Evans
472ced39ef Implement a close_range(2) syscall
close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.

The flags argument of close_range(2) is currently unused, so setting any
flag currently returns EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.

This patch is based on a syscall of the same design that is expected to be
merged into Linux.

Reviewed by:	kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision:	https://reviews.freebsd.org/D21627
2020-04-12 21:23:19 +00:00
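The semantics described in 472ced39ef (close every descriptor in an inclusive range, reject any flags, and let ~0U mean "to the end") can be emulated in userspace roughly as follows. close_range_sketch is a hypothetical helper that loops over close(2); the real syscall does the work inside the kernel, and rejecting lo > hi with EINVAL is an assumption of this sketch.

```c
#include <errno.h>
#include <unistd.h>

/* Close every descriptor in [lo, hi]; no flags are defined yet. */
static int
close_range_sketch(unsigned int lo, unsigned int hi, int flags)
{
	long open_max;
	unsigned int fd;

	if (flags != 0 || lo > hi) {
		errno = EINVAL;
		return (-1);
	}
	/*
	 * Clamp to the current descriptor limit to keep the loop short.
	 * This ignores descriptors opened before the limit was lowered.
	 */
	open_max = sysconf(_SC_OPEN_MAX);
	if (open_max > 0 && (unsigned long)hi >= (unsigned long)open_max)
		hi = (unsigned int)(open_max - 1);
	if (lo > hi)
		return (0);
	for (fd = lo; ; fd++) {
		(void)close((int)fd);	/* errors on unused slots are ignored */
		if (fd == hi)
			break;
	}
	return (0);
}
```

With this shape, closefrom(min) is just the call with hi set to ~0U, which mirrors how the commit describes sys_closefrom being rewritten.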
Hans Petter Selasky
eae5868ce9 Clone the RCU interface into a sleepable and a non-sleepable part
in the LinuxKPI.

This allows synchronize RCU to be used inside a SRCU read section.
No functional change intended.

Bump the __FreeBSD_version to force recompilation of external kernel modules.

PR:		242272
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 17:09:45 +00:00
Hans Petter Selasky
61d82b0794 Some fixes for SRCU in the LinuxKPI.
- Make sure to use READ_ONCE() when dereferencing variables.
- Remove superfluous zero initializer.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 16:07:57 +00:00
John Baldwin
59838c1a19 Retire procfs-based process debugging.
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging.  ptrace() has a superset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code.  This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR:		244939 (exp-run)
Reviewed by:	kib, mjg (earlier version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D23837
2020-04-01 19:22:09 +00:00
Mark Johnston
4596ac234e compat/linux/linux.h depends on queue.h since r353725.
Sponsored by:	The FreeBSD Foundation
2020-03-26 17:12:55 +00:00
Warner Losh
9275cd0dc5 Implement a workaround for kms-drm modules
pci_iov_if.h was added to pci.h, but none of the kms-drm branches have
that. Rather than play whack a mole with the branches, move its inclusion to
linux_pci.c which is the only part of the code that needs it now.

Longer term, other solutions will be needed, but this gives us time to get those
deployed on all the supported versions.
2020-03-20 15:07:25 +00:00
Konstantin Belousov
2928e60e55 linuxkpi: Add infrastructure to pass FreeBSD IOV method calls into
pci_driver methods.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:10:49 +00:00
Hans Petter Selasky
d845d3dc9a Add support for the device statistics IOCTL, needed by the coming
linux_libusb upgrade.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2020-03-10 15:56:49 +00:00
Tijl Coosemans
b4147bf6b4 Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.

Reported by:	Yuri Pankov <ypankov@fastmail.com>
2020-03-05 14:41:27 +00:00
Brooks Davis
d718de812f Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compound literal.  We take advantage of the mandatory
zeroing of fields not listed in the initializer.

Remove kern_mmap_fpcheck() and use kern_mmap_req().

The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations.  In CheriBSD we have already added two more arguments.

Reviewed by:	kib
Discussed with:	kevans
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23164
2020-03-04 21:27:12 +00:00
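The calling pattern d718de812f refers to (a request structure filled with a designated initializer or compound literal, with unnamed fields implicitly zeroed) looks roughly like the sketch below. struct map_req_sketch, its field names, map_sketch and map_anon_sketch are all hypothetical and stand in for the real request structure and kern_mmap_req().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request structure; new fields can be appended later. */
struct map_req_sketch {
	uintptr_t addr;
	size_t len;
	int prot;
	int flags;
	int fd;
	int64_t pos;
};

/* Stand-in for the low-level implementation: only validates len. */
static int
map_sketch(const struct map_req_sketch *req)
{
	if (req->len == 0)
		return (-1);
	return (0);
}

/* Callers name only the fields they care about; the rest are zero. */
static int
map_anon_sketch(size_t len)
{
	return (map_sketch(&(struct map_req_sketch){
	    .len = len,
	    .fd = -1,
	}));
}
```

Adding a field to the structure does not require touching callers such as map_anon_sketch, since they never list arguments positionally.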
Hans Petter Selasky
1328771d9d When closing a LinuxKPI file always use the real release function to avoid
resource leakage when destroying a LinuxKPI character device.

Submitted by:	Andrew Boyer <aboyer@pensando.io>
Reviewed by:	kib@
PR:		244572
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-03-03 15:49:34 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
Tijl Coosemans
f8b9b299a2 linuxulator: Map scheduler priorities to Linux priorities.
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99].  For SCHED_OTHER the single valid priority is
0.  On FreeBSD it is [0,31] for all policies.  Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.

This commit adds a tunable compat.linux.map_sched_prio.  When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.

Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows this behaviour to be disabled.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.

This fixes FMOD, a commercial sound library used by several games.

PR:		240043
Tested by:	Alex S <iwtcex@gmail.com>
Reviewed by:	dchagin
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23790
2020-03-01 13:12:04 +00:00
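To make the many-to-one mapping in f8b9b299a2 concrete, here is one possible linear translation between the Linux range [1,99] and the FreeBSD range [0,31]. The formulas and names below are assumptions made for illustration and are not taken from the commit; they only show why several Linux levels must collapse into one FreeBSD level.

```c
#define	LINUX_RT_MIN	1	/* lowest SCHED_FIFO/SCHED_RR priority on Linux */
#define	LINUX_LEVELS	99	/* Linux levels 1..99 */
#define	BSD_LEVELS	32	/* FreeBSD levels 0..31 */

/* Map a Linux priority in [1,99] to a FreeBSD priority in [0,31]. */
static int
linux_to_bsd_prio_sketch(int lp)
{
	return ((lp - LINUX_RT_MIN) * BSD_LEVELS / LINUX_LEVELS);
}

/*
 * Map a FreeBSD priority in [0,31] back to the lowest Linux priority
 * that translates to it, so that the round trip is stable.
 */
static int
bsd_to_linux_prio_sketch(int bp)
{
	return ((bp * LINUX_LEVELS + BSD_LEVELS - 1) / BSD_LEVELS +
	    LINUX_RT_MIN);
}
```

Under this mapping Linux priorities 1 through 4 all become FreeBSD priority 0, so two threads that differ in priority on Linux may not preempt each other here.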
Edward Tomasz Napierala
5d481ad8df Make linuxulator warn about unsupported getsockopt/setsockopt flags.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23791
2020-02-27 19:40:20 +00:00
Hans Petter Selasky
77632fc70f Extend the range of the return value from nsecs_to_jiffies64() to support
Mesa's drm_syncobj usage, in the LinuxKPI.

While at it optimise the jiffies conversion functions to avoid repeated
and constant calculations.

Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D23846
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-02-27 15:21:05 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Emmanuel Vadot
1179b649cf linuxkpi: Move shmem related functions into their own file
For drmkpi (D23085) we don't want the Linux struct file as we don't emulate
everything. Also the prototypes should be in shmem_fs.h to have 100%
compatibility with Linux.

Reviewed by:	hselasky
MFC after:	Maybe
Differential Revision:	https://reviews.freebsd.org/D23764
2020-02-21 09:28:45 +00:00
Emmanuel Vadot
1a7ba9a01c linuxkpi: Add str_has_prefix
This function tests if the string str begins with the string pointed
at by prefix.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23767
2020-02-20 17:20:50 +00:00
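A prefix test of the kind 1a7ba9a01c describes can be sketched with strlen and strncmp. The name str_has_prefix_sketch is hypothetical; returning the prefix length on a match (and 0 otherwise) follows the Linux helper and is an assumption about the LinuxKPI version.

```c
#include <stddef.h>
#include <string.h>

/* Return strlen(prefix) if str starts with prefix, otherwise 0. */
static inline size_t
str_has_prefix_sketch(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return (strncmp(str, prefix, len) == 0 ? len : 0);
}
```

Returning the length lets a caller skip past the prefix with str + ret in one step; note that an empty prefix yields 0 and is therefore indistinguishable from "no match".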
Emmanuel Vadot
8f0c734385 linuxkpi: Add list_is_first function
This function just tests if the element is the first of the list.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23766
2020-02-20 17:19:16 +00:00
Mateusz Guzik
65cdfb4caa make sysent for r358172 ("vfs: add realpathat syscall") 2020-02-20 16:58:57 +00:00
Mateusz Guzik
0573d0a9b8 vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
2020-02-20 16:58:19 +00:00
Pawel Biernacki
39a3542bef Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (2 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	hselasky, kib, zeising
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23631
2020-02-15 18:54:59 +00:00
Mateusz Guzik
5af9cdaf8a cloudabi: use new capsicum helpers 2020-02-15 01:29:58 +00:00
Ed Maste
fe16bad415 regen sysent after r357831, r357838
Capability mode changes allowing fdatasync and getloginclass.

Sponsored by:	The FreeBSD Foundation
2020-02-12 19:05:10 +00:00
Edward Tomasz Napierala
0b40dcbe32 Make linux(4) use kern_socketpair(9) instead of going through
sys_socketpair().  It's a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22814
2020-02-10 13:24:14 +00:00
Konstantin Belousov
f88c67a625 Regen. 2020-02-09 11:53:37 +00:00
Konstantin Belousov
146fc63fce Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery.  Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose.  It is intended to be used only by our C runtime
implementation internals.

Tested by:	pho
Discussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
Konstantin Belousov
8e3d7caee5 linux futex_put(): do not touch futex after dropping our reference.
Reported and tested by:	Steve Roome <me@stephenroome.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-07 22:21:44 +00:00
Ed Maste
fc7510aef7 linuxulator: implement sendfile
Submitted by:	Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by:	Yang Wang <2333@outlook.jp>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19917
2020-02-05 16:53:02 +00:00
Konstantin Belousov
a421e8786b Add sys/systm.h to several places that use vm headers.
It is needed (but not enough) to use e.g. KASSERT() in inline functions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-04 18:56:26 +00:00
Dmitry Chagin
7bc05ae6bb Fix clock_gettime() and clock_getres() for cpu clocks:
- handle the CLOCK_{PROCESS,THREAD}_CPUTIME_ID specified directly;
- fix thread id calculation as in the Linuxulator we should
  convert the user supplied thread id to struct thread * by linux_tdfind();
- fix CPUCLOCK_SCHED case by using kern_{process,thread}_cputime()
  directly as native get_cputime() used by kern_clock_gettime() uses
  native tdfind()/pfind() to find process/thread.

PR:			240990
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D23341
MFC after:		2 weeks
2020-02-04 05:27:05 +00:00
Dmitry Chagin
2506c76121 linux_to_native_clockid() properly initializes nwhich variable (or returns an error),
so don't initialize nwhich in declaration and remove stale comment from r161304.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D23339
MFC after:		2 weeks
2020-02-04 05:23:34 +00:00
Mateusz Guzik
52604ed792 fd: remove the seq argument from fget_unlocked
It is almost always NULL.
2020-02-03 22:27:55 +00:00
Mateusz Guzik
7739d92766 cache: replace kern___getcwd with vn_getcwd
The previous routine was resulting in extra data copies most notably in
linux_getcwd.
2020-02-01 20:38:38 +00:00
Edward Tomasz Napierala
c2d4745705 Add TCP_CORK support to linux(4). This fixes one of the things Nginx
trips over.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23171
2020-01-28 13:57:24 +00:00
Edward Tomasz Napierala
da6d8ae6d8 Add compat.linux.ignore_ip_recverr sysctl. This is a workaround
for missing IP_RECVERR setsockopt(2) support. Without it, DNS
resolution is broken for glibc >= 2.30 (glibc BZ #24047).

From the user point of view this fixes "yum update" on recent
CentOS 8.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23234
2020-01-28 13:51:53 +00:00
Konstantin Belousov
2a3529df1d Provide support for fdevname(3) on linuxkpi-backed devices.
Reported and tested by:	manu
Reviewed by:	hselasky, manu
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23386
2020-01-28 11:22:20 +00:00
Hans Petter Selasky
b9a6330da3 Implement mmget_not_zero() in the LinuxKPI.
Submitted by:	Austin Shafer <ashafer@badland.io>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-24 13:05:53 +00:00
Edward Tomasz Napierala
618b55c2e2 Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously it would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23306
2020-01-24 12:08:23 +00:00
Edward Tomasz Napierala
b3fb13eb55 Add kern_unmount() and use in Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
Gleb Smirnoff
c57b57da35 Remove comment that no longer describes reality. 2020-01-22 05:32:23 +00:00
Edward Tomasz Napierala
10f2d3f857 Revert r356948; breaks build somehow. 2020-01-21 20:32:49 +00:00
Edward Tomasz Napierala
c5f4e26e7d Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously it would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-01-21 19:19:02 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
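The splitting described in 149afbf3ba means a 64-bit off_t arrives in a 32-bit syscall handler as two 32-bit arguments that must be recombined. The helper below sketches that recombination for a little-endian argument order (low half first); pair32to64_sketch is a hypothetical function and not the PAIR32TO64 macro itself.

```c
#include <stdint.h>

/* Rebuild a signed 64-bit offset from its low and high 32-bit halves. */
static inline int64_t
pair32to64_sketch(uint32_t lo, uint32_t hi)
{
	return ((int64_t)(((uint64_t)hi << 32) | (uint64_t)lo));
}
```

Widening hi to 64 bits before the shift matters: shifting a 32-bit value by 32 would be undefined behaviour in C.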
Edward Tomasz Napierala
66632fe7bb Properly translate MNT_FORCE flag to Linux umount2(2). Previously
it worked by accident.

MFC after:	2 weeks
Sponsored by:	DARPA
2020-01-20 12:16:32 +00:00
Kyle Evans
05d7dd739c sysent targets: further cleanup and deduplication
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.

Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.

Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.

Reviewed by:	brooks, imp
Nice!:		emaste
Differential Revision:	https://reviews.freebsd.org/D23197
2020-01-18 20:37:45 +00:00
Mark Johnston
a7e348d7cf Handle a NULL thread pointer in linux_close_file().
This can happen if a file is closed during unix socket GC.  The same bug
was fixed for devfs descriptors in r228361.

PR:		242913
Reported and tested by:	iz-rpi03@hs-karlsruhe.de
Reviewed by:	hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23178
2020-01-15 15:31:35 +00:00
Edward Tomasz Napierala
9c6eb0f92f Make linux(4) use kern_setsockopt(9) instead of going through
sys_setsockopt.  Just a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22812
2020-01-14 11:33:07 +00:00
Edward Tomasz Napierala
dfd060c0b6 Make linux(4) use kern_getsockopt(9) instead of going through
sys_getsockopt().  It's a cleanup; no functional changes.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22813
2020-01-14 11:30:30 +00:00
Edward Tomasz Napierala
46209ceae5 Make linux getcpu(2) report the domain.
Submitted by:	markj
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23144
2020-01-14 11:24:06 +00:00
Konstantin Belousov
fedab1b499 Code must not unlock a mutex while owning the thread lock.
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23150
2020-01-13 14:30:19 +00:00
Edward Tomasz Napierala
ca603bb1ee dd kern_getpriority(), make Linuxulator use it.
Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22842
2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala
7a0ef283e6 Add kern_setpriority(), use it in Linuxulator.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22841
2020-01-12 13:38:51 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603, there was a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still shouldn't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Mark Johnston
f8091e2c8f linprocfs: Fix some bugs in the maps file implementation.
- Export the offset into the backing object, not the object size.
- Fix a bug where we would print the previous entry's "offset" when a
  map_entry has no object.
- Try to identify shared mappings.  Linux prints "s" when the mapping
  "may be shared".  This attempt is not perfect, for example, we print
  "p" for anonymous memory that may be shared via
  minherit(INHERIT_SHARE).

PR:		240992
Reviewed by:	kib
MFC after:	1 week
MFC note:	no OBJ_ANON in stable/12
Differential Revision:	https://reviews.freebsd.org/D23062
2020-01-08 16:57:08 +00:00
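
For orientation, the fields touched by f8091e2c8f sit at the start of each maps line: the address range, four permission characters ending in "s" or "p", and the offset into the backing object. The following is a rough userspace sketch of that prefix, assuming the common Linux layout with zero-padded hexadecimal fields. `demo_format_maps_prefix()` is a hypothetical name and this is not the linprocfs code.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical formatter for the leading fields of a maps line:
 * "start-end rwx[s|p] offset".  The last permission character is 's'
 * for a mapping that may be shared and 'p' for a private one; the
 * final field is the offset into the backing object, not its size.
 */
static int
demo_format_maps_prefix(char *buf, size_t len, unsigned long start,
    unsigned long end, bool r, bool w, bool x, bool shared,
    unsigned long offset)
{
	return (snprintf(buf, len, "%08lx-%08lx %c%c%c%c %08lx",
	    start, end,
	    r ? 'r' : '-', w ? 'w' : '-', x ? 'x' : '-',
	    shared ? 's' : 'p',
	    offset));
}
```

The real file appends device, inode, and path fields after this prefix; they are left out here because the commit does not change them.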
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Kyle Evans
535b1df993 shm: correct KPI mistake introduced around memfd_create
When file sealing and shm_open2 were introduced, we should have grown a new
kern_shm_open2 helper that did the brunt of the work with the new interface
while kern_shm_open remains the same. Instead, more complexity was
introduced to kern_shm_open to handle the additional features and consumers
had to keep changing in somewhat awkward ways, and a kern_shm_open2 was
added to wrap kern_shm_open.

Backpedal on this and correct the situation: kern_shm_open returns to the
interface it had prior to file sealing being introduced, and neither
function needs an initial_seals argument anymore as it's handled in
kern_shm_open2 based on the shmflags.
2020-01-05 04:06:40 +00:00
Kyle Evans
18348a2369 kern_mmap: add a variant that allows caller to inspect fp
Linux mmap rejects mmap() on a write-only file with EACCES.
linux_mmap_common currently does a fun dance to grab the fp associated with
the passed in fd, validates it, then drops the reference and calls into
kern_mmap(). Doing so is perhaps both fragile and premature; there's still
plenty of chance for the request to get rejected with a more appropriate
error, and it's prone to a race where the file we ultimately mmap has
changed after it drops its reference.

This change alleviates the need to do this by providing a kern_mmap variant
that allows the caller to inspect the fp just before calling into the fileop
layer. The callback takes flags, prot, and maxprot as one could imagine
scenarios where any of these, in conjunction with the file itself, may
influence a caller's decision.

The file type check in the linux compat layer has been removed; EINVAL is
seemingly not an appropriate response to the file not being a vnode or
device. The fileop layer will reject the operation with ENODEV if it's not
supported, which more closely matches the common linux description of
mmap(2) return values.

If we discover that we're allowing an mmap() on a file type that Linux
normally wouldn't, we should restrict those explicitly.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22977
2020-01-04 23:39:58 +00:00
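
The callback approach described in 18348a2369 can be sketched in userspace as follows. This is a minimal illustration under stated assumptions: every `demo_` name is hypothetical, `struct demo_file` merely stands in for the kernel's file object, and the real kern_mmap variant's name and signature are not reproduced here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's file object. */
struct demo_file {
	int f_flag;	/* open(2) access mode, e.g. O_WRONLY */
};

/* Hypothetical callback type: inspect the file, return 0 or an errno. */
typedef int demo_check_fp_fn(const struct demo_file *fp, int flags,
    int prot, int maxprot);

/* Linux-style policy from the commit text: no mmap on a write-only file. */
static int
demo_linux_check(const struct demo_file *fp, int flags, int prot,
    int maxprot)
{
	(void)flags;
	(void)prot;
	(void)maxprot;
	if ((fp->f_flag & O_ACCMODE) == O_WRONLY)
		return (EACCES);
	return (0);
}

/*
 * Hypothetical mmap entry point.  The caller's check runs while this
 * function still holds the file, so the descriptor cannot be swapped
 * between the check and the mapping.
 */
static int
demo_mmap_fpcheck(const struct demo_file *fp, int flags, int prot,
    int maxprot, demo_check_fp_fn *check)
{
	int error;

	if (check != NULL) {
		error = check(fp, flags, prot, maxprot);
		if (error != 0)
			return (error);
	}
	/* The real mapping work would happen here. */
	return (0);
}
```

Passing NULL for the callback models a caller with no extra policy, while the Linux compat layer would pass its own check to get EACCES for write-only files.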
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
aa2ad8c1e6 Remove set_page_dirty_lock().
Its use of the page lock is incorrect, and it is not used by the DRM
modules.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23002
2020-01-02 19:29:14 +00:00