Commit Graph

3409 Commits

Author SHA1 Message Date
Warner Losh
9275cd0dc5 Implement a workaround for kms-drm modules
pci_iov_if.h was added to pci.h, but none of the kms-drm branches have
that. Rather than play whack a mole with the branches, move its inclusion to
linux_pci.c which is the only part of the code that needs it now.

Longer term, other solutions will be needed, but this gives us time to get those
deployed on all the supported versions.
2020-03-20 15:07:25 +00:00
Konstantin Belousov
2928e60e55 linuxkpi: Add infrastructure to pass FreeBSD IOV method calls into
pci_driver methods.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:10:49 +00:00
Hans Petter Selasky
d845d3dc9a Add support for the device statistics IOCTL, needed by the coming
linux_libusb upgrade.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2020-03-10 15:56:49 +00:00
Tijl Coosemans
b4147bf6b4 Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.

Reported by:	Yuri Pankov <ypankov@fastmail.com>
2020-03-05 14:41:27 +00:00
Brooks Davis
d718de812f Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal.  We take advantage of the mandatory
zeroing of fields not listed in the initializer.

Remove kern_mmap_fpcheck() and use kern_mmap_req().

The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations.  In CheriBSD we have already added two more arguments.

Reviewed by:	kib
Discussed with:	kevans
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23164
2020-03-04 21:27:12 +00:00
Hans Petter Selasky
1328771d9d When closing a LinuxKPI file always use the real release function to avoid
resource leakage when destroying a LinuxKPI character device.

Submitted by:	Andrew Boyer <aboyer@pensando.io>
Reviewed by:	kib@
PR:		244572
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-03-03 15:49:34 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
Tijl Coosemans
f8b9b299a2 linuxulator: Map scheduler priorities to Linux priorities.
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99].  For SCHED_OTHER the single valid priority is
0.  On FreeBSD it is [0,31] for all policies.  Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.

This commit adds a tunable compat.linux.map_sched_prio.  When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.

Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows to disable this behaviour.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.

This fixes FMOD, a commercial sound library used by several games.

PR:		240043
Tested by:	Alex S <iwtcex@gmail.com>
Reviewed by:	dchagin
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23790
2020-03-01 13:12:04 +00:00
Edward Tomasz Napierala
5d481ad8df Make linuxulator warn about unsupported getsockopt/setsockopt flags.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23791
2020-02-27 19:40:20 +00:00
Hans Petter Selasky
77632fc70f Extend the range of the return value from nsecs_to_jiffies64() to support
Mesa's drm_syncobj usage, in the LinuxKPI.

While at it optimise the jiffies conversion functions to avoid repeated
and constant calculations.

Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D23846
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-02-27 15:21:05 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Emmanuel Vadot
1179b649cf linuxkpi: Move shmem related functions in it's own file
For drmkpi (D23085) we don't want the Linux struct file as we don't emulate
everything. Also the prototypes should be in shmem_fs.h to have 100%
compatibility with Linux.

Reviewed by:	hselasky
MFC after:	Maybe
Differential Revision:	https://reviews.freebsd.org/D23764
2020-02-21 09:28:45 +00:00
Emmanuel Vadot
1a7ba9a01c linuxkpi: Add str_has_prefix
This function test if the string str begins with the string pointed
at by prefix.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23767
2020-02-20 17:20:50 +00:00
Emmanuel Vadot
8f0c734385 linuxkpi: Add list_is_first function
This function just test if the element is the first of the list.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23766
2020-02-20 17:19:16 +00:00
Mateusz Guzik
65cdfb4caa make sysent for r358172 ("vfs: add realpathat syscall") 2020-02-20 16:58:57 +00:00
Mateusz Guzik
0573d0a9b8 vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
2020-02-20 16:58:19 +00:00
Pawel Biernacki
39a3542bef Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (2 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	hselasky, kib, zeising
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23631
2020-02-15 18:54:59 +00:00
Mateusz Guzik
5af9cdaf8a cloudabi: use new capsicum helpers 2020-02-15 01:29:58 +00:00
Ed Maste
fe16bad415 regen sysent after r357831, r357838
Capability mode changes allowing fdatasync and getloginclass.

Sponsored by:	The FreeBSD Foundation
2020-02-12 19:05:10 +00:00
Edward Tomasz Napierala
0b40dcbe32 Make linux(4) use kern_socketpair(9) instead of going through
sys_socketpair().  It's a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22814
2020-02-10 13:24:14 +00:00
Konstantin Belousov
f88c67a625 Regen. 2020-02-09 11:53:37 +00:00
Konstantin Belousov
146fc63fce Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery.  Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose.  It is intended to be used only by our C runtime
implementation internals.

Tested by:	pho
Disscussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
Konstantin Belousov
8e3d7caee5 linux futex_put(): do not touch futex after dropping our reference.
Reported and tested by:	Steve Roome <me@stephenroome.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-07 22:21:44 +00:00
Ed Maste
fc7510aef7 linuxulator: implement sendfile
Submitted by:	Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by:	Yang Wang <2333@outlook.jp>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19917
2020-02-05 16:53:02 +00:00
Konstantin Belousov
a421e8786b Add sys/systm.h to several places that use vm headers.
It is needed (but not enough) to use e.g. KASSERT() in inline functions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-04 18:56:26 +00:00
Dmitry Chagin
7bc05ae6bb Fix clock_gettime() and clock_getres() for cpu clocks:
- handle the CLOCK_{PROCESS,THREAD}_CPUTIME_ID specified directly;
- fix thread id calculation as in the Linuxulator we should
  convert the user supplied thread id to struct thread * by linux_tdfind();
- fix CPUCLOCK_SCHED case by using kern_{process,thread}_cputime()
  directly as native get_cputime() used by kern_clock_gettime() uses
  native tdfind()/pfind() to find proccess/thread.

PR:			240990
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D23341
MFC after:		2 weeks
2020-02-04 05:27:05 +00:00
Dmitry Chagin
2506c76121 linux_to_native_clockid() properly initializes nwhich variable (or return error),
so don't initialize nwhich in declaration and remove stale comment from r161304.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D23339
MFC after:		2 weeks
2020-02-04 05:23:34 +00:00
Mateusz Guzik
52604ed792 fd: remove the seq argument from fget_unlocked
It is almost always NULL.
2020-02-03 22:27:55 +00:00
Mateusz Guzik
7739d92766 cache: replace kern___getcwd with vn_getcwd
The previous routine was resulting in extra data copies most notably in
linux_getcwd.
2020-02-01 20:38:38 +00:00
Edward Tomasz Napierala
c2d4745705 Add TCP_CORK support to linux(4). This fixes one of the things Nginx
trips over.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23171
2020-01-28 13:57:24 +00:00
Edward Tomasz Napierala
da6d8ae6d8 Add compat.linux.ignore_ip_recverr sysctl. This is a workaround
for missing IP_RECVERR setsockopt(2) support. Without it, DNS
resolution is broken for glibc >= 2.30 (glibc BZ #24047).

From the user point of view this fixes "yum update" on recent
CentOS 8.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23234
2020-01-28 13:51:53 +00:00
Konstantin Belousov
2a3529df1d Provide support for fdevname(3) on linuxkpi-backed devices.
Reported and tested by:	manu
Reviewed by:	hselasky, manu
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23386
2020-01-28 11:22:20 +00:00
Hans Petter Selasky
b9a6330da3 Implement mmget_not_zero() in the LinuxKPI.
Submitted by:	Austin Shafer <ashafer@badland.io>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-24 13:05:53 +00:00
Edward Tomasz Napierala
618b55c2e2 Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23306
2020-01-24 12:08:23 +00:00
Edward Tomasz Napierala
b3fb13eb55 Add kern_unmount() and use in Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
Gleb Smirnoff
c57b57da35 Remove comment that no longer describe reality. 2020-01-22 05:32:23 +00:00
Edward Tomasz Napierala
10f2d3f857 Revert r356948; breaks build somehow. 2020-01-21 20:32:49 +00:00
Edward Tomasz Napierala
c5f4e26e7d Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-01-21 19:19:02 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
Edward Tomasz Napierala
66632fe7bb Properly translate MNT_FORCE flag to Linux umount2(2). Previously
it worked by accident.

MFC after:	2 weeks
Sponsored by:	DARPA
2020-01-20 12:16:32 +00:00
Kyle Evans
05d7dd739c sysent targets: further cleanup and deduplication
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.

Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.

Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.

Reviewed by:	brooks, imp
Nice!:		emaste
Differential Revision:	https://reviews.freebsd.org/D23197
2020-01-18 20:37:45 +00:00
Mark Johnston
a7e348d7cf Handle a NULL thread pointer in linux_close_file().
This can happen if a file is closed during unix socket GC.  The same bug
was fixed for devfs descriptors in r228361.

PR:		242913
Reported and tested by:	iz-rpi03@hs-karlsruhe.de
Reviewed by:	hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23178
2020-01-15 15:31:35 +00:00
Edward Tomasz Napierala
9c6eb0f92f Make linux(4) use kern_setsockopt(9) instead of going through
sys_setsockopt.  Just a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22812
2020-01-14 11:33:07 +00:00
Edward Tomasz Napierala
dfd060c0b6 Make linux(4) use kern_getsockopt(9) instead of going through
sys_getsockopt().  It's a cleanup; no functional changes.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22813
2020-01-14 11:30:30 +00:00
Edward Tomasz Napierala
46209ceae5 Make linux getcpu(2) report the domain.
Submitted by:	markj
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23144
2020-01-14 11:24:06 +00:00
Konstantin Belousov
fedab1b499 Code must not unlock a mutex while owning the thread lock.
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23150
2020-01-13 14:30:19 +00:00
Edward Tomasz Napierala
ca603bb1ee dd kern_getpriority(), make Linuxulator use it.
Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22842
2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala
7a0ef283e6 Add kern_setpriority(), use it in Linuxulator.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22841
2020-01-12 13:38:51 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Mark Johnston
f8091e2c8f linprocfs: Fix some bugs in the maps file implementation.
- Export the offset into the backing object, not the object size.
- Fix a bug where we would print the previous entry's "offset" when a
  map_entry has no object.
- Try to identify shared mappings.  Linux prints "s" when the mapping
  "may be shared".  This attempt is not perfect, for example, we print
  "p" for anonymous memory that may be shared via
  minherit(INHERIT_SHARE).

PR:		240992
Reviewed by:	kib
MFC after:	1 week
MFC note:	no OBJ_ANON in stable/12
Differential Revision:	https://reviews.freebsd.org/D23062
2020-01-08 16:57:08 +00:00