Commit Graph

3505 Commits

Author SHA1 Message Date
Emmanuel Vadot
4efd5dd70d linuxkpi: Add refcount.h
Implement some refcount functions needed by drm.
Just use the atomic_t struct and functions from linuxkpi for simplicity.

Sponsored-by: The FreeBSD Foundation

Reviewed by:	hselsasky
Differential Revision:	https://reviews.freebsd.org/D24985
2020-05-25 12:44:07 +00:00
Emmanuel Vadot
93d70cd3fe linuxkpi: Add __same_type and __must_be_array macros
The same_type macro simply wraps around builtin_types_compatible_p which
exist for both GCC and CLANG, which returns 1 if both types are the same.
The __must_be_array macros returns 1 if the argument is an array.

This is needed for DRM v5.3

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24953
2020-05-25 12:42:55 +00:00
Emmanuel Vadot
af300929f9 linuxkpi: Add prandom_u32_max
This is just a wrapper around arc4random_uniform
Needed by DRM v5.3

Sponsored-by: The FreeBSD Foundation
Reviewed by:	cem, hselasky
Differential Revision:	https://reviews.freebsd.org/D24961
2020-05-23 17:52:25 +00:00
Emmanuel Vadot
2491b25c3a linuxkpi: Add rcu_work functions
The rcu_work function helps to queue some work after waiting for a grace
period.
This is needed by DRM drivers.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24942
2020-05-21 20:18:38 +00:00
Emmanuel Vadot
eda697d2ef linuxkpi: Add irq_work.h
Since handlers are call in a thread context we can simply use a workqueue
to emulate those functions.
The DRM code was patched to do that already, having it in linuxkpi allows us
to not patch the upstream code.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24859
2020-05-19 09:04:35 +00:00
Emmanuel Vadot
5e30a739c7 linuxkpi: add pci_dev_present
pci_dev_present shows if a set of pci ids are present in the system.
It just wraps pci_find_device.
Needed by DRMv5.2

Submitted by:	Austing Shafer (ashafer@badland.io)
Differential Revision:	https://reviews.freebsd.org/D24796
2020-05-19 08:44:33 +00:00
Emmanuel Vadot
ff443195bf linuxkpi: Add __init_waitqueue_head
The only difference with init_waitqueue_head is that the name and the
lock class key are provided but we don't use those so use init_waitqueue_head
directly.

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24861
2020-05-19 08:43:17 +00:00
Emmanuel Vadot
355711ea76 linuxkpi: Add offsetofend macro
This calculate the offset of the end of the member in the given struct.
Needed by DRM in Linux v5.3

Sponsored-by: The FreeBSD Foudation
Differential Revision:	https://reviews.freebsd.org/D24849
2020-05-17 20:14:49 +00:00
Emmanuel Vadot
d003cc4318 linuxkpi: Add __mutex_init
Same as mutex_init, the lock_class_key argument seems to be only used for
debug in Linux, simply ignore it for now.
Needed by DRM in Linux v5.3

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24848
2020-05-17 20:12:16 +00:00
Emmanuel Vadot
7708d3d765 linuxkpi: Add atomic_dec_and_mutex_lock
This function decrement the counter and if the result is 0 it acquires
the mutex and returns 1, if not it simply returns 0.
Needed by DRM from Linux v5.3

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24847
2020-05-17 20:09:11 +00:00
Hans Petter Selasky
cf9f2ca3ef Implement synchronize_srcu_expedited() in the LinuxKPI.
Differential Revision:	https://reviews.freebsd.org/D24798
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-16 14:27:50 +00:00
Emmanuel Vadot
cfa985350d linuxkpi: Add EBADRQC to errno.h
This is used in the amdgpu driver from Linux 5.2

Sponsored-by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24807
2020-05-13 07:49:12 +00:00
Andriy Gapon
a164a32b4d linuxkpi: print stack trace in WARN_ON macros
Reviewed by:	hselasky, kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D24779
2020-05-13 07:47:56 +00:00
Emmanuel Vadot
3d84874da0 linuxkpi: Really add bitmap_alloc and bitmap_zalloc
This was missing in r360870

Sponsored-by: The FreeBSD Foundation
2020-05-10 13:12:05 +00:00
Emmanuel Vadot
ce03b3013f linuxkpi: Add bitmap_alloc and bitmap_free
This is a simple call to kmallock_array/kfree, therefore include linux/slab.h as
this is where the kmalloc_array/kfree definition is.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselsasky
Differential Revision:	https://reviews.freebsd.org/D24794
2020-05-10 13:07:00 +00:00
Emmanuel Vadot
26a578697c linuxkpi: Add bitmap_copy and bitmap_andnot
bitmap_copy simply copy the bitmaps, no idea why it exists.
bitmap_andnot is similar to bitmap_and but uses !src2.

Sponsored-by: The FreeBSD Foundation
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D24782
2020-05-09 17:52:50 +00:00
Emmanuel Vadot
4c27484934 linuxkpi: Add pci_iomap and pci_iounmap
Those function are use to map/unmap io region of a pci device.
Different resource can be mapped depending on the bar so use a
tailq to store them all.

Sponsored-by: The FreeBSD Foundation

Reviewed by:	emaste, hselasky
Differential Revision:	https://reviews.freebsd.org/D24696
2020-05-07 17:00:51 +00:00
Hans Petter Selasky
5e6233ccab Optimise use of sg_page_count() in __sg_page_iter_next() in the LinuxKPI.
No need to compute value twice.

No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 10:10:07 +00:00
Hans Petter Selasky
fe4b041a14 Implement more scatter and gather functions in the LinuxKPI.
Differential Revision:	https://reviews.freebsd.org/D24611
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 09:58:45 +00:00
Hans Petter Selasky
42f8ef4bf5 Fix warning about sleeping with non-sleepable lock when allocating
"current" from linux_cdev_pager_populate() in the LinuxKPI:

Backtrace:
witness_debugger()
witness_warn()
uma_zalloc_arg()
malloc()
linux_alloc_current()
linux_cdev_pager_populate()
vm_fault()
vm_fault_trap()
trap_pfault()
trap()
calltrap()

Suggested by:	avg@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-04 08:05:01 +00:00
Hans Petter Selasky
b4edb17c82 Implement more PCI-express bandwidth functions in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:32:42 +00:00
Hans Petter Selasky
1bbbe083a1 Implement mutex_lock_killable() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:28:21 +00:00
Hans Petter Selasky
3ff7ec1cc1 Implement DIV64_U64_ROUND_UP() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:25:07 +00:00
Hans Petter Selasky
922106bf00 Implement more lockdep macros in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:18:07 +00:00
Hans Petter Selasky
61f7fe6b2d Implement kstrtou64() in the LinuxKPI.
Submitted by:	ashafer_badland.io (Austin Shafer)
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-01 10:14:45 +00:00
Kyle Evans
2c9c433e17 sysent: re-roll after 360236 (AUE_CLOSERANGE used) 2020-04-24 01:30:33 +00:00
Kyle Evans
3e6b82913d close_range(2): use newly assigned AUE_CLOSERANGE 2020-04-24 01:30:00 +00:00
Hans Petter Selasky
253dbe7487 Factor code in LinuxKPI to allow attach and detach using any BSD device.
This allows non-LinuxKPI based infiniband device drivers to attach
correctly to ibcore.

No functional change intended.

Reviewed by:	np @
Differential Revision:	https://reviews.freebsd.org/D24514
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-22 14:33:25 +00:00
Hans Petter Selasky
b9bf16adfb Implement the atomic fetch add unless functions for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 16:21:37 +00:00
Hans Petter Selasky
fdbfa4f19e Implement aligned LinuxKPI types for u16, u32 and u64.
Makes a difference for 32-bit platforms mostly.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 14:03:05 +00:00
Hans Petter Selasky
07fdea3672 Allow test_bit() in the LinuxKPI to accept a const pointer.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 13:47:15 +00:00
Hans Petter Selasky
47c0672b08 Allow the ERR_CAST() function in the LinuxKPI to take a const void pointer.
No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-20 13:36:01 +00:00
Mark Johnston
546c117f86 Remove a vestigal reference to kmem_object.
kmem_object has been an alias of kernel_object for a while.

MFC after:	1 week
2020-04-17 19:12:52 +00:00
Brooks Davis
b24e6ac8b7 Convert canary, execpathp, and pagesizes to pointers.
Use AUXARGS_ENTRY_PTR to export these pointers.  This is a followup to
r359987 and r359988.

Reviewed by:	jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24446
2020-04-16 21:53:17 +00:00
Brooks Davis
9df1c38bbc Export argc, argv, envc, envv, and ps_strings in auxargs.
This simplifies discovery of these values, potentially with reducing the
number of syscalls we need to make at runtime.  Longer term, we wish to
convert the startup process to pass an auxargs pointer to _start() and
use that rather than walking off the end of envv.  This is cleaner,
more C-friendly, and for systems with strong bounds (e.g. CHERI)
necessary.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24407
2020-04-15 20:23:55 +00:00
Brooks Davis
397df744f9 Make ps_strings in struct image_params into a pointer.
This is a prepratory commit for D24407.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
2020-04-15 20:21:30 +00:00
Brooks Davis
618a20d4f9 Remove bogus use of useracc() in (clock_)nanosleep.
There's no point in pre-checking that we can access the user's rmtp
pointer before we do it in copyout().

While here, improve style(9) compliance.

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24409
2020-04-14 20:53:12 +00:00
Brooks Davis
562894f0dc Centralize compatability translation macros.
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Input from:	cem, jhb
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24275
2020-04-14 20:30:48 +00:00
Kyle Evans
e19b97f7a0 sysent: re-roll after r359930 2020-04-14 18:11:26 +00:00
Kyle Evans
7d03e08112 Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range
Include a temporarily compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.

Reviewed by:	markj (previous version), kib
Differential Revision:	https://reviews.freebsd.org/D24399
2020-04-14 18:07:42 +00:00
Kyle Evans
3d224fc909 sysent: re-roll after introduction of close_range in r359836 2020-04-12 21:23:51 +00:00
Kyle Evans
472ced39ef Implement a close_range(2) syscall
close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.

The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.

This patch is based on a syscall of the same design that is expected to be
merged into Linux.

Reviewed by:	kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision:	https://reviews.freebsd.org/D21627
2020-04-12 21:23:19 +00:00
Hans Petter Selasky
eae5868ce9 Clone the RCU interface into a sleepable and a non-sleepable part
in the LinuxKPI.

This allows synchronize RCU to be used inside a SRCU read section.
No functional change intended.

Bump the __FreeBSD_version to force recompilation of external kernel modules.

PR:		242272
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 17:09:45 +00:00
Hans Petter Selasky
61d82b0794 Some fixes for SRCU in the LinuxKPI.
- Make sure to use READ_ONCE() when deferring variables.
- Remove superfluous zero initializer.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 16:07:57 +00:00
John Baldwin
59838c1a19 Retire procfs-based process debugging.
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging.  ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code.  This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR:		244939 (exp-run)
Reviewed by:	kib, mjg (earlier version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D23837
2020-04-01 19:22:09 +00:00
Mark Johnston
4596ac234e compat/linux/linux.h depends on queue.h since r353725.
Sponsored by:	The FreeBSD Foundation
2020-03-26 17:12:55 +00:00
Warner Losh
9275cd0dc5 Implement a workaround for kms-drm modules
pci_iov_if.h was added to pci.h, but none of the kms-drm branches have
that. Rather than play whack a mole with the branches, move its inclusion to
linux_pci.c which is the only part of the code that needs it now.

Longer term, other solutions will be needed, but this gives us time to get those
deployed on all the supported versions.
2020-03-20 15:07:25 +00:00
Konstantin Belousov
2928e60e55 linuxkpi: Add infrastructure to pass FreeBSD IOV method calls into
pci_driver methods.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:10:49 +00:00
Hans Petter Selasky
d845d3dc9a Add support for the device statistics IOCTL, needed by the coming
linux_libusb upgrade.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2020-03-10 15:56:49 +00:00
Tijl Coosemans
b4147bf6b4 Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.

Reported by:	Yuri Pankov <ypankov@fastmail.com>
2020-03-05 14:41:27 +00:00
Brooks Davis
d718de812f Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal.  We take advantage of the mandatory
zeroing of fields not listed in the initializer.

Remove kern_mmap_fpcheck() and use kern_mmap_req().

The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations.  In CheriBSD we have already added two more arguments.

Reviewed by:	kib
Discussed with:	kevans
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23164
2020-03-04 21:27:12 +00:00
Hans Petter Selasky
1328771d9d When closing a LinuxKPI file always use the real release function to avoid
resource leakage when destroying a LinuxKPI character device.

Submitted by:	Andrew Boyer <aboyer@pensando.io>
Reviewed by:	kib@
PR:		244572
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-03-03 15:49:34 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
Tijl Coosemans
f8b9b299a2 linuxulator: Map scheduler priorities to Linux priorities.
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99].  For SCHED_OTHER the single valid priority is
0.  On FreeBSD it is [0,31] for all policies.  Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.

This commit adds a tunable compat.linux.map_sched_prio.  When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.

Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows to disable this behaviour.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.

This fixes FMOD, a commercial sound library used by several games.

PR:		240043
Tested by:	Alex S <iwtcex@gmail.com>
Reviewed by:	dchagin
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23790
2020-03-01 13:12:04 +00:00
Edward Tomasz Napierala
5d481ad8df Make linuxulator warn about unsupported getsockopt/setsockopt flags.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23791
2020-02-27 19:40:20 +00:00
Hans Petter Selasky
77632fc70f Extend the range of the return value from nsecs_to_jiffies64() to support
Mesa's drm_syncobj usage, in the LinuxKPI.

While at it optimise the jiffies conversion functions to avoid repeated
and constant calculations.

Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D23846
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-02-27 15:21:05 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Emmanuel Vadot
1179b649cf linuxkpi: Move shmem related functions in it's own file
For drmkpi (D23085) we don't want the Linux struct file as we don't emulate
everything. Also the prototypes should be in shmem_fs.h to have 100%
compatibility with Linux.

Reviewed by:	hselasky
MFC after:	Maybe
Differential Revision:	https://reviews.freebsd.org/D23764
2020-02-21 09:28:45 +00:00
Emmanuel Vadot
1a7ba9a01c linuxkpi: Add str_has_prefix
This function test if the string str begins with the string pointed
at by prefix.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23767
2020-02-20 17:20:50 +00:00
Emmanuel Vadot
8f0c734385 linuxkpi: Add list_is_first function
This function just test if the element is the first of the list.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23766
2020-02-20 17:19:16 +00:00
Mateusz Guzik
65cdfb4caa make sysent for r358172 ("vfs: add realpathat syscall") 2020-02-20 16:58:57 +00:00
Mateusz Guzik
0573d0a9b8 vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
2020-02-20 16:58:19 +00:00
Pawel Biernacki
39a3542bef Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (2 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	hselasky, kib, zeising
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23631
2020-02-15 18:54:59 +00:00
Mateusz Guzik
5af9cdaf8a cloudabi: use new capsicum helpers 2020-02-15 01:29:58 +00:00
Ed Maste
fe16bad415 regen sysent after r357831, r357838
Capability mode changes allowing fdatasync and getloginclass.

Sponsored by:	The FreeBSD Foundation
2020-02-12 19:05:10 +00:00
Edward Tomasz Napierala
0b40dcbe32 Make linux(4) use kern_socketpair(9) instead of going through
sys_socketpair().  It's a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22814
2020-02-10 13:24:14 +00:00
Konstantin Belousov
f88c67a625 Regen. 2020-02-09 11:53:37 +00:00
Konstantin Belousov
146fc63fce Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery.  Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose.  It is intended to be used only by our C runtime
implementation internals.

Tested by:	pho
Disscussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
Konstantin Belousov
8e3d7caee5 linux futex_put(): do not touch futex after dropping our reference.
Reported and tested by:	Steve Roome <me@stephenroome.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-07 22:21:44 +00:00
Ed Maste
fc7510aef7 linuxulator: implement sendfile
Submitted by:	Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by:	Yang Wang <2333@outlook.jp>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19917
2020-02-05 16:53:02 +00:00
Konstantin Belousov
a421e8786b Add sys/systm.h to several places that use vm headers.
It is needed (but not enough) to use e.g. KASSERT() in inline functions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-04 18:56:26 +00:00
Dmitry Chagin
7bc05ae6bb Fix clock_gettime() and clock_getres() for cpu clocks:
- handle the CLOCK_{PROCESS,THREAD}_CPUTIME_ID specified directly;
- fix thread id calculation as in the Linuxulator we should
  convert the user supplied thread id to struct thread * by linux_tdfind();
- fix CPUCLOCK_SCHED case by using kern_{process,thread}_cputime()
  directly as native get_cputime() used by kern_clock_gettime() uses
  native tdfind()/pfind() to find proccess/thread.

PR:			240990
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D23341
MFC after:		2 weeks
2020-02-04 05:27:05 +00:00
Dmitry Chagin
2506c76121 linux_to_native_clockid() properly initializes nwhich variable (or return error),
so don't initialize nwhich in declaration and remove stale comment from r161304.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D23339
MFC after:		2 weeks
2020-02-04 05:23:34 +00:00
Mateusz Guzik
52604ed792 fd: remove the seq argument from fget_unlocked
It is almost always NULL.
2020-02-03 22:27:55 +00:00
Mateusz Guzik
7739d92766 cache: replace kern___getcwd with vn_getcwd
The previous routine was resulting in extra data copies most notably in
linux_getcwd.
2020-02-01 20:38:38 +00:00
Edward Tomasz Napierala
c2d4745705 Add TCP_CORK support to linux(4). This fixes one of the things Nginx
trips over.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23171
2020-01-28 13:57:24 +00:00
Edward Tomasz Napierala
da6d8ae6d8 Add compat.linux.ignore_ip_recverr sysctl. This is a workaround
for missing IP_RECVERR setsockopt(2) support. Without it, DNS
resolution is broken for glibc >= 2.30 (glibc BZ #24047).

From the user point of view this fixes "yum update" on recent
CentOS 8.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23234
2020-01-28 13:51:53 +00:00
Konstantin Belousov
2a3529df1d Provide support for fdevname(3) on linuxkpi-backed devices.
Reported and tested by:	manu
Reviewed by:	hselasky, manu
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23386
2020-01-28 11:22:20 +00:00
Hans Petter Selasky
b9a6330da3 Implement mmget_not_zero() in the LinuxKPI.
Submitted by:	Austin Shafer <ashafer@badland.io>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-24 13:05:53 +00:00
Edward Tomasz Napierala
618b55c2e2 Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23306
2020-01-24 12:08:23 +00:00
Edward Tomasz Napierala
b3fb13eb55 Add kern_unmount() and use in Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
Gleb Smirnoff
c57b57da35 Remove comment that no longer describe reality. 2020-01-22 05:32:23 +00:00
Edward Tomasz Napierala
10f2d3f857 Revert r356948; breaks build somehow. 2020-01-21 20:32:49 +00:00
Edward Tomasz Napierala
c5f4e26e7d Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-01-21 19:19:02 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
Edward Tomasz Napierala
66632fe7bb Properly translate MNT_FORCE flag to Linux umount2(2). Previously
it worked by accident.

MFC after:	2 weeks
Sponsored by:	DARPA
2020-01-20 12:16:32 +00:00
Kyle Evans
05d7dd739c sysent targets: further cleanup and deduplication
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.

Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.

Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.

Reviewed by:	brooks, imp
Nice!:		emaste
Differential Revision:	https://reviews.freebsd.org/D23197
2020-01-18 20:37:45 +00:00
Mark Johnston
a7e348d7cf Handle a NULL thread pointer in linux_close_file().
This can happen if a file is closed during unix socket GC.  The same bug
was fixed for devfs descriptors in r228361.

PR:		242913
Reported and tested by:	iz-rpi03@hs-karlsruhe.de
Reviewed by:	hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23178
2020-01-15 15:31:35 +00:00
Edward Tomasz Napierala
9c6eb0f92f Make linux(4) use kern_setsockopt(9) instead of going through
sys_setsockopt.  Just a cleanup; no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22812
2020-01-14 11:33:07 +00:00
Edward Tomasz Napierala
dfd060c0b6 Make linux(4) use kern_getsockopt(9) instead of going through
sys_getsockopt().  It's a cleanup; no functional changes.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22813
2020-01-14 11:30:30 +00:00
Edward Tomasz Napierala
46209ceae5 Make linux getcpu(2) report the domain.
Submitted by:	markj
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23144
2020-01-14 11:24:06 +00:00
Konstantin Belousov
fedab1b499 Code must not unlock a mutex while owning the thread lock.
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23150
2020-01-13 14:30:19 +00:00
Edward Tomasz Napierala
ca603bb1ee dd kern_getpriority(), make Linuxulator use it.
Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22842
2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala
7a0ef283e6 Add kern_setpriority(), use it in Linuxulator.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22841
2020-01-12 13:38:51 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Mark Johnston
f8091e2c8f linprocfs: Fix some bugs in the maps file implementation.
- Export the offset into the backing object, not the object size.
- Fix a bug where we would print the previous entry's "offset" when a
  map_entry has no object.
- Try to identify shared mappings.  Linux prints "s" when the mapping
  "may be shared".  This attempt is not perfect, for example, we print
  "p" for anonymous memory that may be shared via
  minherit(INHERIT_SHARE).

PR:		240992
Reviewed by:	kib
MFC after:	1 week
MFC note:	no OBJ_ANON in stable/12
Differential Revision:	https://reviews.freebsd.org/D23062
2020-01-08 16:57:08 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Kyle Evans
535b1df993 shm: correct KPI mistake introduced around memfd_create
When file sealing and shm_open2 were introduced, we should have grown a new
kern_shm_open2 helper that did the brunt of the work with the new interface
while kern_shm_open remains the same. Instead, more complexity was
introduced to kern_shm_open to handle the additional features and consumers
had to keep changing in somewhat awkward ways, and a kern_shm_open2 was
added to wrap kern_shm_open.

Backpedal on this and correct the situation- kern_shm_open returns to the
interface it had prior to file sealing being introduced, and neither
function needs an initial_seals argument anymore as it's handled in
kern_shm_open2 based on the shmflags.
2020-01-05 04:06:40 +00:00
Kyle Evans
18348a2369 kern_mmap: add a variant that allows caller to inspect fp
Linux mmap rejects mmap() on a write-only file with EACCES.
linux_mmap_common currently does a fun dance to grab the fp associated with
the passed in fd, validates it, then drops the reference and calls into
kern_mmap(). Doing so is perhaps both fragile and premature; there's still
plenty of chance for the request to get rejected with a more appropriate
error, and it's prone to a race where the file we ultimately mmap has
changed after it drops its referenced.

This change alleviates the need to do this by providing a kern_mmap variant
that allows the caller to inspect the fp just before calling into the fileop
layer. The callback takes flags, prot, and maxprot as one could imagine
scenarios where any of these, in conjunction with the file itself, may
influence a caller's decision.

The file type check in the linux compat layer has been removed; EINVAL is
seemingly not an appropriate response to the file not being a vnode or
device. The fileop layer will reject the operation with ENODEV if it's not
supported, which more closely matches the common linux description of
mmap(2) return values.

If we discover that we're allowing an mmap() on a file type that Linux
normally wouldn't, we should restrict those explicitly.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22977
2020-01-04 23:39:58 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00