Commit Graph

3342 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
518cce0274 Don't use K&R definitions. No functional changes.
Reported by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-16 17:45:15 +00:00
Jeff Roberson
686bcb5c14 schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
Jeff Roberson
61a74c5ccd schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
Conrad Meyer
482f0c0255 Revert r355760, r355759
And remove the inline/deprecated attribute use entirely in stdlib.h, from
r355747.  The intent was to provide a buildable API transitionary period, but
clearly that was counter-productive.

Reported by:	delphij, imp, others
2019-12-15 17:33:26 +00:00
Conrad Meyer
9d4710591b linuxkpi: Drop incompatible __deprecated definition
Probably all of these linuxkpi stubs should be '#ifndef' guarded, but maybe
that would prevent people from noticing when they are defined.

Introduced in r355759.  For some reason I only ran a buildworld and not a
kernel.  Mea culpa.

Reported by:	Mark Millard
X-MFC-with:	r355759
2019-12-14 23:39:32 +00:00
Edward Tomasz Napierala
cf69fe66d4 Add sync_file_range(2) implementation to linux(4); it's a thin wrapper
over the usual fsync(2).

This silences some warnings when running "apt-get upgrade".

Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:37:17 +00:00
Edward Tomasz Napierala
34ad5ac242 Add kern_kill() and use it in Linuxulator. It's just a cleanup,
no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22645
2019-12-13 18:44:02 +00:00
Edward Tomasz Napierala
be2cfdbc86 Add kern_getsid() and use it in Linuxulator; no functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22647
2019-12-13 18:39:36 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
Brooks Davis
af796bfa71 sysent: Reduce duplication and improve readability.
Use the power of variable to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Jeff Roberson
4504268a1b Fix the last few cases that grab without busy or valid. The grab functions must
return the page in some held state for consistency elsewhere.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22610
2019-12-02 22:38:25 +00:00
Vladimir Kondratyev
71b8e362c5 Linux epoll: Allow passing of any negative timeout value to epoll_wait
Linux epoll allow passing of any negative timeout value to epoll_wait()
to cause unbound blocking

Reviewed by:	emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22517
2019-11-24 20:51:09 +00:00
Vladimir Kondratyev
335fe0afb8 Linux epoll: Register events with zero event mask
Such an events are legal and should be interpreted as EPOLLERR | EPOLLHUP.
Register a disabled kqueue event in that case as we do not support EPOLLHUP yet.

Required by Linux Steam client.

PR:		240590
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22516
2019-11-24 20:47:40 +00:00
Vladimir Kondratyev
461120b834 Linux epoll: Check both read and write kqueue events existence in EPOLL_CTL_ADD
Linux epoll EPOLL_CTL_ADD op handler should always check registration
of both EVFILT_READ and EVFILT_WRITE kevents to deceide if supplied
file descriptor fd is already registered with epoll instance.

Reviewed by:	emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22515
2019-11-24 20:44:14 +00:00
Vladimir Kondratyev
896a4c279d Linux epoll: Don't deregister file descriptor after EPOLLONESHOT is fired
Linux epoll does not remove descriptor after one-shot event has been triggered.
Set EV_DISPATCH kqueue flag rather then EV_ONESHOT to get the same behavior.

Required by Linux Steam client.

PR:		240590
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	emaste, imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22513
2019-11-24 20:41:47 +00:00
Mateusz Guzik
4de1818baf linux: avoid overhead of P_CONTROLT checks if possible
Sponsored by:	The FreeBSD Foundation
2019-11-20 12:06:29 +00:00
Kyle Evans
4cc12fb848 sysent: regenerate after r354835
The lua-based makesyscalls produces slightly different output than its
makesyscalls.sh predecessor, all whitespace differences more closely
matching the source syscalls.master.
2019-11-18 23:31:12 +00:00
Kyle Evans
f22a592111 Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D21894
2019-11-18 23:28:23 +00:00
John Baldwin
03b0d68c72 Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
David Bright
2d5603fe65 Jail and capability mode for shm_rename; add audit support for shm_rename
Co-mingling two things here:

  * Addressing some feedback from Konstantin and Kyle re: jail,
    capability mode, and a few other things
  * Adding audit support as promised.

The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	kib
Relnotes:	Yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22083
2019-11-18 13:31:16 +00:00
Edward Tomasz Napierala
dfe91e5e34 Make linux(4) open(2)/openat(2) return ELOOP instead of EMLINK,
when being passed O_NOFOLLOW.  This fixes LTP testcase openat02:5.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22384
2019-11-18 10:19:16 +00:00
John Baldwin
e353233118 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
Edward Tomasz Napierala
299cb52a80 Support O_CLOEXEC in linux(4) open(2) and openat(2).
Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21966
2019-11-15 16:21:46 +00:00
Brooks Davis
96c914ee97 Tidy syscall declerations.
Pointer arguments should be of the form "<type> *..." and not "<type>* ...".

No functional change.

Reviewed by:	kevans
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22373
2019-11-14 17:11:52 +00:00
Olivier Houchard
2b2cde807c linprocfs: Make sure to report -1 as tty when we have no controlling tty.
When reporting a process' stats, we can't just provide the tty as an
unsigned long, as if we have no controlling tty, the tty would be NODEV, or
-1. Instaed, just special-case NODEV.

Submitted by:	Juraj Lutter <otis@sk.FreeBSD.org>
MFC after:	1 week
2019-11-11 00:21:05 +00:00
Ed Maste
01b9ee4c50 linux_renameat2: improve flag checks
In the cases where Linux returns an error (e.g. passing in an undefined
flag) there's no need for us to emit a message.  (The target of this
message is a developer working on the linuxulatorm, not the author of
presumably broken Linux software).

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21606
2019-11-07 15:51:44 +00:00
Edward Tomasz Napierala
044ab55e41 Make linux(4) create /dev/shm. Linux applications often expect
a tmpfs to be mounted there, and because they like to verify it's
actually a mountpoint, a symlink won't do.

Reviewed by:	dchagin (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20333
2019-11-06 20:53:33 +00:00
Hans Petter Selasky
76354fa40f Enable device class group attributes in the LinuxKPI.
Bump the __FreeBSD_version to force recompilation of
external kernel modules due to structure change.

Differential Revision:	https://reviews.freebsd.org/D21564
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-04 14:19:09 +00:00
Ryan Stone
92a15f946b Add missing M_NOWAIT flag
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic.  All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.

Also fix a related case where NOWAIT/WAITOK was not
specified.  In this case it's not clear whether sleeping
is allowed so be conservative and assume not.  There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.

Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
2019-10-23 17:20:20 +00:00
Yuri Pankov
a161fba992 linux: futex_mtx should follow futex_list
Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.

PR:		240989
Reported by:	Alex S <iwtcex@gmail.com>
2019-10-18 12:25:33 +00:00
Yuri Pankov
b9d3556a34 linux: provide just one instance of futex_list
Move futex_list definition to linux.c which is included once
in linux.ko (i386) and in linux_common.ko (amd64 and aarch64)
allowing 32/64 bit linux programs to access the same futexes
in the latter case.

PR:		240989
Reviewed by:	dchagin
Differential Revision:	https://reviews.freebsd.org/D22073
2019-10-18 10:28:08 +00:00
Hans Petter Selasky
427c402de2 Fix missing epochification of the LinuxKPI after r353292.
Sponsored by:	Mellanox Technologies
2019-10-15 11:14:14 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Doug Moore
2288078c5e Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
Brooks Davis
544e6c96b7 Regen after r347228 and r352693.
No functional change.
2019-09-30 21:00:19 +00:00
Pawel Biernacki
ea2609a490 linux_renameat2: don't add extra \n on error.
linux_msg() already adds \n at the end of all messages.

Reported by:	emaste, kib (mentor), mjg (mentor)
Reviewed by:	kib (mentor), mjg (mentor)
Differential Revision:	https://reviews.freebsd.org/D21852
2019-09-30 19:05:14 +00:00
David Bright
c4571256af sysent: regenerate after r352747.
Sponsored by:	Dell EMC Isilon
2019-09-26 15:41:10 +00:00
David Bright
9afb12bab4 Add an shm_rename syscall
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.

truss support is included; audit support will come later.

This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	jilles (earlier revision)
Reviewed by:	brueffer (manpages, earlier revision)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21423
2019-09-26 15:32:28 +00:00
Kyle Evans
618d66a56f compat/freebsd32: restore style after r352705 (no functional change)
The escaped newlines haven't been necessary since r339624, but this file has
not been reformatted. Restore the style.
2019-09-25 18:48:05 +00:00
Kyle Evans
a9ac5e1424 sysent: regenerate after r352705
This also implements it, fixes kdump, and removes no longer needed bits from
lib/libc/sys/shm_open.c for the interim.
2019-09-25 18:09:19 +00:00
Kyle Evans
234879a7e3 Mark shm_open(2) as COMPAT12, succeeded by shm_open2
Implementation and regenerated files will follow.
2019-09-25 18:06:48 +00:00
Kyle Evans
460211e730 sysent: regenerate after r352700 2019-09-25 17:59:58 +00:00
Kyle Evans
20f7057685 Add a shm_open2 syscall to support upcoming memfd_create
shm_open2 allows a little more flexibility than the original shm_open.
shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
shmflag argument that can be expanded later. Currently the only shmflag is
to allow file sealing on the returned fd.

shm_open and memfd_create will both be implemented in libc to use this new
syscall.

__FreeBSD_version is bumped to indicate the presence.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21393
2019-09-25 17:59:15 +00:00
Kyle Evans
0cd95859c8 [2/3] Add an initial seal argument to kern_shm_open()
Now that flags may be set on posixshm, add an argument to kern_shm_open()
for the initial seals. To maintain past behavior where callers of
shm_open(2) are guaranteed to not have any seals applied to the fd they're
given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special
flag could be opened later for shm_open(2) to indicate that sealing should
be allowed.

We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if
F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a
shmfd that already existed. A note's been added about the assumptions we've
made here as a hint towards anyone wanting to allow other seals to be
applied at creation.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21392
2019-09-25 17:35:03 +00:00
Kyle Evans
d19f028e33 sysent: regenerate after r352693 2019-09-25 17:30:28 +00:00
Kyle Evans
85c5f3cb57 Add COMPAT12 support to makesyscalls.sh
Reviewed by:	kib, imp, brooks (all without syscalls.master edits)
Differential Revision:	https://reviews.freebsd.org/D21366
2019-09-25 17:29:45 +00:00
Tijl Coosemans
55258ab0ff Create a "drm" subdirectory for drm devices in linsysfs. Recent versions of
linux libdrm check for the existence of this directory:

https://cgit.freedesktop.org/mesa/drm/commit/?id=f8392583418aef5e27bfed9989aeb601e20cc96d

MFC after:	2 weeks
2019-09-23 12:27:55 +00:00
Ed Maste
2eb6ef203a linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
Hans Petter Selasky
4c8ba7d94f Use true and false when dealing with bool type in the LinuxKPI.
No functional change.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:24:47 +00:00