Commit Graph

126 Commits

Author SHA1 Message Date
Mateusz Guzik
af4051d250 linux: remove the always curthread argument from lconvpath 2021-11-25 22:50:42 +00:00
Dmitry Chagin
0a4b664ae8 linux(4): Add struct clone_args for future clone3 system call.
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.

Differential revision:	https://reviews.freebsd.org/D31474
MFC after:		2 weeks
2021-08-12 11:49:01 +03:00
Edward Tomasz Napierala
30c6d98219 linux: implement sigaltstack(2) on arm64
... by making it machine-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31286
2021-07-27 13:34:49 +00:00
Edward Tomasz Napierala
72f7ddb587 linux: implement rt_sigsuspend(2) on arm64
... by making it architecture-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31259
2021-07-23 20:13:00 +00:00
Dmitry Chagin
79617645c6 linux(4): Retire unused declaration.
MFC after:	2 weeks
2021-06-22 08:41:33 +03:00
John Baldwin
3b57ddb029 Rename linux_set_upcall_kse() to linux_set_upcall().
This matches the rename of cpu_set_upcall_kse() in
5c2cf81845.

Reviewed by:	kib, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29295
2021-03-18 12:14:34 -07:00
Kyle Evans
d42a83b1a9 audit: also correctly audit linux_execve()
Linux execve() gets audited as AUE_EXECVE as well, we should also interpret
the return from this correctly for the same reasoning as in r367002.

MFC with:	r367002
2020-10-26 17:30:17 +00:00
Mateusz Guzik
ed83a56181 i386: clean up empty lines in .c and .h files 2020-09-01 21:19:39 +00:00
Mateusz Guzik
a125ed50a6 linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
2020-08-18 22:04:22 +00:00
Mateusz Guzik
d5e3895ea4 linux: consistently use LFREEPATH instead of open-coding it 2020-08-18 22:03:55 +00:00
Edward Tomasz Napierala
52c81be11a Add linux_madvise(2) instead of having Linux apps call the native
FreeBSD madvise(2) directly.  While some of the flag values match,
most don't.

PR:		kern/230160
Reported by:	markj
Reviewed by:	markj
Discussed with:	brooks, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25272
2020-06-20 18:29:22 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Ed Maste
c0aa0e2c27 Sort headers in MD Linuxulator files
Bring #includes closer to style(9) and reduce differences between the
(three) MD versions of linux_machdep.c and linux_sysvec.c.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-23 17:16:36 +00:00
Ed Maste
340f4a8d3e Linuxulator: apply style(9) to return
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 15:35:24 +00:00
Ed Maste
eae594f7d5 Correct proper nouns in the Linuxulator
- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines

Sponsored by:	Turing Robotic Industries Inc.
2018-02-22 02:24:17 +00:00
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Ed Maste
132f90c660 Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator.  Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 17:29:12 +00:00
Pedro F. Giffuni
83ef78be95 sys/i386: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:08:52 +00:00
Edward Tomasz Napierala
ae6b6ef6cb Replace sys_ftruncate() with kern_ftruncate() in various compats.
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368
2017-01-30 11:50:54 +00:00
Dmitry Chagin
97d06da692 Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation
use READ_IMPLIES_EXEC flag, introduced in r302515.

While here move common part of mmap() and mprotect() code to the files in compat/linux
to reduce code dupcliation between Linuxulator's.

Reported by:    Johannes Jost Meixner, Shawn Webb

MFC after:	1 week
XMFC with:	r302515, r302516
2016-07-10 08:22:04 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Dmitry Chagin
d707582f83 When I merged the lemul branch I missied kib@'s r282708 commit.
This is not the final fix as I need properly cleanup thread resources
before other threads suicide.

Tested by:	Ruslan Makhmatkhanov
2015-05-25 20:44:46 +00:00
Dmitry Chagin
4ab7403bbd Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR:		197216
2015-05-24 17:47:20 +00:00
Dmitry Chagin
a7ac457613 According to Linux man sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.

For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.

While here move MI part of sigaltstack code to the appropriate place.

Reported by:	abi at abinet dot ru
2015-05-24 17:44:08 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Bjoern A. Zeeb
0a041f3b47 Implement most of timer_{create,settime,gettime,getoverrun,delete}
for amd64/linux32.  Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.

It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.

Reviewed by:		jhb
Differential Revision:	D784
MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:36:45 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Dmitry Chagin
c8d6845e9e In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().
2011-02-12 16:33:00 +00:00
Dmitry Chagin
2c7660ba3e In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.
2011-02-12 15:50:21 +00:00
Dmitry Chagin
9adaae9403 The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by:	jilles

MFC after:	1 Month.
2011-01-30 18:17:38 +00:00
Dmitry Chagin
adc7ece00a Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Alexander Kabaev
60743cbd22 Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with:	jhb
MFC after:	1 week
2010-06-10 17:59:47 +00:00
John Baldwin
f12c034874 Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier.  Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset.  This offset
is then passed to the real mmap() call.  Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by:	Christian Zander @ Nvidia
MFC after:	1 week
2009-10-28 20:17:54 +00:00
John Baldwin
7c020cbbe5 Return ENOSYS instead of EINVAL for invalid function codes to match the
behavior of Linux.

Reported by:	Alexander Best  alexbestms of math.uni-muenster.de
Approved by:	re (kib)
2009-06-26 19:39:33 +00:00
Konstantin Belousov
99b7f1a10b Adapt linux emulation to use cv for vfork wait.
Submitted by:	Takahiro Kurosawa <takahiro.kurosawa gmail com>
PR:	kern/131506
2009-02-18 16:11:39 +00:00
Ed Schouten
ab0d10f68e Several cleanups related to pipe(2).
- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
  fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
  convention, pipe(2) raises a segmentation fault when an invalid
  address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
  implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
  to recover td_retval after calling the FreeBSD system call.

Approved by:	rdivacky
Discussed on:	arch
2008-11-11 14:55:59 +00:00
Jung-uk Kim
865df544c6 Fix Linux mmap with MAP_GROWSDOWN flag.
Reported by:	Andriy Gapon (avg at icyb dot net dot ua)
Tested by:	Andriy Gapon (avg at icyb dot net dot ua)
Pointyhat:	me
MFC after:	3 days
2008-02-11 19:35:03 +00:00
Konstantin Belousov
6259969d36 Implement read_default_ldt in linux_modify_ldt(). It copies out zeroed
descriptor, like real Linux does.

Tested by: Yuriy Tsibizov <yuriy.tsibizov at gmail com>
Submitted by:	rdivacky
MFC after:	1 week
2007-11-26 11:06:19 +00:00
Attilio Rao
52739c2d25 i386_set_ioperm, i386_get_ldt and i386_set_ldt are now MPSAFE
(Giant/sched_lock free) so remove unuseful Giant cruft.

Approved by: jeff
Approved by: re
Sponsorized by: NGX Italy (http://www.ngx.it)
2007-07-20 08:35:18 +00:00
Peter Wemm
79d5bdcca5 Don't add the 'pad' argument to the mmap/truncate/etc syscalls.
Submitted by: kensmith
Approved by: re (kensmith)
2007-07-04 23:06:43 +00:00