CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.
CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.
The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.
We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.
The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().
More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA
Differential Revision: https://reviews.freebsd.org/D2848
Reviewed by: emaste, brooks
Obtained from: https://github.com/NuxiNL/freebsd
Use the same scheme implemented to manage credentials.
Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.
Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
MFC after: 2 weeks
Reviewed by: jhb
Approved by: kib (mentor)
writes the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.
Note. clock_nanosleep() with an absolute time value does not write
the remaining time.
While here fix whitespaces and typo in SDT_PROBE.
implemented via ioctl interface. First of all return ENOTSUP for this
operation as a cp fallback to usual method in that case. Secondly, do
not print out the message about unimplemented operation.
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.
PR: 197216
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.
For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.
While here move MI part of sigaltstack code to the appropriate place.
Reported by: abi at abinet dot ru
"highly experimental" remove /dev/shm magic commited
in r218497 and convert tmpfs type to an expected magic number.
Differential Revision: https://reviews.freebsd.org/D1497
Reviewed by: emaste, trasz
as it already zeroed by malloc with M_ZERO flag
and move zeroing to the proper place in exec path.
Differential Revision: https://reviews.freebsd.org/D1462
Reviewed by: trasz
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.
Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.
Differential Revision: https://reviews.freebsd.org/D1092
Check wait options as a Linux do.
Linux always set WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.
Differential Revision: https://reviews.freebsd.org/D1085
Reviewed by: trasz
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat(). If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.
Differential Revision: https://reviews.freebsd.org/D1078
Reviewed by: trasz
thread emuldata to proc emuldata as it was originally intended.
As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.
Differential Revision: https://reviews.freebsd.org/D1073
following primary purposes:
1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.
2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).
3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.
Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.
Temporarily remove dtrace garbage from linux_mib.c and linux_util.c
Differential Revision: https://reviews.freebsd.org/D1072
In collaboration with: Vassilis Laganakos.
Reviewed by: trasz
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.
Differential Revision: https://reviews.freebsd.org/D1068
Reviewed by: trasz
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.
Differential Revision: https://reviews.freebsd.org/D1064
Reviewed by: trasz
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.
Differential Revision: https://reviews.freebsd.org/D1060
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().
Differential Revision: https://reviews.freebsd.org/D2138
Reviewed by: trasz
of harcoded pr_osrelease, pr_osrel values. This will be used later in
the VDSO.
Differential Revision: https://reviews.freebsd.org/D1042
Reviewed by: trasz
of multiple simultaneous calls to pthread_join() specifying the same
target thread are undefined wake up the one thread.
Differential Revision: https://reviews.freebsd.org/D1040
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparent when the process group leader exits and close
to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
managment routines in Linuxulator.
Implementation details:
1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
struct thread and freed in the thread_dtor() hook.
Mandatory holdig of the p_mtx required when referencing emuldata
from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.
Ugliness:
In case when the Linux thread is the initial thread in the thread
group thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take thread id as a parameter we should use the special method
to reference struct thread.
Differential Revision: https://reviews.freebsd.org/D1039
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).
Linuxulator temporarily uses first thread in proc.
Move linux_sched_rr_get_interval() to the MI part.
Differential Revision: https://reviews.freebsd.org/D1032
Reviewed by: trasz
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).
Differential Revision: https://reviews.freebsd.org/D1027
Reviewed by: trasz
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
allocated from exec_map. If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space. Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
rework. The number of entries was supposed to be returned to the user,
not used as a scratch variable.
This broke RELENG_4 jails starting up on current systems.
argument. This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.
Differential Revision: https://reviews.freebsd.org/D2335
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation