freebsd-nq/sys/kern
Ed Schouten 457f7e23b1 Implement CloudABI's exec() call.
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute an new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.

Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?

CloudABI solves this problem by replacing traditional string command
line arguments by tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).

Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:

    int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
              size_t fdslen);

This change introduces a set of fd*_remapped() functions:

- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
  all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
  by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
  fdinstall_remapped().

We then add a function exec_copyin_data_fds() that builds on top these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().

Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.

Reviewers: kib, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3079
2015-07-16 07:05:42 +00:00
..
bus_if.m Add a bus method to fetch the VM domain for the given device/bus. 2014-10-09 05:33:25 +00:00
capabilities.conf Add futimens and utimensat system calls. 2015-01-23 21:07:08 +00:00
clock_if.m
cpufreq_if.m
device_if.m Change the default method for device_quiesce() to return 0 instead of 2015-01-08 21:46:28 +00:00
genassym.sh
imgact_aout.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
imgact_binmisc.c At the suggestion of jhb, replace atomic_set/clear calls with use of 2015-06-24 15:52:26 +00:00
imgact_elf32.c
imgact_elf64.c
imgact_elf.c Fix some error-handling bugs when core dump compression is enabled: 2015-07-14 18:24:05 +00:00
imgact_gzip.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
imgact_shell.c Allow multiple image activators to run on the same execution by changing 2014-09-04 21:31:25 +00:00
inflate.c
init_main.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
init_sysent.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_acct.c acct: create a special plimit object and set it for exiting processes 2013-06-30 19:08:06 +00:00
kern_alq.c Prevent alq from panic when the invalid alq_file path specified. 2014-04-05 16:54:47 +00:00
kern_clock.c Initialize ticks so that it wraps 10 minutes after boot to increase the 2015-02-05 01:43:21 +00:00
kern_clocksource.c Add ddb command 'show clocksource' to display state of the per-cpu 2015-02-04 14:49:47 +00:00
kern_condvar.c Revert r282971. It depends on condvar consumers not destroying condvars 2015-05-21 16:43:26 +00:00
kern_conf.c Fix for out of order device destruction notifications when using the 2015-03-22 13:11:56 +00:00
kern_cons.c CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten 2015-05-22 17:05:21 +00:00
kern_context.c
kern_cpu.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
kern_cpuset.c Un-static cpuset_which() - it's useful in other contexts, such as some 2015-06-26 04:14:05 +00:00
kern_ctf.c Don't specify a resid parameter if we're just going to ignore it. Instead, 2015-02-20 20:49:00 +00:00
kern_descrip.c Implement CloudABI's exec() call. 2015-07-16 07:05:42 +00:00
kern_dtrace.c Commit the rest of the changes that were intended to be part of r266826. 2014-05-29 01:42:22 +00:00
kern_dump.c Factor out duplicated code from dumpsys() on each architecture into generic 2015-01-07 01:01:39 +00:00
kern_environment.c Test if 'env' is NULL before doing memset() and strlen(), 2014-10-23 18:23:50 +00:00
kern_et.c Trivial change / forced-commit to document prior change that slipped in 2015-03-16 19:29:19 +00:00
kern_event.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
kern_exec.c Implement CloudABI's exec() call. 2015-07-16 07:05:42 +00:00
kern_exit.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_fail.c Use a regular sbuf + SYSCTL_OUT() rather than sbuf_new_for_sysctl() with 2015-03-16 19:18:45 +00:00
kern_ffclock.c The SYSCTL data pointers can come from userspace and must not be 2014-10-28 12:00:39 +00:00
kern_fork.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_gzio.c Move zlib.c from net to libkern. 2015-04-22 14:38:58 +00:00
kern_hhook.c Move hhook's per-vnet initialisation to an earlier SYSINIT SI_SUB stage to 2013-06-15 10:08:34 +00:00
kern_idle.c
kern_intr.c Do not use atomic_swap_int(9), it is not available on all 2015-07-15 21:44:16 +00:00
kern_jail.c Move chdir/chroot-related fdp manipulation to kern_descrip.c 2015-07-11 16:19:11 +00:00
kern_khelp.c Cleanup and simplification in khelp_{register|deregister}_helper(). No 2013-06-15 06:45:17 +00:00
kern_kthread.c Ansify another function. This is the last in the file, I hope. 2015-06-28 10:51:08 +00:00
kern_ktr.c Expand ktr_mask to be a 64-bit unsigned integer. 2015-05-22 11:09:41 +00:00
kern_ktrace.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
kern_linker.c sysctl: switch sysctllock to a sleepable rmlock 2015-07-04 06:54:15 +00:00
kern_lock.c Revert for r277213: 2015-01-22 11:12:42 +00:00
kern_lockf.c Improve style and fix a possible use-after-free case introduced in r268384 2015-01-10 06:48:35 +00:00
kern_lockstat.c - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging 2013-11-25 07:38:45 +00:00
kern_loginclass.c cred: add proc_set_cred helper 2015-03-16 00:10:03 +00:00
kern_malloc.c The vmem callback to reclaim kmem arena address space on low or 2015-05-09 20:08:36 +00:00
kern_mbuf.c Fix integer truncation bug in malloc(9) 2015-04-01 12:42:26 +00:00
kern_mib.c Huge cleanup of random(4) code. 2015-06-30 17:00:45 +00:00
kern_module.c
kern_mtxpool.c Garbage collect mtxpool_lockbuilder, the mutex pool historically used 2014-05-02 07:57:40 +00:00
kern_mutex.c several lockstat improvements 2015-06-12 10:01:24 +00:00
kern_ntptime.c Use the monotonic (uptime) counter rather than time-of-day to measure elapsed 2015-07-12 18:38:17 +00:00
kern_numa.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_osd.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
kern_physio.c Rewrite physio() to not allocate pbufs for unmapped I/O. 2015-04-21 10:55:53 +00:00
kern_pmc.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
kern_poll.c When a kernel has DEVICE_POLLING turned on but no drivers have 2015-04-14 14:22:34 +00:00
kern_priv.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
kern_proc.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
kern_procctl.c Reparenting done by debugger attach can leave reaper without direct 2015-02-15 08:44:30 +00:00
kern_prot.c Generalised support for copy-on-write structures shared by threads. 2015-06-10 10:43:59 +00:00
kern_racct.c nit: Rename racct_alloc_resource to racct_adjust_resource. 2015-06-14 08:33:14 +00:00
kern_rangelock.c Change the queue of locks in kern_rangelock.c from holding lock requests in 2013-08-15 20:19:17 +00:00
kern_rctl.c Add kern.racct.enable tunable and RACCT_DISABLED config option. 2015-04-29 10:23:02 +00:00
kern_resource.c rlimit: deduplicate code in chg* functions 2015-06-25 00:15:37 +00:00
kern_rmlock.c Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9). 2014-12-13 21:00:10 +00:00
kern_rwlock.c several lockstat improvements 2015-06-12 10:01:24 +00:00
kern_sdt.c Print a backtrace if the SDT(9) stub gets called so that there's at least 2014-02-22 01:41:45 +00:00
kern_sema.c
kern_sharedpage.c Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). 2013-08-22 07:39:53 +00:00
kern_shutdown.c Properly null-terminate strings in a kernel dump header. A version string 2015-05-19 16:23:47 +00:00
kern_sig.c Add missing const keyword to kern_sigaction()'s 'act' parameter. 2015-07-10 14:39:46 +00:00
kern_switch.c Revert for r277213: 2015-01-22 11:12:42 +00:00
kern_sx.c several lockstat improvements 2015-06-12 10:01:24 +00:00
kern_synch.c Remove several write-only variables, all reported by the gcc 4.9 2015-05-29 13:24:17 +00:00
kern_syscalls.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
kern_sysctl.c Don't acquire sysctlmemlock in userland_sysctl() when the old value 2015-07-06 16:07:21 +00:00
kern_tc.c Reimplement the ordering requirements for the timehands updates, and 2015-07-08 18:42:08 +00:00
kern_thr.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_thread.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_time.c Fix an off by one in ppsratecheck(). If you asked for N=1 you'd get one, 2015-01-11 20:48:29 +00:00
kern_timeout.c Fix my stupid restoral of old code.. must be c_iflags now. 2015-04-14 00:02:39 +00:00
kern_umtx.c Clean up some cosmetic nits in kern_umtx.c, found during recent work 2015-03-28 21:21:40 +00:00
kern_uuid.c Fix a bug in be_uuid_dec(); it called le16dec() instead of be16dec(), 2014-02-13 22:24:36 +00:00
kern_xxx.c
ksched.c
link_elf_obj.c Move zlib.c from net to libkern. 2015-04-22 14:38:58 +00:00
link_elf.c Move zlib.c from net to libkern. 2015-04-22 14:38:58 +00:00
linker_if.m
Make.tags.inc Remove AppleTalk support. 2014-03-14 06:29:43 +00:00
Makefile
makesyscalls.sh Import the CloudABI datatypes and create a system call table. 2015-07-09 07:20:15 +00:00
md4c.c
md5c.c
p1003_1b.c In preparation for switching linuxulator to the use the native 1:1 2015-05-24 14:44:06 +00:00
posix4_mib.c
sched_4bsd.c Add kern.racct.enable tunable and RACCT_DISABLED config option. 2015-04-29 10:23:02 +00:00
sched_ule.c Change the mb() use in the sched_ult tdq_notify() and sched_idletd() 2015-07-10 08:54:12 +00:00
serdev_if.m
stack_protector.c Use nitems() macro instead of __arraycount() 2015-06-16 20:19:00 +00:00
subr_acl_nfs4.c
subr_acl_posix1e.c
subr_autoconf.c
subr_blist.c
subr_bufring.c
subr_bus_dma.c Add bus_dmamap_load_ma() function to load map with the array of 2013-10-27 21:39:16 +00:00
subr_bus.c Huge cleanup of random(4) code. 2015-06-30 17:00:45 +00:00
subr_busdma_bufalloc.c Fix integer truncation bug in malloc(9) 2015-04-01 12:42:26 +00:00
subr_capability.c Remove duplicated includes. 2014-06-26 13:57:44 +00:00
subr_clock.c For architectures where time_t is wide enough, in particular, 64bit 2014-12-12 09:37:18 +00:00
subr_counter.c Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized. 2014-02-10 19:59:46 +00:00
subr_devstat.c Fix multiple incorrect SYSCTL arguments in the kernel: 2014-10-21 07:31:21 +00:00
subr_disk.c
subr_dummy_vdso_tc.c Update the vdso timehands only via tc_windup(). 2015-01-20 03:54:30 +00:00
subr_eventhandler.c
subr_fattime.c Where appropriate, use the modern terms for the one true time base 2014-12-21 05:07:11 +00:00
subr_firmware.c Create a dedicated function for ensuring that cdir and rdir are populated. 2015-07-11 16:22:48 +00:00
subr_hash.c
subr_hints.c Add a new device control utility for new-bus devices called devctl. This 2015-02-06 16:09:01 +00:00
subr_kdb.c Fix multiple incorrect SYSCTL arguments in the kernel: 2014-10-21 07:31:21 +00:00
subr_kobj.c
subr_lock.c Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9). 2014-12-13 21:00:10 +00:00
subr_log.c
subr_mbpool.c All mbuf external free functions never fail, so let them be void. 2014-07-11 13:58:48 +00:00
subr_mchain.c
subr_module.c Turns out, this isn't only called from i386... 2014-12-30 02:39:47 +00:00
subr_msgbuf.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
subr_param.c Remove support for Xen PV domU kernels. Support for HVM domU kernels 2015-04-30 15:48:48 +00:00
subr_pcpu.c Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized. 2014-02-10 19:59:46 +00:00
subr_pctrie.c - Add a new general purpose path-compressed radix trie which can be used 2013-05-12 04:05:01 +00:00
subr_power.c
subr_prf.c Add support for reading MAM attributes to camcontrol(8) and libcam(3). 2015-06-09 21:39:38 +00:00
subr_prof.c The process spin lock currently has the following distinct uses: 2014-11-26 14:10:00 +00:00
subr_rman.c Nuke the never-used RF_TIMESHARE feature, reducing the complexity of the 2014-07-16 22:18:19 +00:00
subr_rtc.c
subr_sbuf.c The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that. 2015-03-17 21:00:31 +00:00
subr_scanf.c
subr_sfbuf.c Move KASSERT into locked region. 2014-08-11 15:06:07 +00:00
subr_sglist.c Fix a couple of panics when detaching from a cxgbe/cxl interface that was 2015-01-26 16:26:28 +00:00
subr_sleepqueue.c Revert for r277213: 2015-01-22 11:12:42 +00:00
subr_smp.c Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
subr_stack.c
subr_syscall.c Generalised support for copy-on-write structures shared by threads. 2015-06-10 10:43:59 +00:00
subr_taskqueue.c MFuser/delphij/zfs-arc-rebase@r281754: 2015-05-26 01:40:33 +00:00
subr_terminal.c vt(4): Adjust the cursor position after changing the window size 2014-11-01 17:05:15 +00:00
subr_trap.c racct: perform a lockless check for p_throttled 2015-07-13 22:52:11 +00:00
subr_turnstile.c ddb: finish converting boolean values. 2015-05-21 15:16:18 +00:00
subr_uio.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
subr_unit.c Move the definition of the struct unrhdr into a separate header file, 2013-08-30 07:37:45 +00:00
subr_vmem.c CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten 2015-05-22 17:05:21 +00:00
subr_witness.c Fix an incorrect assertion in witness. 2015-07-07 19:29:18 +00:00
sys_capability.c cred: add proc_set_cred helper 2015-03-16 00:10:03 +00:00
sys_generic.c Cover a race between doselwakeup() and selfdfree(). If doselwakeup() 2015-07-09 09:22:21 +00:00
sys_pipe.c pipe_direct_write: Fix mismatched pipelock/unlock 2015-07-13 17:45:22 +00:00
sys_procdesc.c Add a new fo_fill_kinfo fileops method to add type-specific information to 2014-09-22 16:20:47 +00:00
sys_process.c Provide vnode in memory map info for files on tmpfs 2015-06-02 18:37:04 +00:00
sys_socket.c In preparation of merging projects/sendfile, transform bare access to 2014-11-12 09:57:15 +00:00
syscalls.c Regenerate syscalls. 2015-07-11 15:22:11 +00:00
syscalls.master Regenerate syscalls. 2015-07-11 15:22:11 +00:00
systrace_args.c Regenerate syscalls. 2015-07-11 15:22:11 +00:00
sysv_ipc.c
sysv_msg.c Add kern.racct.enable tunable and RACCT_DISABLED config option. 2015-04-29 10:23:02 +00:00
sysv_sem.c Add kern.racct.enable tunable and RACCT_DISABLED config option. 2015-04-29 10:23:02 +00:00
sysv_shm.c sysvshm: fix up some whitespace issues and spurious initialisation 2015-07-02 19:14:30 +00:00
tty_compat.c
tty_info.c
tty_inq.c
tty_outq.c
tty_pts.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
tty_tty.c tty: replace several curthread->td_proc with stored curproc 2015-07-06 18:53:56 +00:00
tty_ttydisc.c
tty.c filedesc: simplify fget_unlocked & friends 2015-02-17 23:54:06 +00:00
uipc_accf.c The accept filter code is not specific to the FreeBSD IPv4 network stack, 2014-07-26 19:27:34 +00:00
uipc_debug.c Merge from projects/sendfile: 2014-11-30 12:52:33 +00:00
uipc_domain.c CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten 2015-05-22 17:05:21 +00:00
uipc_mbuf2.c Remove a 'This is dumb' comment that has been incorrect for at least a 2015-01-09 12:08:51 +00:00
uipc_mbuf.c Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags 2015-06-30 17:19:58 +00:00
uipc_mbufhash.c Reduce header pollution. 2015-03-17 14:16:50 +00:00
uipc_mqueue.c fd: remove filedesc argument from fdclose 2015-04-11 15:40:28 +00:00
uipc_sem.c fd: remove filedesc argument from fdclose 2015-04-11 15:40:28 +00:00
uipc_shm.c Make KPI of vm_pager_get_pages() more strict: if a pager changes a page 2015-06-12 11:32:20 +00:00
uipc_sockbuf.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
uipc_socket.c Fix cleanup race between unp_dispose and unp_gc 2015-07-14 02:00:50 +00:00
uipc_syscalls.c Make KPI of vm_pager_get_pages() more strict: if a pager changes a page 2015-06-12 11:32:20 +00:00
uipc_usrreq.c Fix cleanup race between unp_dispose and unp_gc 2015-07-14 02:00:50 +00:00
vfs_acl.c Replace struct filedesc argument in getvnode with struct thread 2015-06-16 13:09:18 +00:00
vfs_aio.c Mutex memory is not zeroed, add MTX_NEW. 2015-07-06 14:09:00 +00:00
vfs_bio.c Do not allow creation of the dirty buffers for the dead buffer 2015-07-11 11:21:56 +00:00
vfs_cache.c Modify kern___getcwd() to take max pathlen limit as an additional 2015-04-21 13:55:24 +00:00
vfs_cluster.c Remove several write-only variables, all reported by the gcc 4.9 2015-05-29 13:24:17 +00:00
vfs_default.c vfs: use shared vnode locking when looking up ".." in vop_stdvptocnp 2015-07-04 15:46:39 +00:00
vfs_export.c After the changes in r274118 make NOIP kernels compile by hiding an 2014-11-06 12:19:39 +00:00
vfs_extattr.c Replace struct filedesc argument in getvnode with struct thread 2015-06-16 13:09:18 +00:00
vfs_hash.c Convert vfs hash lock from a mutex to an rwlock. 2014-12-30 21:40:45 +00:00
vfs_init.c sysctl: switch sysctllock to a sleepable rmlock 2015-07-04 06:54:15 +00:00
vfs_lookup.c vfs: cosmetic changes to namei and namei_handle_root 2015-07-09 17:17:26 +00:00
vfs_mount.c Vnode is not referenced by the vfs_domount() at the point where 2015-07-02 14:31:47 +00:00
vfs_mountroot.c Remove the no-at variants of the kern_xx() syscall helpers. E.g., we 2014-11-13 18:01:51 +00:00
vfs_subr.c vfs: always clear VI_OWEINACT in consumers bumping v_usecount 2015-07-11 16:28:55 +00:00
vfs_syscalls.c Try to unbreak the build after r285390 removing the obsolete static 2015-07-12 00:26:22 +00:00
vfs_vnops.c Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. 2015-07-05 22:37:33 +00:00
vnode_if.src Catch up on r271387 and remove unused parameter from 2015-03-30 22:49:26 +00:00