freebsd-nq/sys/kern
John Baldwin 6bc1e9cd84 Rework the lifetime management of the kernel implementation of POSIX
semaphores.  Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec.  This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely.  It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.

Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
  the sem_unlink() operation.  Prior to this patch, if a semaphore's name
  was removed, valid handles from sem_open() would get EINVAL errors from
  sem_getvalue(), sem_post(), etc.  This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
  process exited or exec'd.  They were only cleaned up if the process
  did an explicit sem_destroy().  This could result in a leak of semaphore
  objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
  'struct ksem' of an unnamed semaphore (created via sem_init)) and had
  write access to the semaphore based on UID/GID checks, then that other
  process could manipulate the semaphore via sem_destroy(), sem_post(),
  sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
  creating the semaphore was not honored.  Thus if your umask denied group
  read/write access but the explicit mode in the sem_init() call allowed
  it, the semaphore would be readable/writable by other users in the
  same group, for example.  This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
  then it might have deregistered one or more of the semaphore system
  calls before it noticed that there was a problem.  I'm not sure if
  this actually happened as the order that modules are discovered by the
  kernel linker depends on how the actual .ko file is linked.  One can
  make the order deterministic by using a single module with a mod_event
  handler that explicitly registers syscalls (and deregisters during
  unload after any checks).  This also fixes a race where even if the
  sem_module unloaded first it would have destroyed locks that the
  syscalls might be trying to access if they are still executing when
  they are unloaded.

  XXX: By the way, deregistering system calls doesn't do any blocking
  to drain any threads from the calls.
- Some minor fixes to errno values on error.  For example, sem_init()
  isn't documented to return ENFILE or EMFILE if we run out of semaphores
  the way that sem_open() can.  Instead, it should return ENOSPC in that
  case.

Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
  named semaphores nearly in a similar fashion to the POSIX shared memory
  object file descriptors.  Kernel semaphores can now also have names
  longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
  in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
  done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
  MAC checks for POSIX semaphores accept both a file credential and an
  active credential.  There is also a new posixsem_check_stat() since it
  is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
  in src/tools/regression/posixsem.

Reported by:	kris (1)
Tested by:	kris
Reviewed by:	rwatson (lightly)
MFC after:	1 month
2008-06-27 05:39:04 +00:00
..
bus_if.m Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ 2008-03-20 21:24:32 +00:00
clock_if.m
cpufreq_if.m
device_if.m
genassym.sh refactor code so it can run in a chroot without having to have /dev/mounted 2008-01-18 17:02:14 +00:00
imgact_aout.c VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in 2008-01-13 14:44:15 +00:00
imgact_elf32.c
imgact_elf64.c
imgact_elf.c Go back to using the process command name (p_comm) for the file name and 2008-05-15 03:07:34 +00:00
imgact_gzip.c VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in 2008-01-13 14:44:15 +00:00
imgact_shell.c
inflate.c
init_main.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
init_sysent.c Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
kern_acct.c VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in 2008-01-13 14:44:15 +00:00
kern_alq.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_clock.c Implement per-cpu callout threads, wheels, and locks. 2008-04-02 11:20:30 +00:00
kern_condvar.c - Pass the priority argument from *sleep() into sleepq and down into 2008-03-12 06:31:06 +00:00
kern_conf.c Struct cdev is always the member of the struct cdev_priv. When devfs 2008-06-16 17:34:59 +00:00
kern_context.c Further system call comment cleanup: 2007-03-05 13:10:58 +00:00
kern_cpu.c Fix a few edge cases with error handling in cpufreq(4)'s CPUFREQ_GET() 2008-05-05 19:13:52 +00:00
kern_cpuset.c Take into account possible overflow when multiplying. The casuality is 2008-05-26 10:01:13 +00:00
kern_ctf.c Add the CTF source file which gets shared with link_elf.c and link_elf_obj.c. 2008-05-23 03:04:27 +00:00
kern_descrip.c Rework the lifetime management of the kernel implementation of POSIX 2008-06-27 05:39:04 +00:00
kern_dtrace.c Remove code that isn't required. It actually breaks the case where KDTRACE_HOOKS 2008-06-16 04:44:29 +00:00
kern_environment.c Merge first in a series of TrustedBSD MAC Framework KPI changes 2007-10-24 19:04:04 +00:00
kern_event.c Kqueue_scan() may sleep when encountered the influx knotes. On the other 2008-05-10 11:37:05 +00:00
kern_exec.c Add DTrace 'proc' provider probes using the Statically Defined Trace 2008-05-24 06:22:16 +00:00
kern_exit.c Add DTrace 'proc' provider probes using the Statically Defined Trace 2008-05-24 06:22:16 +00:00
kern_fork.c Add DTrace 'proc' provider probes using the Statically Defined Trace 2008-05-24 06:22:16 +00:00
kern_idle.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_intr.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
kern_jail.c Revert rev. 178124 as requested by kris@. Having jail id not being 2008-06-19 21:41:57 +00:00
kern_kthread.c Document the kproc_kthread_add() call 2008-04-29 22:43:15 +00:00
kern_ktr.c Remove slightly oddly placed suser() call from the KTR/ALQ setup sysctl: 2006-09-09 16:09:01 +00:00
kern_ktrace.c This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload 2008-02-23 01:01:49 +00:00
kern_linker.c Add the ctf_get function and update the args to linker_file_function_listall. 2008-05-23 07:08:59 +00:00
kern_lock.c The "if" semantic is not needed, just fix this. 2008-05-25 16:11:27 +00:00
kern_lockf.c Re-implement the client side of rpc.lockd in the kernel. This implementation 2008-06-26 10:21:54 +00:00
kern_malloc.c Add support for the DTrace malloc provider which can enable probes 2008-05-23 00:43:36 +00:00
kern_mbuf.c Reintroduce UMA_SLAB_KMAP; however, change its spelling to 2008-04-04 18:41:12 +00:00
kern_mib.c Make sysctl_kern_arnd return a random buffer instead of a random long, 2008-02-17 16:44:48 +00:00
kern_module.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_mtxpool.c Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
kern_mutex.c Add KASSERT()'s to catch attempts to recurse on spin mutexes that aren't 2008-02-13 23:39:05 +00:00
kern_ntptime.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_physio.c
kern_pmc.c Kernel and hwpmc(4) support for callchain capture. 2007-12-07 08:20:17 +00:00
kern_poll.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_priv.c Add __FBSDID() tag. 2008-03-07 15:27:08 +00:00
kern_proc.c Add DTrace 'proc' provider probes using the Statically Defined Trace 2008-05-24 06:22:16 +00:00
kern_prot.c Merge first in a series of TrustedBSD MAC Framework KPI changes 2007-10-24 19:04:04 +00:00
kern_resource.c Remove extra uihold() call that accidentally sneak in during perforce 2008-03-19 07:52:07 +00:00
kern_rmlock.c Expand lock class with the "virtual" function lc_assert which will offer 2007-11-18 14:43:53 +00:00
kern_rwlock.c Improve a comment which, in the actual CVS stock, doesn't completely 2008-05-27 00:27:50 +00:00
kern_sdt.c Add kernel support for the Statically Defined Trace provider. 2008-05-18 19:32:36 +00:00
kern_sema.c
kern_shutdown.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_sig.c Add DTrace 'proc' provider probes using the Statically Defined Trace 2008-05-24 06:22:16 +00:00
kern_subr.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
kern_switch.c fix typo in runz_fuzz 2008-05-12 06:42:06 +00:00
kern_sx.c - Embed the recursion counter for any locking primitive directly in the 2008-05-15 20:10:06 +00:00
kern_synch.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
kern_syscalls.c
kern_sysctl.c Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise, 2007-11-30 21:29:08 +00:00
kern_tc.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
kern_thr.c Fix compiling problem. 2008-04-29 05:48:05 +00:00
kern_thread.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
kern_time.c Make sure reading td_runtime in critical section since thread may be 2008-01-18 13:00:28 +00:00
kern_timeout.c - Correct a major error introduced in the per-cpu timeout commit. Sleep 2008-04-06 11:08:49 +00:00
kern_umtx.c Add two commands to _umtx_op system call to allow a simple mutex to be 2008-06-24 07:32:12 +00:00
kern_uuid.c Correct typo. 2007-04-23 12:53:00 +00:00
kern_xxx.c Someone cut and pasted a bunch of stuff here so lots of 2008-06-26 22:45:04 +00:00
ksched.c Commit 14/14 of sched_lock decomposition. 2007-06-05 00:00:57 +00:00
link_elf_obj.c Enforce the mapping of kernel loadable modules in the uppermost 2GB of the 2008-06-20 06:24:34 +00:00
link_elf.c Add hooks for the Compact C Type Format (CTF) data to be attached to 2008-05-23 00:49:39 +00:00
linker_if.m Add the ctf_get method. 2008-05-23 04:06:49 +00:00
Make.tags.inc Remove netatm from HEAD as it is not MPSAFE and relies on the now removed 2008-05-25 22:11:40 +00:00
Makefile style.Makefile(5) 2007-12-14 21:30:51 +00:00
makesyscalls.sh Generate another function for the DTrace syscall provider to specify 2008-03-27 01:53:44 +00:00
md4c.c
md5c.c
p1003_1b.c Remove kernel support for M:N threading. 2008-03-12 10:12:01 +00:00
posix4_mib.c Fix mispatch of includes list; allows my kernel to build successfully. 2006-11-12 03:34:03 +00:00
sched_4bsd.c Add the vtime (virtual time) hooks for DTrace. 2008-05-25 01:44:58 +00:00
sched_ule.c Add the vtime (virtual time) hooks for DTrace. 2008-05-25 01:44:58 +00:00
serdev_if.m
stack_protector.c Fix a chicken-and-egg problem: this files implements SSP support, 2008-06-26 07:52:45 +00:00
subr_acl_posix1e.c Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in 2007-06-12 00:12:01 +00:00
subr_autoconf.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
subr_blist.c add malloc flag to blist so that it can be used in ithread context 2008-05-05 19:48:54 +00:00
subr_bus.c Split out the probing magic of device_probe_and_attach into 2008-06-20 16:58:15 +00:00
subr_clist.c Move TTY unrelated bits out of <sys/tty.h>. 2008-05-23 16:06:35 +00:00
subr_clock.c Now that all platforms use genclock, shuffle things around slightly 2008-04-22 19:38:30 +00:00
subr_devstat.c
subr_disk.c Add a new I/O request - BIO_FLUSH, which basically tells providers below to 2006-10-31 21:11:21 +00:00
subr_eventhandler.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
subr_fattime.c Better naming of fattime conversion functions, they do convert to timespec 2006-10-24 10:27:23 +00:00
subr_firmware.c Do image loading in a context known to have a root directory: 2008-04-09 19:07:48 +00:00
subr_hints.c
subr_kdb.c Expand kdb_alt_break a little, most commonly used with the option 2008-05-04 23:29:38 +00:00
subr_kobj.c
subr_lock.c - Embed the recursion counter for any locking primitive directly in the 2008-05-15 20:10:06 +00:00
subr_log.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
subr_mbpool.c Add parens around *free in *free++ in mbp_count() so that mbp_count() 2007-05-27 17:38:36 +00:00
subr_mchain.c Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. 2008-03-25 09:39:02 +00:00
subr_module.c
subr_msgbuf.c
subr_param.c - Export HZ value via kern.hz sysctl (this is the same name as for the 2008-05-09 07:42:02 +00:00
subr_pcpu.c generally we are interested in what thread did something as 2007-11-14 06:21:24 +00:00
subr_power.c
subr_prf.c Instead of doing comparisons using the pcpu area to see if 2007-03-08 06:44:34 +00:00
subr_prof.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
subr_rman.c Complete removal of restriction about overlaps to rman_manage_region: 2007-04-28 07:37:49 +00:00
subr_rtc.c Now that all platforms use genclock, shuffle things around slightly 2008-04-22 19:38:30 +00:00
subr_sbuf.c
subr_scanf.c
subr_sleepqueue.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
subr_smp.c Allow a rendezvous with just a specified CPU too. 2008-05-23 04:05:26 +00:00
subr_stack.c When a symbol name can't be resolved, return "??" as the name, rather 2007-12-03 14:44:35 +00:00
subr_taskqueue.c Use kthread_exit() to terminate a taskqueue thread rather than kproc_exit() 2008-04-11 17:35:54 +00:00
subr_trap.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
subr_turnstile.c - Make SCHED_STATS more generic by adding a wrapper to create the 2008-04-17 04:20:10 +00:00
subr_unit.c Since cdev mutex is after system map mutex in global lock order, free() 2007-07-04 06:56:58 +00:00
subr_witness.c - Embed the recursion counter for any locking primitive directly in the 2008-05-15 20:10:06 +00:00
sys_generic.c - Remove stale comment. 2008-03-19 07:33:16 +00:00
sys_pipe.c Another problem caused by the knlist_cleardel() potentially dropping 2008-05-23 11:14:03 +00:00
sys_process.c - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from 2008-03-19 06:19:01 +00:00
sys_socket.c Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
syscalls.c Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
syscalls.master Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
systrace_args.c Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
sysv_ipc.c Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in 2007-06-12 00:12:01 +00:00
sysv_msg.c Merge first in a series of TrustedBSD MAC Framework KPI changes 2007-10-24 19:04:04 +00:00
sysv_sem.c Renew semaphore's pointer after wakeup since during msleep 2008-06-19 18:08:42 +00:00
sysv_shm.c Make sure we restrict Linux only IPC calls from being executed 2008-02-12 20:55:03 +00:00
tty_compat.c
tty_conf.c
tty_cons.c Move TTY unrelated bits out of <sys/tty.h>. 2008-05-23 16:06:35 +00:00
tty_pts.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
tty_pty.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
tty_tty.c Remove unneeded Giant locking of /dev/tty. 2008-06-03 12:38:00 +00:00
tty.c Rev. 1.274 put the ttyrel() call before the destroy_dev() in the 2008-05-23 16:47:55 +00:00
uipc_accf.c
uipc_cow.c Give MEXTADD() another argument to make both void pointers to the 2008-02-01 19:36:27 +00:00
uipc_debug.c Add missing sb_sndptr* fields to db_print_sockbuf(). 2008-01-03 15:19:31 +00:00
uipc_domain.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
uipc_mbuf2.c Merge first in a series of TrustedBSD MAC Framework KPI changes 2007-10-24 19:04:04 +00:00
uipc_mbuf.c Attempt to make the print types more friendly to other architectures. 2008-04-30 20:00:30 +00:00
uipc_mqueue.c - Use vget() to lock the vnode rather than refing without a lock and 2008-03-29 23:30:40 +00:00
uipc_sem.c Rework the lifetime management of the kernel implementation of POSIX 2008-06-27 05:39:04 +00:00
uipc_shm.c Rework the lifetime management of the kernel implementation of POSIX 2008-06-27 05:39:04 +00:00
uipc_sockbuf.c Update the kernel to count the number of mbufs and clusters 2008-05-15 20:18:44 +00:00
uipc_socket.c Add code to allow the system to handle multiple routing tables. 2008-05-09 23:03:00 +00:00
uipc_syscalls.c When sendto(2) is called with an explicit destination address 2008-05-22 07:18:54 +00:00
uipc_usrreq.c Move unlock of global UNIX domain socket lock slightly lower in 2008-01-18 19:16:03 +00:00
vfs_acl.c Add the support for the AT_FDCWD and fd-relative name lookups to the 2008-03-31 12:01:21 +00:00
vfs_aio.c Use minimum of max_aio_procs and target_aio_procs when spawning new 2008-06-21 11:34:34 +00:00
vfs_bio.c b_waiters cannot be adequately protected by the interlock because it is 2008-03-28 12:30:12 +00:00
vfs_cache.c - Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is 2008-04-09 20:19:55 +00:00
vfs_cluster.c - Complete part of the unfinished bufobj work by consistently using 2008-03-22 09:15:16 +00:00
vfs_default.c Move the head of byte-level advisory lock list from the 2008-04-16 11:33:32 +00:00
vfs_export.c Provide the mutual exclusion between the nfs export list modifications 2008-06-09 10:31:38 +00:00
vfs_extattr.c Add the support for the AT_FDCWD and fd-relative name lookups to the 2008-03-31 12:01:21 +00:00
vfs_hash.c In keeping with style(9)'s recommendations on macros, use a ';' 2008-03-16 10:58:09 +00:00
vfs_init.c Remove VFS_VPTOFH entirely. API is already broken and it is good time to 2007-02-16 17:32:41 +00:00
vfs_lookup.c Implement the linux syscalls 2008-04-08 09:45:49 +00:00
vfs_mount.c Provide the mutual exclusion between the nfs export list modifications 2008-06-09 10:31:38 +00:00
vfs_subr.c Be more friendly for DDB pager. 2008-05-18 21:08:12 +00:00
vfs_syscalls.c If S_IFIFO is passed to mknod(2), invoke kern_mkfifoat(9) to create a 2008-06-22 21:51:32 +00:00
vfs_vnops.c Add the support for the O_EXEC open(2) mode, as specified by the 2008-03-31 11:57:18 +00:00
vnode_if.src Add the new kernel-mode NFS Lock Manager. To use it instead of the 2008-03-26 15:23:12 +00:00