freebsd-nq/cddl
Alexander Motin 6d03c5504b MFV r316875: 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY
illumos/illumos-gate@873c4903a5
873c4903a5

https://www.illumos.org/issues/7336
  We can run into a problem where we call into zfs_mount, which in turn calls
  is_dir_empty, which opens the directory to try and make sure it's empty. The
  issue with the current approach is that it holds the directory open while it
  traverses it with readdir, which, due to subtle interaction with the Java JVM,
  vfork, and exec can cause a tricky race condition resulting in zfs_mount
  failures.
  The approach to resolving the issue in this patch is to drop the usage of
  readdir altogether, and instead rely on the fact that ZFS stores the number of
  entries contained in a directory using the st_size field of the stat structure.
  Thus, if the directory in question is a ZFS directory, we can check to see if
  it's empty by calling stat() and inspecting the st_size field of structure
  returned.
  ===============================================================================
  The root cause appears to be an interesting race between vfork, exec, and
  zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
  1. We call zfs_mount, and this in turn calls openat to check if the directory
  is empty, which results in opening the directory we're trying to mount onto,
  and increment v_count.
  2. As we're in the middle of reading the directory, vfork is called by the JVM
  and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
  take an additional hold on the directory, which increments v_count a second
  time. The semantics of vfork mean the parent process will wait for the child
  process to exit or exec before the parent can continue; at this point the
  parent is in the middle of zfs_mount, reading the directory to determine if
  it's empty or not.
  3. The child process exec-ing jspawnhelper gets to the relvm call within
  exec_args (which is called by exec_common). relvm is the function that releases
  the parent process, allowing the parent to proceed. The problem is, at this
  point of calling relvm, the child hasn't yet called close_exec which is
  responsible for closing the file descriptors inherited from the parent process

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-02-20 20:26:48 +00:00
..
compat/opensolaris Honor all options of "zfs mount -o" 2017-09-05 19:28:35 +00:00
contrib/opensolaris MFV r316875: 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY 2018-02-20 20:26:48 +00:00
lib Add inline to errno.d for translating int to string 2018-02-16 04:22:29 +00:00
sbin DIRDEPS_BUILD: Update dependencies. 2017-10-31 00:07:04 +00:00
tests Merge ^/user/ngie/release-pkg-fix-tests to unbreak how test files are installed 2016-05-04 23:20:53 +00:00
usr.bin Add basic tests for ctfconvert(1), fold(1) and rs(1) 2017-11-27 20:01:58 +00:00
usr.sbin Optimize zfsd for the happy case 2018-02-15 21:30:30 +00:00
Makefile Convert traditional ${MK_TESTS} conditional idiom for including test 2017-08-02 08:35:51 +00:00
Makefile.inc Make DTrace stuff compile with C99 standard. 2014-08-22 20:04:51 +00:00