FreeBSD src
Go to file
Alexander Motin 272f833cde MFV r328227: 8909 8585 can cause a use-after-free kernel panic
illumos/illumos-gate@94ddd0900a

https://www.illumos.org/issues/8909:
There's a race condition that exists if `zil_free_lwb` races with either
`zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`.

Here's an example panic due to this bug:

> ::status
    debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
    operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
    image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in mo
dule "zfs" due to a NULL pointer dereference
    dump content: kernel pages only

> $c
    zio_shrink+0x12()
    zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
    zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit+0x80(ffffff03dcd15cc0, 9a9)
    zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    write+0x250(42, fffffd7ff4832000, 2000)
    sys_syscall+0x177()

If there's an outstanding lwb that's in `zil_commit_waiter_timeout`
waiting to timeout, waiting on it's waiter's CV, we must be sure not to
call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB
may be freed and can result in a use-after-free situation where the
stale lwb pointer stored in the `zil_commit_waiter_t` structure of the
thread waiting on the waiter's CV is used.

A similar situation can occur if an lwb is issued to disk, and thus in
the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the
disk is servicing that lwb. In this situation, the lwb will be freed by
`zil_free_lwb`, which will result in a use-after-free situation when the
lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called.

This race condition is prevented in `zil_close` by calling `zil_commit`
before `zil_free_lwb` is called, which will ensure all outstanding (i.e.
all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states)
reach the `LWB_STATE_DONE` state before the lwb's are freed
(`zil_commit` will not return untill all the lwb's are
`LWB_STATE_DONE`).

Further, this race condition is prevented in `zil_sync` by only calling
`zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set.
All lwb's not in the `LWB_STATE_DONE` state will have a non-null value
for this pointer; the pointer is only cleared in
`zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
changed to `LWB_STATE_DONE`.

This race is present in `zil_suspend`, leading to this bug.

At first glance, it would appear as though this would not be true
because `zil_suspend` will call `zil_commit`, just like `zil_close`, but
the problem is that `zil_suspend` will set the zilog's `zl_suspend`
field prior to calling `zil_commit`. Further, in `zil_commit`, if
`zl_suspend` is set, `zil_commit` will take a special branch of logic
and use `txg_wait_synced` instead of performing the normal `zil_commit`
logic.

This call to `txg_wait_synced` might be good enough for the data to
reach disk safely before it returns, but it does not ensure that all
outstanding lwb's reach the `LWB_STATE_DONE` state before it returns.
This is because, if there's an lwb "stuck" in
`zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will
maintain a non-null value for it's `lwb_buf` field and thus `zil_sync`
will not free that lwb. Thus, even though the lwb's data is already on
disk, the lwb will be left lingering, waiting on the CV, and will
eventually timeout and be issued to disk even though the write is
unnesseary.

So, after `zil_commit` is called from `zil_suspend`, we incorrectly
assume that there are not outstanding lwb's, and proceed to free all
lwb's found on the zilog's lwb list. As a result, we free the lwb that
will later be used `zil_commit_waiter_timeout`.

Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:18:42 +00:00
bin Convert ls(1) to not use libxo(3) 2018-01-17 22:47:34 +00:00
cddl MFV r328220: 8677 Open-Context Channel Programs 2018-01-21 23:02:05 +00:00
contrib Rename "index" variable to "idx" since gcc complains that it shadows 2018-01-19 20:33:47 +00:00
crypto Merge OpenSSL 1.0.2n. 2017-12-07 18:02:57 +00:00
etc Teach the resolv startup script to respect its enable flag. 2018-01-18 20:45:41 +00:00
gnu Recognize mchk_calltrap as a trapframe generator. 2018-01-19 01:36:25 +00:00
include Use the __result_use_check attribute also for reallocf(3). 2018-01-09 22:48:13 +00:00
kerberos5 various: general adoption of SPDX licensing ID tags. 2017-11-27 15:37:16 +00:00
lib iconv: adding missing break 2018-01-21 21:09:08 +00:00
libexec rpc.sprayd: Remove 3rd and 4th clauses in christos' license. 2017-12-28 17:51:53 +00:00
release After removal of loader.ps3, change petitboot configuration in release media 2018-01-01 03:33:01 +00:00
rescue Avoid referencing private lib names directly. 2017-11-10 07:53:02 +00:00
sbin Revert ABI breakage to CAM that came in with MMC/SD support in r320844. 2018-01-19 15:32:27 +00:00
secure Merge OpenSSL 1.0.2n. 2017-12-07 18:02:57 +00:00
share termcap: add xterm-termite 2018-01-20 22:24:45 +00:00
stand Remove extra copy of bootinfo.c. It's a bit rotted copy of the one in 2018-01-19 19:09:17 +00:00
sys MFV r328227: 8909 8585 can cause a use-after-free kernel panic 2018-01-21 23:18:42 +00:00
targets Remove a reference to burncd 2017-12-29 23:56:06 +00:00
tests Add ccp(4): experimental driver for AMD Crypto Co-Processor 2018-01-18 22:01:30 +00:00
tools Merge ^/head r327624 through r327885. 2018-01-12 18:23:35 +00:00
usr.bin limits(1): fix always true condition 2018-01-21 08:48:26 +00:00
usr.sbin Adjust format string to fix build. 2018-01-18 00:24:05 +00:00
.arcconfig callsign isn't required anymore 2016-09-29 06:19:45 +00:00
.arclint arc lint: ignore /tests/ in chmod 2017-12-19 03:38:06 +00:00
.gitattributes .git*: add gitattributes and gitignore 2017-12-25 21:07:54 +00:00
.gitignore .git*: add gitattributes and gitignore 2017-12-25 21:07:54 +00:00
COPYRIGHT Happy New Year 2018 my friends! 2017-12-31 16:48:04 +00:00
LOCKS Explicitly require Security Officer's approval for kernel PRNG bits. 2013-09-17 14:19:05 +00:00
MAINTAINERS Move sys/boot to stand. Fix all references to new location 2017-11-14 23:02:19 +00:00
Makefile Import tzdata 2018a 2018-01-16 18:26:11 +00:00
Makefile.inc1 Don't build share/syscons in build-tools stage if MK_SYSCONS == "no" 2018-01-16 21:43:36 +00:00
Makefile.libcompat Check for GCC first rather than clang in the MIPS lib32 rules. 2018-01-16 01:05:04 +00:00
Makefile.sys.inc AUTO_OBJ: For all top-level targets enforce using an OBJDIR. 2017-12-05 21:29:47 +00:00
ObsoleteFiles.inc Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 2018-01-14 00:08:34 +00:00
README Import tzdata 2018a 2018-01-16 18:26:11 +00:00
README.md Document the sys/boot -> stand move in hier.7 and the top-level README. 2017-12-03 20:36:36 +00:00
UPDATING Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 2018-01-14 00:08:34 +00:00

FreeBSD Source:

This is the top level of the FreeBSD source directory. This file was last revised on: FreeBSD

For copyright information, please see the file COPYRIGHT in this directory (additional copyright information also exists for some sources in this tree - please see the specific source directories for more information).

The Makefile in this directory supports a number of targets for building components (or all) of the FreeBSD source tree. See build(7) and https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html for more information, including setting make(1) variables.

The buildkernel and installkernel targets build and install the kernel and the modules (see below). Please see the top of the Makefile in this directory for more information on the standard build targets and compile-time flags.

Building a kernel is a somewhat more involved process. See build(7), config(8), and https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig.html for more information.

Note: If you want to build and install the kernel with the buildkernel and installkernel targets, you might need to build world before. More information is available in the handbook.

The kernel configuration files reside in the sys/<arch>/conf sub-directory. GENERIC is the default configuration used in release builds. NOTES contains entries and documentation for all possible devices, not just those commonly used.

Source Roadmap:

bin				System/user commands.

cddl			Various commands and libraries under the Common Development  
				and Distribution License.

contrib			Packages contributed by 3rd parties.

crypto			Cryptography stuff (see crypto/README).

etc				Template files for /etc.

gnu				Various commands and libraries under the GNU Public License.  
				Please see gnu/COPYING* for more information.

include			System include files.

kerberos5		Kerberos5 (Heimdal) package.

lib				System libraries.

libexec			System daemons.

release			Release building Makefile & associated tools.

rescue			Build system for statically linked /rescue utilities.

sbin			System commands.

secure			Cryptographic libraries and commands.

share			Shared resources.

stand			Boot loader sources.

sys				Kernel sources.

tests			Regression tests which can be run by Kyua.  See tests/README
				for additional information.

tools			Utilities for regression testing and miscellaneous tasks.

usr.bin			User commands.

usr.sbin		System administration commands.

For information on synchronizing your source tree with one or more of the FreeBSD Project's development branches, please see:

https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/current-stable.html