freebsd-dev

Author	SHA1	Message	Date
Glen Barber	6c41d54855	Bump Copyright year following r328283. MFC after: 3 days MFC with: r328283 Sponsored by: The FreeBSD Foundation	2018-01-23 16:48:31 +00:00
Glen Barber	afac3ed6c3	When CHROOTBUILD_SKIP is set, evaluate the existence of /bin/sh within the CHROOTDIR. If it does not exist, unset CHROOTBUILD_SKIP to prevent build failures. Requested by: swills MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-01-23 16:41:31 +00:00
Warner Losh	dfb1d80f64	Fill in ut_id. While it's not relevant to the {OLD,NEW}_TIME entries, we shouldn't leak stack garbage into the field. Sponsored by: Netflix	2018-01-23 15:34:34 +00:00
Emmanuel Vadot	62de70375f	sockstat: add break that was forgot in 328279 Reported by: garga@ MFC after: 1 week X-MFC With: 328279 Sponsored by: Gandi.net	2018-01-23 14:33:19 +00:00
Pedro F. Giffuni	7d84ca677f	extfs: Remove unused variables. Found by: scan-build Reviewed by: fsu Differential Revision: https://reviews.freebsd.org/D14017	2018-01-23 14:17:04 +00:00
Emmanuel Vadot	ee0afaa9d7	sockstat: Add -q option to suppress the header line MFC after: 1 week Sponsored by: Gandi.net	2018-01-23 13:03:47 +00:00
Wojciech Macek	a81a290f34	PowerNV: send MSI_EOI always after MSI unmask MSI/MSI-x interrupts are edge-triggered. If an interrupt arrives when IRQ line is masked, it will be lost and will never recover. Perform MSI_EOI always after unmask to give a chance for PHB/XICS to send an interrupt again if MSI/MSI-x pending bit is set in MSI/MSI-x BAR space. Submitted by: Wojciech Macek <wma@semihalf.org> Obtained from: Semihalf Sponsored by: IBM, QCM Technologies	2018-01-23 08:07:00 +00:00
Xin LI	593ff55396	Document how to load nmdm(4) from a kernel module. Submitted by: kevlo MFC after: 2 weeks	2018-01-23 03:36:49 +00:00
Ryan Stone	bc3d87fd59	Increment the route table gen count after a modify Increment the route table generation count after modifying a route. This signals back to TCP connections that they need to update their L2 caches as the gateway for their route may have changed. This is a heavier hammer than is needed, strictly speaking, but route changes will be unlikely enough that the performance effects of invalidating all connection route caches should be negligible. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13990 Reviewed by: karels	2018-01-23 03:15:44 +00:00
Ryan Stone	fc21c53f63	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
Ryan Stone	dbeab32f94	Invalidate inpcb LLE cache if cached route is invalidated When the inpcb route cache is invalidated after a change to the routing tables, we need to invalidate the LLE cache as well. Previous to this change packets for the connection would continue to use the old L2 information from the old L3 gateway, and the packets for the connection would likely be blackholed. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13988 Reviewed by: karels	2018-01-23 03:15:35 +00:00
Justin Hibbits	1a55e1d038	Fix 64-bit booke kernel builds after the ldscript changes Commits r326203 and r326978 broke 64-bit booke kernels by introducing a 1MB zero-pad between the ELF header and the start of the kernel. This didn't cause a build failure, but caused kernels to need to be loaded into memory 1MB lower, which could easily break scripts expecting previous behavior. This change matches the similar change made to AIM in r327358.	2018-01-23 02:52:12 +00:00
Alan Somers	76f9d2759b	mlock(2): correct documentation for error conditions. The man page is years out of date regarding errors. Our implementation _does_ allow unaligned addresses, and it _does_not_ check for negative lengths, because the length is unsigned. It checks for overflow instead. Update the tests accordingly. Reviewed by: bcr MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D13826	2018-01-22 21:45:54 +00:00
Eric Joyner	2149fa3ec8	ixv(4): Stop setting editing ifnet flags in ixv_if_init() In iflib, the device-specific init() function isn't supposed to edit the struct ifnet driver flags. If it does, it'll cause an MPASS() assert in iflib to fail. PR: 225312 Reported by: bhughes@	2018-01-22 20:56:21 +00:00
Konstantin Belousov	279e33d489	Fix compat32 for sysctl net.PF_ROUTE...NET_RT_IFLISTL. Route messages are aligned to the host long type alignment, which breaks 32bit. Reported and tested by: lwhsu Diagnosed by: Yuri Pankov <yuripv@icloud.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-22 20:49:17 +00:00
Kyle Evans	e36cba8a36	libregex: Add a symbol map kib points out that trying to re-use symbol versioning from libc is dirty and wrong. The implementation in libregex is incompatible by design with the implementation in libc. Using the symbol versions from libc can and likely will cause confusions for linkers and bring unexpected behavior for consumers that unwillingly (transitively) link against libregex. Reported by: kib	2018-01-22 18:40:19 +00:00
Warner Losh	6a86483da1	This comment is bogus. This is a legit release. Reviewed by: scottl@, ken@ Sponsored by: Netflix	2018-01-22 17:47:49 +00:00
Pedro F. Giffuni	7d81f67b88	drm2: Basic use of mallocarray(9). These functions deal the same type of overflows we do with mallocarray(9). Using our mallocarray will panic, which different from the previous behavior (returning NULL), but neither behavior is more correct. As a sidenote, drm_calloc_large() is not currently used at all. Reviewed by: dumbbell Differential Revision: https://reviews.freebsd.org/D13835	2018-01-22 15:55:51 +00:00
Poul-Henning Kamp	67e8bb2f5e	Forgot to add the skeleton BCM283x Clock Manager Reminded by: lwhsu	2018-01-22 08:33:59 +00:00
Poul-Henning Kamp	59e6efefac	Add skeleton manual page for bcm283x_pwm (Feel free to improve this)	2018-01-22 07:43:54 +00:00
Poul-Henning Kamp	2b7d9db4e2	Forgot to edit copy&pasted copyright blurb.	2018-01-22 07:15:24 +00:00
Poul-Henning Kamp	fc62b7e5de	Add a skeleton Clock Manager for RPi2/3, and use that from pwm instead of frobbing the registers directly. As a hack the bcm2835_pwm kmod presently ignores the 'status="disabled"' in the RPI3 DTB, assuming that if you load the kld you probably want the PWM to work.	2018-01-22 07:10:30 +00:00
Alexander Motin	0d6993dfd3	MFV r328255: 8972 zfs holds: In scripted mode, do not pad columns with spaces illumos/illumos-gate@e9b7d6e7f7 https://www.illumos.org/issues/8972: 'zfs holds -H' does not properly output content in scripted mode. It uses a tab instead of two spaces, but it still pads column widths with spaces when it should not. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Allan Jude <allanjude@freebsd.org>	2018-01-22 06:00:45 +00:00
Alexander Motin	d0fd40e7c9	8972 zfs holds: In scripted mode, do not pad columns with spaces illumos/illumos-gate@e9b7d6e7f7 https://www.illumos.org/issues/8972: 'zfs holds -H' does not properly output content in scripted mode. It uses a tab instead of two spaces, but it still pads column widths with spaces when it should not. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Allan Jude <allanjude@freebsd.org>	2018-01-22 05:59:48 +00:00
Alexander Motin	1dbc0fc352	MFV r328253: 8835 Speculative prefetch in ZFS not working for misaligned reads illumos/illumos-gate@5cb8d943bc https://www.illumos.org/issues/8835: Sequential reads not aligned to block size are not detected by ZFS prefetcher as sequential, killing prefetch and severely hurting performance. It is caused by dmu_zfetch() in case of misaligned sequential accesses being called with overlap of one block. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Allan Jude <allanjude@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alexander Motin <mav@FreeBSD.org>	2018-01-22 05:57:14 +00:00
Alexander Motin	2d224f6116	8835 Speculative prefetch in ZFS not working for misaligned reads illumos/illumos-gate@5cb8d943bc https://www.illumos.org/issues/8835: Sequential reads not aligned to block size are not detected by ZFS prefetcher as sequential, killing prefetch and severely hurting performance. It is caused by dmu_zfetch() in case of misaligned sequential accesses being called with overlap of one block. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Allan Jude <allanjude@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alexander Motin <mav@FreeBSD.org>	2018-01-22 05:55:43 +00:00
Alexander Motin	4eb2803697	MFV r328251: 8652 Tautological comparisons with ZPROP_INVAL illumos/illumos-gate@4ae5f5f06c https://www.illumos.org/issues/8652: Clang and GCC prefer to use unsigned ints to store enums. With Clang, that causes tautological comparison warnings when comparing a zfs_prop_t or zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error handling code is being silently removed as a result. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 05:52:39 +00:00
Alexander Motin	1e2bad5ea0	8652 Tautological comparisons with ZPROP_INVAL illumos/illumos-gate@4ae5f5f06c https://www.illumos.org/issues/8652: Clang and GCC prefer to use unsigned ints to store enums. With Clang, that causes tautological comparison warnings when comparing a zfs_prop_t or zpool_prop_t variable to the macro ZPROP_INVAL. It's likely that error handling code is being silently removed as a result. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 04:48:14 +00:00
Alexander Motin	cf52647c41	MFV r328249: 8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs illumos/illumos-gate@2ba5f978a4 https://www.illumos.org/issues/8641: "zpool clear" and "zinject -d" can both operate on specific vdevs, either leaf or interior. However, due to an oversight, neither works on a "spare" or "replacing" vdev. For example: sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c 1t5000CCA000094115d0 sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0 $ zpool status foo pool: foo state: ONLINE scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017 config: NAME STATE READ WRITE CKSUM foo ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c1t5000CCA000081D61d0 ONLINE 0 0 0 spare-1 ONLINE 0 0 0 c1t5000CCA000186235d0 ONLINE 0 0 0 c1t5000CCA000094115d0 ONLINE 0 0 0 spares c1t5000CCA000094115d0 INUSE currently in use $ sudo zinject -d spare-1 -A degrade foo cannot find device 'spare-1' in pool 'foo' $ sudo zpool clear foo spare-1 cannot clear errors for spare-1: no such device in pool Even though there was nothing to clear, those commands shouldn't have reported an error. by contrast, trying to clear "raidz1-0" works just fine: $ sudo zpool clear foo raidz1-0 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Alan Somers <asomers@gmail.com>	2018-01-22 04:37:04 +00:00
Alexander Motin	434e06c6f9	8641 "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs illumos/illumos-gate@2ba5f978a4 https://www.illumos.org/issues/8641: "zpool clear" and "zinject -d" can both operate on specific vdevs, either leaf or interior. However, due to an oversight, neither works on a "spare" or "replacing" vdev. For example: sudo zpool create foo raidz1 c1t5000CCA000081D61d0 c1t5000CCA000186235d0 spare c1t5000CCA000094115d0 sudo zpool replace foo c1t5000CCA000186235d0 c1t5000CCA000094115d0 $ zpool status foo pool: foo state: ONLINE scan: resilvered 81.5K in 0h0m with 0 errors on Fri Sep 8 10:53:03 2017 config: NAME STATE READ WRITE CKSUM foo ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c1t5000CCA000081D61d0 ONLINE 0 0 0 spare-1 ONLINE 0 0 0 c1t5000CCA000186235d0 ONLINE 0 0 0 c1t5000CCA000094115d0 ONLINE 0 0 0 spares c1t5000CCA000094115d0 INUSE currently in use $ sudo zinject -d spare-1 -A degrade foo cannot find device 'spare-1' in pool 'foo' $ sudo zpool clear foo spare-1 cannot clear errors for spare-1: no such device in pool Even though there was nothing to clear, those commands shouldn't have reported an error. by contrast, trying to clear "raidz1-0" works just fine: $ sudo zpool clear foo raidz1-0	2018-01-22 04:35:17 +00:00
Alexander Motin	b5eb78f824	MFV r328247: 8959 Add notifications when a scrub is paused or resumed illumos/illumos-gate@301fd1d6f2 Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Sean Eric Fagan <sef@ixsystems.com>	2018-01-22 04:31:48 +00:00
Alexander Motin	ee700ae0c6	8959 Add notifications when a scrub is paused or resumed illumos/illumos-gate@301fd1d6f2 Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Sean Eric Fagan <sef@ixsystems.com>	2018-01-22 04:27:05 +00:00
Alexander Motin	abe7ff88e1	MFV r328245: 8856 arc_cksum_is_equal() doesn't take into account ABD-logic illumos/illumos-gate@01a059ee0c https://www.illumos.org/issues/8856: arc_cksum_is_equal() calls zio_push_transform() that requires abd_t* (second arg), but a void* is passed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Roman Strashkin <roman.strashkin@nexenta.com>	2018-01-22 04:23:48 +00:00
Alexander Motin	1536455b31	8856 arc_cksum_is_equal() doesn't take into account ABD-logic illumos/illumos-gate@01a059ee0c https://www.illumos.org/issues/8856: arc_cksum_is_equal() calls zio_push_transform() that requires abd_t* (second arg), but a void* is passed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Roman Strashkin <roman.strashkin@nexenta.com>	2018-01-22 04:21:55 +00:00
Kyle Evans	d5b15ddbfc	usr.sbin/service: Fix -j to not be order dependant The introduced -j option is highly dependant on the ordering of arguments, and it exhibited broken behavior in some other circumstances. Fix these issues, and simplify the feature by removing the unneessary double parsing of options. Reviewed by: jilles Differential Revision: https://reviews.freebsd.org/D13952	2018-01-22 03:38:10 +00:00
Kyle Evans	df1043c201	libregex: Drop WARNS to 2 to match libc It's become clear that my armv7 builds didn't catch all of the warnings that other builds are picking up, drop WARNS to 2 to match libc until they're all caught.	2018-01-22 03:12:26 +00:00
Kyle Evans	fe5bf674e6	Add missing patch from r328240 regcomp uses some libc internal collation bits that are not available in the libregex context. It's easy enough to bring in the needed parts that can work in a libregex world, so do so. Pointy hat to: me	2018-01-22 02:58:33 +00:00
Kyle Evans	b37f6c9805	Add libregex, connect it to the build libregex is a regex(3) implementation intended to feature GNU extensions and any other non-POSIX compliant extensions that are deemed worthy. These extensions are separated out into a separate library for the sake of not cluttering up libc further with them as well as not deteriorating the speed (or lack thereof) of the libc implementation. libregex is implemented as a build of the libc implementation with LIBREGEX defined to distinguish this from a libc build. The reasons for implementation like this are two-fold: 1.) Maintenance- This reduces the overhead induced by adding yet another regex implementation to base. 2.) Ease of use- Flipping on GNU extensions will be as simple as linking against libregex, and POSIX-compliant compilations can be guaranteed with a REG_POSIX cflag that should be ignored by libc/regex and disables extensions in libregex. It is also easier to keep REG_POSIX sane and POSIX pure when implemented in this fashion. Tests are added for future functionality, but left disconnected for the time being while other testing is done. Reviewed by: cem (previous version) Differential Revision: https://reviews.freebsd.org/D12934	2018-01-22 02:44:41 +00:00
Pedro F. Giffuni	44c514b142	Forgot to sort here in r328238.	2018-01-22 02:26:10 +00:00
Pedro F. Giffuni	d821d36419	Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks	2018-01-22 02:08:10 +00:00
Pedro F. Giffuni	b8d1747e75	Use the __alloc_size2 attribute where relevant. This follows the documented use in GCC. It is basically only relevant for calloc(3), reallocarray(3) and mallocarray(9). Suggested by: Mark Millard Reference: https://docs.freebsd.org/cgi/mid.cgi?9DE674C6-EAA3-4E8A-906F-446E74D82FC4	2018-01-22 01:50:10 +00:00
Alexander Motin	d26c97e2fb	MFV r328233: 8898 creating fs with checksum=skein on the boot pools fails ungracefully illumos/illumos-gate@9fa2266d9a https://www.illumos.org/issues/8898: # zfs create -o checksum=skein rpool/test internal error: Result too large Abort (core dumped) Not a big deal per se, but should be handled correctly. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com> PR: 222199	2018-01-22 00:01:36 +00:00
Alexander Motin	4c30fea809	8898 creating fs with checksum=skein on the boot pools fails ungracefully illumos/illumos-gate@9fa2266d9a https://www.illumos.org/issues/8898: # zfs create -o checksum=skein rpool/test internal error: Result too large Abort (core dumped) Not a big deal per se, but should be handled correctly. Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222199 Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2018-01-21 23:57:41 +00:00
Alexander Motin	9b347baa3a	MFV r328231: 8897 zpool online -e fails assertion when run on non-leaf vdevs illumos/illumos-gate@9a551dd645 https://www.illumos.org/issues/8897: # zpool online -e test mirror-1 Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online Abort (core dumped) Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'. Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408 Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2018-01-21 23:53:56 +00:00
Alexander Motin	adbaf37829	8897 zpool online -e fails assertion when run on non-leaf vdevs illumos/illumos-gate@9a551dd645 https://www.illumos.org/issues/8897: # zpool online -e test mirror-1 Assertion failed: nvlist_lookup_string(tgt, "path", &pathname) == 0, file ../common/libzfs_pool.c, line 2558, function zpool_vdev_online Abort (core dumped) Not a big deal per se, but should be handled gracefully, same way as 'offline' and 'online' without '-e'. Also reported as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221408 Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Dan McDonald <danmcd@joyent.com> Author: Yuri Pankov <yuri.pankov@nexenta.com>	2018-01-21 23:52:37 +00:00
Alexander Motin	226cd471b4	MFV r328229: 8930 zfs_zinactive: do not remove the node if the filesystem is readonly illumos/illumos-gate@93c618e0f4 https://www.illumos.org/issues/8930: We normally remove an unlinked node when its last user goes away and the node becomes inactive. However, we should not do that if the filesystem is mounted read-only including the case where it has its readonly property set. The node will remain on the unlinked queue, so it will not be leaked. One particular scenario is when we receive an incremental stream into a mounted read-only filesystem and that stream contains an unlinked file (still on the unlinked queue). If that file is opened before the receive and some time later after the receive it becomes inactive we would remove it and, thus, modify the read-only filesystem. As a result, the filesystem would diverge from its source and further incremental receives would not be possible (without forcing a rollback). Another related scenario, that may or may not be possible depending on an OS / VFS policy, is when an open file is unlinked, then the filesystem is remounted read-only, and then the file is closed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Andriy Gapon <avg@FreeBSD.org>	2018-01-21 23:49:17 +00:00
Alexander Motin	dbee84cc27	8930 zfs_zinactive: do not remove the node if the filesystem is readonly illumos/illumos-gate@93c618e0f4 https://www.illumos.org/issues/8930: We normally remove an unlinked node when its last user goes away and the node becomes inactive. However, we should not do that if the filesystem is mounted read-only including the case where it has its readonly property set. The node will remain on the unlinked queue, so it will not be leaked. One particular scenario is when we receive an incremental stream into a mounted read-only filesystem and that stream contains an unlinked file (still on the unlinked queue). If that file is opened before the receive and some time later after the receive it becomes inactive we would remove it and, thus, modify the read-only filesystem. As a result, the filesystem would diverge from its source and further incremental receives would not be possible (without forcing a rollback). Another related scenario, that may or may not be possible depending on an OS / VFS policy, is when an open file is unlinked, then the filesystem is remounted read-only, and then the file is closed. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Andriy Gapon <avg@FreeBSD.org>	2018-01-21 23:42:45 +00:00
Alexander Motin	272f833cde	MFV r328227: 8909 8585 can cause a use-after-free kernel panic illumos/illumos-gate@94ddd0900a https://www.illumos.org/issues/8909: There's a race condition that exists if `zil_free_lwb` races with either `zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`. Here's an example panic due to this bug: > ::status debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40 operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc) image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in mo dule "zfs" due to a NULL pointer dereference dump content: kernel pages only > $c zio_shrink+0x12() zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20) zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit+0x80(ffffff03dcd15cc0, 9a9) zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) write+0x250(42, fffffd7ff4832000, 2000) sys_syscall+0x177() If there's an outstanding lwb that's in `zil_commit_waiter_timeout` waiting to timeout, waiting on it's waiter's CV, we must be sure not to call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB may be freed and can result in a use-after-free situation where the stale lwb pointer stored in the `zil_commit_waiter_t` structure of the thread waiting on the waiter's CV is used. A similar situation can occur if an lwb is issued to disk, and thus in the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the disk is servicing that lwb. In this situation, the lwb will be freed by `zil_free_lwb`, which will result in a use-after-free situation when the lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called. This race condition is prevented in `zil_close` by calling `zil_commit` before `zil_free_lwb` is called, which will ensure all outstanding (i.e. all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states) reach the `LWB_STATE_DONE` state before the lwb's are freed (`zil_commit` will not return untill all the lwb's are `LWB_STATE_DONE`). Further, this race condition is prevented in `zil_sync` by only calling `zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set. All lwb's not in the `LWB_STATE_DONE` state will have a non-null value for this pointer; the pointer is only cleared in `zil_lwb_flush_vdevs_done`, at which point the lwb's state will be changed to `LWB_STATE_DONE`. This race is present in `zil_suspend`, leading to this bug. At first glance, it would appear as though this would not be true because `zil_suspend` will call `zil_commit`, just like `zil_close`, but the problem is that `zil_suspend` will set the zilog's `zl_suspend` field prior to calling `zil_commit`. Further, in `zil_commit`, if `zl_suspend` is set, `zil_commit` will take a special branch of logic and use `txg_wait_synced` instead of performing the normal `zil_commit` logic. This call to `txg_wait_synced` might be good enough for the data to reach disk safely before it returns, but it does not ensure that all outstanding lwb's reach the `LWB_STATE_DONE` state before it returns. This is because, if there's an lwb "stuck" in `zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will maintain a non-null value for it's `lwb_buf` field and thus `zil_sync` will not free that lwb. Thus, even though the lwb's data is already on disk, the lwb will be left lingering, waiting on the CV, and will eventually timeout and be issued to disk even though the write is unnesseary. So, after `zil_commit` is called from `zil_suspend`, we incorrectly assume that there are not outstanding lwb's, and proceed to free all lwb's found on the zilog's lwb list. As a result, we free the lwb that will later be used `zil_commit_waiter_timeout`. Reviewed by: John Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-01-21 23:18:42 +00:00
Alexander Motin	ebd7699264	8909 8585 can cause a use-after-free kernel panic illumos/illumos-gate@94ddd0900a https://www.illumos.org/issues/8909: There's a race condition that exists if `zil_free_lwb` races with either `zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`. Here's an example panic due to this bug: > ::status debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40 operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc) image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference dump content: kernel pages only > $c zio_shrink+0x12() zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20) zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit+0x80(ffffff03dcd15cc0, 9a9) zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) write+0x250(42, fffffd7ff4832000, 2000) sys_syscall+0x177() If there's an outstanding lwb that's in `zil_commit_waiter_timeout` waiting to timeout, waiting on it's waiter's CV, we must be sure not to call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB may be freed and can result in a use-after-free situation where the stale lwb pointer stored in the `zil_commit_waiter_t` structure of the thread waiting on the waiter's CV is used. A similar situation can occur if an lwb is issued to disk, and thus in the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the disk is servicing that lwb. In this situation, the lwb will be freed by `zil_free_lwb`, which will result in a use-after-free situation when the lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called. This race condition is prevented in `zil_close` by calling `zil_commit` before `zil_free_lwb` is called, which will ensure all outstanding (i.e. all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states) reach the `LWB_STATE_DONE` state before the lwb's are freed (`zil_commit` will not return untill all the lwb's are `LWB_STATE_DONE`). Further, this race condition is prevented in `zil_sync` by only calling `zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set. All lwb's not in the `LWB_STATE_DONE` state will have a non-null value for this pointer; the pointer is only cleared in `zil_lwb_flush_vdevs_done`, at which point the lwb's state will be changed to `LWB_STATE_DONE`. This race is present in `zil_suspend`, leading to this bug. At first glance, it would appear as though this would not be true because `zil_suspend` will call `zil_commit`, just like `zil_close`, but the problem is that `zil_suspend` will set the zilog's `zl_suspend` field prior to calling `zil_commit`. Further, in `zil_commit`, if `zl_suspend` is set, `zil_commit` will take a special branch of logic and use `txg_wait_synced` instead of performing the normal `zil_commit` logic. This call to `txg_wait_synced` might be good enough for the data to reach disk safely before it returns, but it does not ensure that all outstanding lwb's reach the `LWB_STATE_DONE` state before it returns. This is because, if there's an lwb "stuck" in `zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will maintain a non-null value for it's `lwb_buf` field and thus `zil_sync` will not free that lwb. Thus, even though the lwb's data is already on disk, the lwb will be left lingering, waiting on the CV, and will eventually timeout and be issued to disk even though the write is unnesseary. So, after `zil_commit` is called from `zil_suspend`, we incorrectly assume that there are not outstanding lwb's, and proceed to free all lwb's found on the zilog's lwb list. As a result, we free the lwb that will later be used `zil_commit_waiter_timeout`. Reviewed by: John Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-01-21 23:12:38 +00:00
Alexander Motin	d252c97fc0	MFV r328225: 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock" illumos/illumos-gate@cf07d3da99 https://www.illumos.org/issues/8603: To help make the ZIL's code more understandable, it was suggested that the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock". Reviewed by: C Fraire <cfraire@me.com> Approved by: Dan McDonald <danmcd@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>	2018-01-21 23:11:20 +00:00

1 2 3 4 5 ...

229440 Commits