freebsd-skq

Author	SHA1	Message	Date
Alan Cox	07c348ea7b	After r118390, the variable "dmmax" was neither the correct strip size nor the correct maximum block size. Moreover, after r318995, it serves no purpose except to provide information to user space through a read-only sysctl. This change eliminates the variable "dmmax" but retains the sysctl. It also corrects the value returned by the sysctl. Reviewed by: kib, markj MFC after: 3 days	2017-05-27 21:46:00 +00:00
Baptiste Daroussin	04238e0a32	Update the comments concerning net_parse_rootpath to reflect what it is now really doing Reported by: rgrimes Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D10959	2017-05-27 18:46:00 +00:00
Cy Schubert	243567356b	Fix return value of ip_sync_nat. Previously, regardless of error it always returned a return code of 0. Obtained from: NetBSD ip_sync.c r1.5 MFC after: 1 week	2017-05-27 18:01:14 +00:00
Konstantin Belousov	03311f117b	Use whole mnt_stat.f_fsid bits for st_dev. Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use f_fsid. In particular, NFSv3 and sometimes NFSv4, and ZFS use this method of reporting st_dev by stat(2). Provide a new helper vn_fsid() to avoid duplicating code to copy f_fsid to va_fsid. Note that the change is mostly cosmetic. Its motivation is to avoid sign-extension of f_fsid[0] into 64bit dev_t value which happens after dev_t becomes 64bit. Reviewed by: avg(zfs), rmacklem (nfs) (both for previous version) Sponsored by: The FreeBSD Foundation	2017-05-27 17:00:30 +00:00
Alan Cox	fe71561af2	In r118390, the swap pager's approach to striping swap allocation over multiple devices was changed. However, swapoff_one() was not fully and correctly converted. In particular, with r118390's introduction of a per- device blist, the maximum swap block size, "dmmax", became irrelevant to swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of- range operations on the per-device blist that were silently ignored by blist_fill(). This change corrects both of these problems with swapoff_one(), which will allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously, swapoff_one() would panic inside of blist_fill() if you increased MAX_PAGEOUT_CLUSTER. Reviewed by: kib, markj MFC after: 3 days	2017-05-27 16:40:00 +00:00
Baptiste Daroussin	b5b274ce12	Catch up with the change in the user class	2017-05-27 14:07:46 +00:00
Baptiste Daroussin	4e2a7b5c99	Capitalize DHCP Reported by: danfe	2017-05-27 13:55:20 +00:00
Baptiste Daroussin	aff810f1b2	Document recent changes on pxeboot	2017-05-27 13:26:18 +00:00
Baptiste Daroussin	e9ce925773	Partially revert r314948 While it sounds like a good idea to extract the RFC1048 data from PXE, in the end it is not and it is causing lots of issues. Our pxeloader might need options which are incompatible with other pxe servers (for example iPXE, but not only). Our pxe loaders are also now setting their own user class, so it is useful to issue our own pxe request at startup Reviewed by: tsoome Differential Revision: https://reviews.freebsd.org/D10953	2017-05-27 12:46:46 +00:00
Baptiste Daroussin	4dfd16670e	Always issue the pxe request All the code is now issuing only one single dhcp request at startup of the loader, meaning we can always request the PXE information from the dhcp server. Previous code lost that information, meaning no option 55 anymore (meaning not working with the kea dhcp server) and no request for rootpath etc, no user class Remove the flags from the bootp function which are not needed anymore Reviewed by: tsoome Differential Revision: https://reviews.freebsd.org/D10952	2017-05-27 12:35:01 +00:00
Baptiste Daroussin	5fe86cd909	Always build tftpfs support along with nfs for pxeboot This change was already done for loader.efi	2017-05-27 12:20:13 +00:00
Baptiste Daroussin	404f5b6b29	Support URI scheme for root-path in netbooting Rather than previous attempts to add tftpfs support at the same time as NFS support. This time decide on a proper URI parser rather than hacks. root-path can now be defined the following way: For tftpfs: tftp://ip/path tftp:/path (this one will consider the tftp server is the same as the one where the pxeboot file was fetched from) For nfs: nfs:/path nfs://ip/path The historical ip:/path /path are kept on NFS Reviewed by: tsoom, rgrimes Differential Revision: https://reviews.freebsd.org/D10947	2017-05-27 12:06:52 +00:00
Ed Maste	ef7161e774	uart: add AMT SOL PCI ID I adjusted the description to be similar to existing AMT entries. PR: 219384 Submitted by: "Tooker" MFC after: 1 week	2017-05-27 02:07:22 +00:00
Alexander Motin	41cf0d54a2	Call VLAN_CAPABILITIES() when LAGG capabilities change. This makes VLAN on top of LAGG to expose proper capabilities if they are changed after creation. MFC after: 1 week	2017-05-26 22:22:48 +00:00
Conrad Meyer	95b978955c	procstat(1): Add TCP socket send/recv buffer size Add TCP socket send and receive buffer size to procstat -f output. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10689	2017-05-26 22:17:44 +00:00
John Baldwin	d68990a14c	Fail large requests with EFBIG. The adapter firmware in general does not accept PDUs larger than 64k - 1 bytes in size. Sending crypto requests larger than this size result in hangs or incorrect output, so reject them with EFBIG. For requests chaining an AES cipher with an HMAC, the firmware appears to require slightly smaller requests (around 512 bytes). Sponsored by: Chelsio Communications	2017-05-26 20:20:40 +00:00
Alexander Motin	8403ab7919	Improve applying unified capabilities to the lagg ports. Some NICs have some capabilities dependent, so that disabling one require disabling some other (TXCSUM/RXCSUM on em). This code tries to reach the consensus more insistently. PR: 219453 MFC after: 1 week	2017-05-26 20:15:33 +00:00
Andriy Gapon	b5617df55b	Allow PROBE_SPINUP to fail in CAM ATA transport The motivation for this is two-fold. 1. Some old WD SATA disks may appear as if they need to be spun up when they are already spinning. Those disks would respond with an error to the spin-up request. 2. Even if we really fail to spin up the disk, we still can try to proceed to the subsequent phases. If we fail later on, then no difference. Otherwise we get a chance to communicate with the disk which is better than completely ignoring it, because a user can try to recover the disk. Reviewed by: mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D10896	2017-05-26 17:44:47 +00:00
Dimitry Andric	b47efe07c4	Define a new __INO64 macro in <sys/_types.h>, to indicate the system uses 64-bit inode numbers. Programs can use this to avoid including <sys/param.h>, with its associated namespace pollution. Reviewed by: kib	2017-05-26 16:29:55 +00:00
Michael Tuexen	5d08768a2b	Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners. While there, use a macro for checking the listen state to allow for easier changes if required. This done to help glebius@ with his listen changes.	2017-05-26 16:29:00 +00:00
Andriy Gapon	32ecf81aff	MFV r318944: 8265 Reserve send stream flag for large dnode feature illumos/illumos-gate@bc83969fdb `bc83969fdb` https://www.illumos.org/issues/8265 Reserve bit 23 in the zfs send stream flags for the large dnode feature which has been implemented for Linux. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Brian Behlendorf <behlendorf1@llnl.gov> MFC after: 1 week	2017-05-26 12:08:38 +00:00
Andriy Gapon	a51eb0a964	MFV r318942: 8166 zpool scrub thinks it repaired offline device illumos/illumos-gate@2d2f193a21 `2d2f193a21` https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also https://github.com/zfsonlinux/zfs/issues/5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 12:04:21 +00:00
Andriy Gapon	2cd05c2473	MFV r318934: 8070 Add some ZFS comments illumos/illumos-gate@40713f2b24 `40713f2b24` https://www.illumos.org/issues/8070 Add some ZFS comments left by various developers at different times Reviewed by: Yuri Pankov <yuri.pankov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com> MFC after: 1 week	2017-05-26 11:49:42 +00:00
Andriy Gapon	0a07ea0e2f	MFV r318931: 8063 verify that we do not attempt to access inactive txg illumos/illumos-gate@b7b2590dd9 `b7b2590dd9` https://www.illumos.org/issues/8063 A standard practice in ZFS is to keep track of "per-txg" state. Any of the 3 active TXG's (open, quiescing, syncing) can have different values for this state. We should assert that we do not attempt to modify other (inactive) TXG's. Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 2 weeks	2017-05-26 11:37:11 +00:00
Andriy Gapon	28c5e43e36	MFV r318929: 7786 zfs`vdev_online() needs better notification about state changes illumos/illumos-gate@5f368aef86 `5f368aef86` https://www.illumos.org/issues/7786 Currently, vdev_online() will only post sysevent if previous state was "offline". It should also post the event when the state changes from "removed" or "faulted" to "healthy" or "degraded". This will fix the following scenario: - pull disk from slot A - check that hotspare has taken its place (if available) - insert disk into slot B - check that hotspare moved back to "avail" state (if spare was used) The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail to update the vdev FRU. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: George Wilson george.wilson@delphix.com Approved by: Albert Lee <trisk@forkgnu.org> Author: Yuri Pankov <yuri.pankov@nexenta.com> MFC after: 1 week	2017-05-26 11:33:34 +00:00
Andriy Gapon	9c2a3c861f	MFV r318927: 8025 dbuf_read() creates unnecessary zio_root() for bonus buf illumos/illumos-gate@def4fac588 `def4fac588` https://www.illumos.org/issues/8025 dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>	2017-05-26 11:30:55 +00:00
Andriy Gapon	ebaf416f95	MFV r316929: 6914 kernel virtual memory fragmentation leads to hang illumos/illumos-gate@af868f46a5 `af868f46a5` https://www.illumos.org/issues/6914 FreeBSD note: only a ZFS part of the change is merged, changes to the VM subsystem are not ported (obviously). Also, now that FreeBSD has vmem(9) we don't have to ifdef-out the code that uses it. MFC after: 2 weeks	2017-05-26 11:23:16 +00:00
Andriy Gapon	8629ec8394	arc_init: make code closer to upstream by introducing 'allmem' variable All the differences in calculations are kept. A comment about arc_max being 1/2 of all memory is fixed to reflect the actual code that uses 5/8 as a factor. MFC after: 1 week	2017-05-26 11:05:56 +00:00
Andriy Gapon	cf781c9b60	zfs_putpages: assert that sa_bulk_update() must succeed Same as the upstream does in r316927. MFC after: 1 week	2017-05-26 10:37:55 +00:00
Andriy Gapon	04b7c6b337	MFV r316928: 7256 low probability race in zfs_get_data illumos/illumos-gate@0c94e1af67 `0c94e1af67` https://www.illumos.org/issues/7256 error = dmu_sync(zio, lr->lr_common.lrc_txg, zfs_get_done, zgd); ASSERT(error || lr->lr_length <= zp->z_blksz); It's possible, although extremely rare, that the zfs_get_done() callback is executed before dmu_sync() returns. In that case the znode's range lock is dropped and the znode is unreferenced. Thus, the assertion can access some invalid or wrong data via the zp pointer. size variable caches the correct value of z_blksz and can be safely used here. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <andriy.gapon@clusterhq.com> MFC after: 1 week	2017-05-26 10:31:05 +00:00
Andriy Gapon	7a94dd7aee	MFC r316924: 8061 sa_find_idx_tab can be declared more type-safely illumos/illumos-gate@7f0bdb4257 `7f0bdb4257` https://www.illumos.org/issues/8061 sa_find_idx_tab() is declared as taking and returning "void *" parameters. These can be declared to be the specific types. Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 1 week	2017-05-26 10:27:35 +00:00
Adrian Chadd	7b6899bf2a	[ath] fix short-GI wireshark flag. Yes, HAL_RX_GI means "short guard interval."	2017-05-26 00:48:21 +00:00
Alexander Motin	e3d90506c4	Remove some code, dead from the day one.	2017-05-25 23:19:09 +00:00
Stephen McConnell	327f2e6c56	Fix several problems with mapping code. Reviewed by: ken, scottl, asomers, ambrisko, mav Approved by: ken, mav MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10861	2017-05-25 19:20:06 +00:00
Stephen McConnell	635e58c715	Fix several problems with mapping code. Reviewed by: ken, scottl, asomers, ambrisko, mav Approved by: ken, mav MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10878	2017-05-25 19:14:44 +00:00
Zbigniew Bodek	26872c13ce	Unmask legacy interrupts on Marvell PCIE controller This patch fixes a bug introduced with commit: r294510 "Remove an extra '!' found by clang 3.8." '!' was removed without inverting the logic, which broke PCIe legacy interrupts operation for Marvell controllers. Submitted by: Michal Mazur <mkm@semihalf.com> Obtained from: Semihalf Sponsored by: Netgate	2017-05-25 14:34:21 +00:00
Zbigniew Bodek	fa5f501d0a	Add workaround for CESA MBUS windows with 4GB DRAM Armada 38x SoC's equipped with 4GB DRAM suffer freeze during CESA operation, if MBUS window opened at given DRAM CS reaches end of the address space. Apply a workaround by setting the window size to the closest possible value, i.e. divide it by 2 (it has to be power-of-2). Submitted by: Marcin Wojtas <mw@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D10724	2017-05-25 14:25:05 +00:00
Zbigniew Bodek	0c79c0b138	Fix PM recognition on recent Marvell boards PM status is only supported on Kirkwood and Discovery. Cleanup the code to properly report its state on other platforms. Submitted by: Wojciech Macek <wma@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D10718	2017-05-25 14:23:49 +00:00
Zbigniew Bodek	92ce47d94e	Introduce separate watchdog driver for Armada to fix phony DELAY DELAY is a problematic routine called all over the kernel. Armada38x using CA-9 CPUs are using mpcore timer to count events and measure time but DELAY in the mpcore timer code is a weak function reference and therefore will be replaced by the platform implementation if the one is introduced. Since Armada38x uses on-chip watchdog to which the driver is merged with the on-chip timer driver there will be a platform DELAY implementation. The latter however will not use any HW timers as it will not attempt to configure any. Phony busy loop will be used instead. To fix that we introduce a separate watchdog driver for Armada platforms, (currently only A38X) and stop using Marvell timer driver. That switches DELAY to the desired implementation. Submitted by: Zbigniew Bodek <zbb@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D10710	2017-05-25 14:22:00 +00:00
Zbigniew Bodek	bb98396b47	Enable SCU Speculative linefills to L2 on Armada 38x Submitted by: Marcin Wojtas <mw@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D10709	2017-05-25 14:19:20 +00:00
Zbigniew Bodek	70d163328d	Fix memory corruption while configuring CPU windows on Marvell SoCs Resolving CPU windows from localbus entry caused buffer overflow and memory corruption. Fix wrong indexing and ensure the index does not exceed table size. Submitted by: Wojciech Macek <wma@semihalf.com> Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D10720	2017-05-25 14:16:43 +00:00
Andriy Gapon	ced98d784b	fix vmxnet3 crash when LRO is enabled The crash can occur when all of the following conditions are true: - a packet consists of multiple segments (requires LRO enabled) - there has been a failure to allocate an mbuf for the packet and the packet has to be dropped - a host (vmware) still owned at least one segment of the packet, so the driver had to wait for another interrupt to proceed to discarding the remaining segment(s) Reviewed by: rstone MFC after: 2 weeks Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D10874	2017-05-25 10:49:56 +00:00
Hans Petter Selasky	3f9dcc588d	Declare the "snd_fxdiv_table" once. This shaves around 24Kbytes of binary data from sound.ko and the kernel. MFC after: 3 days	2017-05-25 05:23:47 +00:00
Adrian Chadd	f46839b9e3	[ath] [ath_hal] retire AH_SUPPORT_AR5416 changing anything. Yes, the memory bloat is large, but it's 2017 and I'll fix it later by making it runtime configurable / per-chip configurable if I ever need to.	2017-05-25 04:26:26 +00:00
Adrian Chadd	41059135ce	[ath] [ath_hal] (etc, etc) - begin the task of re-modularising the HAL. In the deep past, when this code compiled as a binary module, ath_hal built as a module. This allowed custom, smaller HAL modules to be built. This was especially beneficial for small embedded platforms where you didn't require /everything/ just to run. However, sometime around the HAL opening fanfare, the HAL landed here as one big driver+HAL thing, and a lot of the (dirty) infrastructure (ie, #ifdef AH_SUPPORT_XXX) to build specific subsets of the HAL went away. This was retained in sys/conf/files as "ath_hal_XXX" but it wasn't really floated up to the modules themselves. I'm now in a position where for the reaaaaaly embedded boards (both the really old and the last couple generation of QCA MIPS boards) having a cut down HAL module and driver loaded at runtime is /actually/ beneficial. This reduces the kernel size down by quite a bit. The MIPS modules look like this: adrian@gertrude:~/work/freebsd/head-embedded/src % ls -l ../root/mips_ap/boot/kernel.CARAMBOLA2/ath*ko -r-xr-xr-x 1 adrian adrian 5076 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_dfs.ko -r-xr-xr-x 1 adrian adrian 100588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal.ko -r-xr-xr-x 1 adrian adrian 627324 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal_ar9300.ko -r-xr-xr-x 1 adrian adrian 314588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_main.ko -r-xr-xr-x 1 adrian adrian 23472 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_rate.ko And the x86 versions, like this: root@gertrude:/home/adrian # ls -l /boot/kernel/ath*ko -r-xr-xr-x 1 root wheel 36632 May 24 18:32 /boot/kernel/ath_dfs.ko -r-xr-xr-x 1 root wheel 134440 May 24 18:32 /boot/kernel/ath_hal.ko -r-xr-xr-x 1 root wheel 82320 May 24 18:32 /boot/kernel/ath_hal_ar5210.ko -r-xr-xr-x 1 root wheel 104976 May 24 18:32 /boot/kernel/ath_hal_ar5211.ko -r-xr-xr-x 1 root wheel 236144 May 24 18:32 /boot/kernel/ath_hal_ar5212.ko -r-xr-xr-x 1 root wheel 336104 May 24 18:32 /boot/kernel/ath_hal_ar5416.ko -r-xr-xr-x 1 root wheel 598336 May 24 18:32 /boot/kernel/ath_hal_ar9300.ko -r-xr-xr-x 1 root wheel 406144 May 24 18:32 /boot/kernel/ath_main.ko -r-xr-xr-x 1 root wheel 55352 May 24 18:32 /boot/kernel/ath_rate.ko .. so you can see, not building the whole HAL can save quite a bit. For example, if you don't need AR9300 support, you can actually avoid wasting half a megabyte of RAM. On embedded routers this is quite a big deal. The AR9300 HAL can be later further shrunk because, hilariously, it indeed supports AH_SUPPORT_<xxx> for optionally adding chipset support. (I'll chase that down later as it's quite a big savings if you're only building for a single embedded target.) So: * Create a very hackish way to load/unload HAL modules * Create module metadata for each HAL subtype - ah_osdep_arXXXX.c * Create module metadata for ath_rate and ath_dfs (bluetooth is currently just built as part of it) * .. yes, this means we could actually build multiple rate control modules and pick one at load time, but I'd rather just glue this into net80211's rate control code. Oh well, baby steps. * Main driver is now "ath_main" * Create an "if_ath" module that does what the ye olde one did - load PCI glue, main driver, HAL and all child modules. In this way, if you have "if_ath_load=YES" in /boot/modules.conf it will load everything the old way and stuff should still work. * For module autoloading purposes, I actually /did/ fix up the name of the modules in if_ath_pci and if_ath_ahb. If you want to selectively load things (eg on ye cheape ARM/MIPS platforms where RAM is at a premium) you should: * load ath_hal * load the chip modules in question * load ath_rate, ath_dfs * load ath_main * load if_ath_pci and/or if_ath_ahb depending upon your particular bus bind type - this is where probe/attach is done. TODO: * AR5312 module and associated pieces - yes, we have the SoC side support now so the wifi support would be good to "round things out"; * Just nuke AH_SUPPORT_AR5416 for now and always bloat the packet structures; this'll simplify other things. * Should add a simple refcnt thing to the HAL RF/chip modules so you can't unload them whilst you're using them. * Manpage updates, UPDATING if appropriate, etc.	2017-05-25 04:18:46 +00:00
Andriy Gapon	8816c0bb48	MFV r316925: 6101 attempt to lzc_create() a filesystem under a volume results in a panic illumos/illumos-gate@b127fe3c05 `b127fe3c05` https://www.illumos.org/issues/6101 lzc_create(), or more correctly, zfs_ioc_create() does not reject an attempt to create a filesystem as a child of a volume, instead it proceeds to a crash. A crash stack obtained on FreeBSD: page fault while in kernel mode zap_leaf_lookup() fzap_lookup() zap_lookup_norm() zap_lookup() zfs_get_zplprop() zfs_fill_zplprops_impl() zfs_ioc_create() zfsdev_ioctl() devfs_ioctl_f() kern_ioctl() sys_ioctl() This crash happened with a kernel without debugging assertions. The immediate cause of crash appears to be an attempt to interpret a zvol object as a zap object. For filesystems: #define MASTER_NODE_OBJ 1 For zvols: #define ZVOL_OBJ 1ULL #define ZVOL_ZAP_OBJ 2ULL So, I see two problems here: 1. an attempt to create a filesystem under a zvol should be rejected as early as possible, maybe in zfs_fill_zplprops() 2. maybe zap_lookup / zap_lockdir should reject objects that are not of one of the zap object types Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 2 weeks	2017-05-24 22:34:54 +00:00
Andriy Gapon	e73f9f8a49	MFV r316923: 8026 retire zfs_throttle_delay and zfs_throttle_resolution illumos/illumos-gate@6b03625981 `6b03625981` https://www.illumos.org/issues/8026 zfs_throttle_delay and zfs_throttle_resolution became disused since the new write throttling mechanism was introduced. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 1 week	2017-05-24 22:32:56 +00:00
Andriy Gapon	9fe5e04dfc	MFC r316921: 8027 tighten up dsl_pool_dirty_delta illumos/illumos-gate@313ae1e182 `313ae1e182` https://www.illumos.org/issues/8027 dsl_pool_dirty_delta() should not wake up waiters when dp->dp_dirty_total == zfs_dirty_data_max, because they wait for dp_dirty_total to fall strictly below the threshold. It's probably very rare for that condition to occur, but it's better to have more accurate code. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andriy Gapon <avg@FreeBSD.org> MFC after: 1 week	2017-05-24 22:27:48 +00:00
Andriy Gapon	e1b8f10a5e	MFV r316920: 8023 Panic destroying a metaslab deferred range tree illumos/illumos-gate@3991b535a8 `3991b535a8` https://www.illumos.org/issues/8023 $C ffffff0011bc0970 vpanic() ffffff0011bc0a00 strlog() ffffff0011bc0a30 range_tree_destroy+0x72(ffffff043769ad00) ffffff0011bc0a70 metaslab_fini+0xd5(ffffff0449acf380) ffffff0011bc0ab0 vdev_metaslab_fini+0x56(ffffff0462bae800) ffffff0011bc0af0 spa_unload+0x9b(ffffff03e3dac000) ffffff0011bc0b70 spa_export_common+0x115(ffffff047f4b4000, 2, 0, 0, 0) ffffff0011bc0b90 spa_destroy+0x1d(ffffff047f4b4000) ffffff0011bc0bd0 zfs_ioc_pool_destroy+0x20(ffffff047f4b4000) ffffff0011bc0c80 zfsdev_ioctl+0x4d7(11400000000, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68) ffffff0011bc0cc0 cdev_ioctl+0x39(11400000000, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68) ffffff0011bc0d10 spec_ioctl+0x60(ffffff03d9153b00, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68, 0) ffffff0011bc0da0 fop_ioctl+0x55(ffffff03d9153b00, 5a01, 8040190, 100003, ffffff03e1956b10, ffffff0011bc0e68, 0) ffffff0011bc0ec0 ioctl+0x9b(3, 5a01, 8040190) ffffff0011bc0f10 _sys_sysenter_post_swapgs+0x149() Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com> MFC after: 2 weeks	2017-05-24 22:25:26 +00:00
Andriy Gapon	5386d7295a	MFV r316917: 7968 multi-threaded spa_sync() illumos/illumos-gate@94c2d0eb22 `94c2d0eb22` https://www.illumos.org/issues/7968 spa_sync() iterates over all the dirty dnodes and processes each of them by calling dnode_sync(). If there are many dirty dnodes (e.g. because we created or removed a lot of files), the single thread of spa_sync() calling dnode_sync() can become a bottleneck. Additionally, if many dnodes are dirtied concurrently in open context (e.g. due to concurrent file creation), the os_lock will experience lock contention via dnode_setdirty(). The solution is to track dirty dnodes on a multilist_t, and for spa_sync() to use separate threads to process each of the sublists in the multilist. On the concurrent file creation microbenchmark, the performance improvement from dnode_setdirty() is up to 7%. Additionally, the wall clock time spent in spa_sync() is reduced to 15%-40% of the single-threaded case. In terms of cost/ reward, once the other bottlenecks are addressed, fixing this bug will provide a medium-large performance gain and require a medium amount of effort to implement. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 3 weeks	2017-05-24 22:21:24 +00:00

117043 Commits