freebsd-dev

Author	SHA1	Message	Date
Andriy Gapon	f9cdbaba8d	MFV r318946: 8021 ARC buf data scatter-ization illumos/illumos-gate@770499e185 `770499e185` https://www.illumos.org/issues/8021 The ARC buf data project (known simply as "ABD" since its genesis in the ZoL community) changes the way the ARC allocates `b_pdata` memory from using linear `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This improves ZFS's performance by helping to defragment the address space occupied by the ARC, in particular for cases where compressed ARC is enabled. It could also ease future work to allocate pages directly from `segkpm` for minimal- overhead memory allocations, bypassing the `kmem` subsystem. This is essentially the same change as the one which recently landed in ZFS on Linux, although they made some platform-specific changes while adapting this work to their codebase: 1. Implemented the equivalent of the `segkpm` suggestion for future work mentioned above to bypass issues that they've had with the Linux kernel memory allocator. 2. Changed the internal representation of the ABD's scatter/gather list so it could be used to pass I/O directly into Linux block device drivers. (This feature is not available in the illumos block device interface yet.) FreeBSD notes: - the actual (default) chunk size is 4KB (despite the text above saying 1KB) - we can try to reimplement ABDs, so that they are not permanently mapped into the KVA unless explicitly requested, especially on platforms with scarce KVA - we can try to use unmapped I/O and avoid intermediate allocation of a linear, virtual memory mapped buffer - we can try to avoid extra data copying by referring to chunks / pages in the original ABD Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Dan Kimmel <dan.kimmel@delphix.com> MFC after: 3 weeks	2017-06-20 17:39:24 +00:00
Andriy Gapon	9f141f8d71	MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers illumos/illumos-gate@adaec86ad2 `adaec86ad2` https://www.illumos.org/issues/8155 When writing pre-compressed buffers, arc_write() requires that the compression algorithm used to compress the buffer matches the compression algorithm requested by the zio_prop_t, which is set by dmu_write_policy(). This makes dmu_write_policy() and its callers a bit more complicated. We can simplify this by making arc_write() trust the caller to supply the type of pre-compressed buffer that it wants to write, and override the compression setting in the zio_prop_t. Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> MFC after: 10 days	2017-06-09 15:26:03 +00:00
Andriy Gapon	31fd119cc2	MFC r316914: 7801 add more by-dnode routines illumos/illumos-gate@b0c42cd470 `b0c42cd470` https://www.illumos.org/issues/7801 Add _by_dnode() routines for accessing objects given their dnode_t , this is more efficient than accessing the object by (objset_t *, uint64_t object). This change converts some but not all of the existing consumers. As performance-sensitive code paths are discovered they should be converted to use these routines. Ported from: `0eef1bde31` Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: bzzz77 <bzzz.tomas@gmail.com> MFC after: 24 days	2017-05-24 21:49:21 +00:00
Josh Paetzel	c78abb8b50	MFV 316894 7252 7628 compressed zfs send / receive illumos/illumos-gate@5602294fda `5602294fda` https://www.illumos.org/issues/7252 This feature includes code to allow a system with compressed ARC enabled to send data in its compressed form straight out of the ARC, and receive data in its compressed form directly into the ARC. https://www.illumos.org/issues/7628 We should have longer, more readable versions of the ZFS send / recv options. 7628 create long versions of ZFS send / receive options Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Dan Kimmel <dan.kimmel@delphix.com>	2017-04-25 17:57:43 +00:00
Josh Paetzel	e106234416	MFV: 315989 7603 xuio_stat_wbuf_* should be declared (void) illumos/illumos-gate@99aa8b5505 `99aa8b5505` https://www.illumos.org/issues/7603 The funcs are declared k&r style, where the args are not specified: void xuio_stat_wbuf_copied(); They should be declared to take no arguments: void xuio_stat_wbuf_copied(void); Need to change both .c and .h. Author: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Richard Lowe <richlowe@richlowe.net>	2017-03-27 17:27:46 +00:00
Josh Paetzel	f2be81e92c	MFV 312436 6569 large file delete can starve out write ops illumos/illumos-gate@ff5177ee8b `ff5177ee8b` https://www.illumos.org/issues/6569 The core issue I've found is that there is no throttle for how many deletes get assigned to one TXG. As a results when deleting large files we end up filling consecutive TXGs with deletes/frees, then write throttling other (more important) ops. There is an easy test case for this problem. Try deleting several large files (at least 1/2 TB) while you do write ops on the same pool. What we've seen is performance of these write ops (let's call it sideload I/O) would drop to zero. More specifically the problem is that dmu_free_long_range_impl() can/will fill up all of the dirty data in the pool "instantly", before many of the sideload ops can get in. So sideload performance will be impacted until all the files are freed. The solution we have tested at Nexenta (with positive results) creates a relatively simple throttle for how many "free" ops we let into one TXG. However this solution exposes other problems that should also be addressed. If we are to slow down freeing of data that means one has to wait even longer (assuming vnode ref count of 1) to get shell back after an rm or for NFS thread to finish the free-ing op. To avoid this the proposed solution is to call zfs_inactive() async for "large" files. Async freeing then begs for the reclaimed space to be accounted for in the zpool's "freeing" prop. The other issue with having a longer delete is the inability to export/unmount for a longer period of time. The proposed solution is to interrupt freeing of blocks when a fs is unmounted. Author: Alek Pinchuk <alek@nexenta.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: avg Differential Revision: D9008	2017-01-20 15:01:04 +00:00
Alexander Motin	d7e781bda3	MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held). dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created: dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write() This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds. Sponsored by: Intel Corp. Closes #109 Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com> openzfs/openzfs@d3e523d489	2016-09-03 11:00:29 +00:00
Alexander Motin	efa0867fb0	MFV r302991: 6950 ARC should cache compressed data illumos/illumos-gate@dcbf3bd6a1 `dcbf3bd6a1` https://www.illumos.org/issues/6950 When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived allowing the ARC to cache a much larger amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com>	2016-09-03 08:30:51 +00:00
Andriy Gapon	fe0cc75230	MFV r302644: 6513 partially filled holes lose birth time illumos/illumos-gate@8df0bcf0df `8df0bcf0df` https://www.illumos.org/issues/6513 If a ZFS object contains a hole at level one, and then a data block is created at level 0 underneath that l1 block, l0 holes will be created. However, these l0 holes do not have the birth time property set; as a result, incremental sends will not send those holes. Fix is to modify the dbuf_read code to fill in birth time data. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com> MFC after: 3 weeks	2016-07-14 11:48:42 +00:00
Andriy Gapon	c2d36fc5cd	zfs: enable vn_io_fault support Note that now we have to account for possible partial writes in dmu_write_uio_dbuf(). It seems that on illumos either all or none of the data are expected to be written. But the partial writes are quite expected when vn_io_fault support is enabled. Reviewed by: kib MFC after: 7 weeks Differential Revision: https://reviews.freebsd.org/D2790	2016-04-16 07:35:53 +00:00
Alexander Motin	78aec5c610	MFV r297831: 6322 ZFS indirect block predictive prefetch Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Author: Alexander Motin <mav@FreeBSD.org> Improve speculative prefetch of indirect blocks. Scalability of many operations on wide ZFS pool can be limited by requirement to prefetch indirect blocks first. Recently added asynchronous indirect block read partially helped, but did not solve the problem completely. This patch extends existing prefetcher functionality to explicitly work with indirect blocks. Before this change prefetcher issued reads for up to 8MB of data in advance. With this change it also issues indirect block reads for up to 64MB of data in advance, so that when it will be time to actually read those data, it can be done immediately. Alike effect can be achieved by just increasing maximal data prefetch distance, but at higher memory cost. Also this change introduces indirect block prefetch for rewrite operations, that was never done before. Previously ARC miss for Indirect blocks regularly blocked rewrites, converting perfectly aligned asynchronous operations into synchronous read-write pairs, significantly reducing maximal rewrite speed. While being there this issue was also fixed: - prefetch was done always, even if caching for the dataset was completely disabled. Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool of 12 assorted HDDs shown me such performance numbers: ------- BEFORE -------- Write 491363677 bytes/sec Read 312430631 bytes/sec Rewrite 97680464 bytes/sec -------- AFTER -------- Write 493524146 bytes/sec Read 438598079 bytes/sec Rewrite 277506044 bytes/sec Closes #65 Closes #80 openzfs/openzfs@792fd28ac0	2016-04-11 21:09:15 +00:00
Edward Tomasz Napierala	ae34b6ff96	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
Edward Tomasz Napierala	aa9b057c08	Fix ru_oublocks accounting for ZFS. There are two code paths that can be called from zfs_write() - one of them, through dmu_write(), was handled correctly; the other wasn't. Reviewed by: avg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D4923	2016-01-23 12:13:09 +00:00
Alexander Motin	6b513e2853	MFV r289561: 6328 Fix cstyle errors in zfs codebase Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> illumos/illumos-gate@9a686fbc18	2015-10-19 08:25:37 +00:00
Alexander Motin	43f774f296	MFV r289310: 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@45818ee124 This is only a partial merge of respective ZFS infrastructure changes. At this moment FreeBSD kernel has no those crypto algorithms, so the parts of the code to enable them are commented out. When they are implemented, it will be trivial to plug them in.	2015-10-16 14:45:21 +00:00
Alexander Motin	ec5a8cf7c0	Restore original array_rd_sz semantics. Before r278702 prefetch was blocked for I/Os > 1MB, after -- >= 1MB. 1MB I/Os are used for bulk operations in CTL (XCOPY, VERIFY), and disabling prefetch for them reduced the performance. This is temporary local patch, that should be replaced when upstreamed. Discussed with: mahrens MFC after: 3 days	2015-10-03 11:05:58 +00:00
Xin LI	8c4f41ff34	MFV r287624: 5987 zfs prefetch code needs work Rewrite the ZFS prefetch code to detect only forward, sequential streams. The following kstats have been added: kstat.zfs.misc.arcstats.sync_wait_for_async How many sync reads have waited for async read to complete. (less is better) kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch How many demand read didn't have to wait for I/O because of predictive prefetch. (more is better) zfetch kstats have been similified to hits, misses, and max_streams, with max_streams representing times when we were not able to create new stream because we already have the maximum number of sequences for a file. The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes to prefetch per stream. illumos/illumos-gate@cf6106c8a0 Illumos ZFS issues: 5987 zfs prefetch code needs work https://www.illumos.org/issues/5987	2015-09-12 08:35:51 +00:00
Andriy Gapon	9f2d1b28df	MFV (partial) r286889: 5692 expose the number of hole blocks in a file FreeBSD porting notes: - only kernel-side changes are merged - the new ioctl is not actually implemented yet - thus, the goal is to synchronize DMU code illumos/illumos-gate@2bcf0248e9 https://www.illumos.org/issues/5692 we would like to expose the number of hole (sparse) blocks in a file. this can be useful to for example if you want to fill in the holes with some data; knowing the number of holes in advances allows you to report progress on hole filling. We could use SEEK_HOLE to do that but it would be O(n) where n is the number of holes present in the file. Author: Max Grossman <max.grossman@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net>	2015-08-24 09:48:50 +00:00
Alexander Motin	b696497df0	MFV r286704: 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com> While running 'zfs recv' we noticed that every 128th 8K block required a read. We were seeing that restore_write() was calling dmu_tx_hold_write() and the indirect block was not cached. We should prefetch upcoming indirect blocks to avoid having to go to disk and blocking the restore_write(). Allow an incremental send stream to be received as a clone, even if the stream does not mark it as a clone.	2015-08-12 22:41:06 +00:00
Alexander Motin	2d41b1006f	MFV r286224: 5695 dmu_sync'ed holes do not retain birth time illumos/illumos-gate@70163ac57e https://www.illumos.org/issues/5695 In dmu_sync_ready(), a hole block pointer will have it's logical size explicitly set as it's necessary for replay purposes. To "undo" this, dmu_sync_done() will zero out any hole that it finds. This becomes a problem when using the "hole_birth" feature, as this will also wipe out any birth time that might have happened to be set on the hole. ... As a fix, the logic to zero out a hole is only applied to old style holes with a birth time of zero. Holes created with the "hole_birth" feature enabled will have a non-zero birth time, and will be skipped (thus preserving the ltime, type, and level information as well). In addition, zdb was updated to also print the ltime, type, and level information for these new style holes. Previously, only the logical birth time would be printed. Author: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com>	2015-08-12 17:21:41 +00:00
Alexander Motin	de8b7ceff1	MFV 286588: 5820 verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig) Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> illumod/illumos-gate@34e8acef00	2015-08-10 19:38:07 +00:00
Alexander Motin	4ff9527edc	MFV 286546: 5661 ZFS: "compression = on" should use lz4 if feature is enabled Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Xin LI <delphij@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com> illumos/illumos-gate@db1741f555	2015-08-09 20:02:16 +00:00
Steven Hartland	bc96366c86	Mechanically convert cddl sun #ifdef's to illumos Since the upstream for cddl code is now illumos not sun, mechanically convert all sun #ifdef's to illumos #ifdef's which have been used in all newer code for some time. Also do a manual pass to correct the use if #ifdef comments as per style(9) as well as few uses of #if defined(__FreeBSD__) vs #ifndef illumos. MFC after: 1 month Sponsored by: Multiplay	2015-01-17 14:44:59 +00:00
Xin LI	eba15cf463	MFV r272804: Refactor the code and stop restore_object from creating two transactions. Illumos issue: 3693 restore_object uses at least two transactions to restore an object MFC after: 2 weeks	2014-10-09 07:52:51 +00:00
Xin LI	ce44f14b41	MFV r272803: Illumos issue: 5175 implement dmu_read_uio_dbuf() to improve cached read performance MFC after: 2 weeks	2014-10-09 07:18:40 +00:00
Xin LI	1b5bcb8425	MFV r272591: Use loaned ARC buffer for zfs receive to avoid copy. Illumos issue: 5162 zfs recv should use loaned arc buffer to avoid copy MFC after: 2 weeks	2014-10-06 07:29:17 +00:00
Xin LI	7c1db36b28	MFV r270196: Illumos issue: 5047 don't use atomic_*_nv if you discard the return value MFC after: 2 weeks	2014-08-20 22:39:26 +00:00
Xin LI	fdc0ee2cf5	MFV r268452: Explicitly mark file removal transactions as "presumed to result in a net free of space" so they will not fail with ENOSPC. Illumos issue: 4950 files sometimes can't be removed from a full filesystem MFC after: 2 weeks	2014-07-09 18:32:40 +00:00
Xin LI	9cc8a15b2e	MFV r268121: 4924 LZ4 Compression for metadata illumos/illumos-gate@b8289d24d8 MFC after: 2 weeks	2014-07-01 22:31:09 +00:00
Xin LI	aa882b9048	MFV r268119: 4914 zfs on-disk bookmark structure should be named *_phys_t illumos/illumos-gate@7802d7bf98 MFC after: 2 weeks	2014-07-01 21:51:30 +00:00
Xin LI	29441ba3fa	MFV r267565: 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks MFC after: 2 weeks	2014-07-01 06:43:15 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Xin LI	2bdf7f79bc	MFV r266766: Add a new zfs property, "redundant_metadata" which can have values "all" or "most". The default will be "all", which is the current behavior. When set to all, ZFS stores an extra copy of all metadata. If a single on-disk block is corrupt, at worst a single block of user data (which is recordsize bytes long) can be lost. Setting to "most" will cause us to only store 1 copy of level-1 indirect blocks of user data files. This can improve performance of random writes, because less metadata has to be written. In practice, at worst about 100 blocks (of recordsize bytes each) of user data can be lost if a single on-disk block is corrupt. The exact behavior of which metadata blocks are stored redundantly may change in future releases. Illumos issue: 3835 zfs need not store 2 copies of all metadata MFC after: 2 weeks	2014-05-27 19:46:11 +00:00
Xin LI	f4c8ba8370	MFV r259170: 4370 avoid transmitting holes during zfs send 4371 DMU code clean up illumos/illumos-gate@43466aae47 NOTE: Make sure the boot code is updated if a zpool upgrade is done on boot zpool. MFC after: 2 weeks	2014-01-01 00:45:28 +00:00
Andriy Gapon	6c5b7fffce	zfs: add dmu_write_pages variant for freebsd The freebsd variant of dmu_write_pages is hidden under _KERNEL to avoid needlessly pulling in vm_page_t declaration. Besides, this function seems to be useless for ZFS userland counterpart. MFC after: 15 days	2013-11-29 15:34:43 +00:00
Andriy Gapon	2a4704ab01	MFV r255255: 4045 zfs write throttle & i/o scheduler performance work illumos/illumos-gate@69962b5647 Please note the following changes: - zio_ioctl has lost its priority parameter and now TRIM is executed with 'now' priority - some knobs are gone and some new knobs are added; not all of them are exposed as tunables / sysctls yet MFC after: 10 days Sponsored by: HybridCluster [merge]	2013-11-26 09:57:14 +00:00
Andriy Gapon	5d8fac897e	MFV r255257: 4082 zfs receive gets EFBIG from dmu_tx_hold_free() illumos change 14172:be36a38bac3d: illumos ZFS issues: 4082 zfs receive gets EFBIG from dmu_tx_hold_free() Please note that this change is slightly different from r255257, because it is merged out of order with other (larger) upstream changes. PR: kern/182570 Reported by: Keith White <kwhite@site.uottawa.ca> Tested by: Keith White <kwhite@site.uottawa.ca> Approved by: re (glebius) MFC after: 1 week X-MFC after: r254753	2013-10-10 09:53:46 +00:00
Xin LI	253aa02fc3	MFV r254750: Add support of Illumos dumps on zvol over RAID-Z. Note that this only adds the features. FreeBSD would still need more work to support dumping on zvols. Illumos ZFS issues: 2932 support crash dumps to raidz, etc. pools MFC after: 1 month Approved by: re (ZFS blanket)	2013-09-21 00:17:26 +00:00
Xin LI	00e37ef129	MFV r254747: Fix a panic from dbuf_free_range() from dmu_free_object() while doing zfs receive. This is a regression from FreeBSD r253821. Illumos ZFS issues: 4047 panic from dbuf_free_range() from dmu_free_object() while doing zfs receive	2013-08-24 00:19:26 +00:00
Justin T. Gibbs	5119608387	Add kstat entries for ZFS compression statistics. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: Add module lifetime functions to allocate and teardown state data. Report: - Compression attempts. - Buffers found to be empty. - Compression calls that are skipped because the data length is already less than or equal to the minimum block length. - Compression attempts that fail to yield a 12.5% compression ratio. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c: Add calls to the zio_compress.c module's init and fini functions. Sponosred by: Spectra Logic Corporation MFC after: 2 weeks	2013-08-21 19:40:43 +00:00
Xin LI	4acaabea05	MFV r251619: ZFS needs better comments. Illumos ZFS issues: 3741 zfs needs better comments MFC after: 2 weeks	2013-06-11 19:02:36 +00:00
Xin LI	ca8a27d4b1	MFV r251474: * Illumos zfs issue #3137 L2ARC compression Whether or not to compress buffers entering the L2ARC is controlled by "compression" setting on the dataset, when compression is not "off", L2ARC compression is enabled. The compress method is always LZ4 for L2ARC when enabled because it works best for the scenario. MFC after: 2 weeks	2013-06-06 23:21:41 +00:00
Martin Matuska	f1b5c26470	MFV r248217: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet. Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs MFC after: 3 weeks	2013-04-06 10:39:38 +00:00
Martin Matuska	07091d8f14	MFV r247580: Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd.	2013-03-19 12:51:18 +00:00
Martin Matuska	400c4069a5	MFV r247845: Import ZFS bpobj bugfix from vendor. Illumos ZFS issues: 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604 MFC after: 1 week	2013-03-05 18:54:41 +00:00
Martin Matuska	781c0f87d3	MFV r246653: Import vendor change to avoid "unitialized variable" warnings. Illumos ZFS issues: 3522 zfs module should not allow uninitialized variables References: https://www.illumos.org/issues/3522	2013-02-23 11:21:05 +00:00
Martin Matuska	53e5858c68	Add loader(8) tunable to enable/disable nopwrite functionality: vfs.zfs.nopwrite_enabled MFC after: 2 weeks	2012-11-25 16:54:43 +00:00
Martin Matuska	dd801aa546	MFV r243013 and r243267: Import the zio nop-write improvement from Illumos. To reduce I/O, nop-write omits overwriting data if the checksum (cryptographically secure) of new data matches the checksum of existing data. It also saves space if snapshots are in use. It currently works only on datasets with enabled compression, disabled deduplication and sha256 checksums. IllumOS 13887:196932ec9e6a and 13888:7204b3392a58 3236 zio nop-write References: https://www.illumos.org/issues/3236 MFC after: 2 weeks	2012-11-25 16:32:07 +00:00

1 2

65 Commits