freebsd-dev

Author	SHA1	Message	Date
Luiz Otavio O Souza	6f207f5b47	Add support to the Marvell Xenon SDHCI controller. Tested on Espresso.bin (37x0) and Macchiato.bin (8k) with SD cards and eMMCs. Obtained from: pfSense Sponsored by: Rubicon Communications, LLC (Netgate)	2018-08-14 16:33:30 +00:00
Ruslan Bukin	5bcd113c91	Query MVPConf0.PVPE for number of CPUs. Rather than hard-coding the number of CPUs to 2, look up the PVPE field in MVPConf0, as the valid VPE numbers are from 0 to PVPE inclusive. Submitted by: "James Clarke" <jrtc4@cam.ac.uk> Reviewed by: br Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16644	2018-08-14 16:29:10 +00:00
Konstantin Belousov	ef52dc71eb	Fix typo. Noted by: alc MFC after: 3 days	2018-08-14 16:27:17 +00:00
Ruslan Bukin	b3410bc623	Avoid repeated address calculation for malta_ap_boot. Submitted by: "James Clarke" <jrtc4@cam.ac.uk> Reviewed by: br, arichardson Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16655	2018-08-14 16:26:44 +00:00
Ruslan Bukin	9aa2d5e4fa	Remove unused code. Sponsored by: DARPA, AFRL	2018-08-14 16:22:14 +00:00
Ruslan Bukin	2cfd37def0	Rewrite RISC-V disassembler: - Use macros from encoding.h generated by riscv-opcodes. - Add support for C-compressed ISA extension. Sponsored by: DARPA, AFRL	2018-08-14 16:03:03 +00:00
Andrew Turner	3f9baabdd0	Remove cpu_pfr from arm. It's unused.	2018-08-14 16:01:25 +00:00
Andrew Turner	52a532939b	Remove an old comment now the code it references has been removed.	2018-08-14 15:48:13 +00:00
Andrew Turner	27e0028cdd	Fix the spelling of armv4_idcache_inv_all in an END macro.	2018-08-14 15:42:27 +00:00
Luiz Otavio O Souza	37844eaacf	Use the correct PTE when changing the attribute of multiple pages. Submitted by: andrew (long time ago) Sponsored by: Rubicon Communications, LLC (Netgate)	2018-08-14 15:27:50 +00:00
Mark Johnston	27f4c235ee	Explain why we aren't using memcpy(). Reported by: jmg X-MFC with: r337715 Sponsored by: The FreeBSD Foundation	2018-08-14 14:50:06 +00:00
Mark Johnston	845800e190	Don't use memcpy() in the early microcode loading code. At some point memcpy() may be an ifunc, ifunc resolution cannot be done until CPU identification has been performed, and CPU identification must be done after loading any microcode updates. X-MFC with: r337715 Sponsored by: The FreeBSD Foundation	2018-08-14 14:02:53 +00:00
Luiz Otavio O Souza	217643e7da	Fix a typo on the PSCI smc call wrapper. Looks good from: andrew Sponsored by: Rubicon Communications, LLC (Netgate)	2018-08-14 13:56:49 +00:00
Mark Johnston	3571aee662	Fix the !SMP x86 build. Reported by: Michael Butler <imb@protected-networks.net> X-MFC with: r337715 Sponsored by: The FreeBSD Foundation	2018-08-14 13:56:42 +00:00
Andrew Turner	398810619c	Support reading from the arm64 ID registers from userspace. Trap reads to the arm64 ID registers and write a safe value into them. This will allow us to put more useful values in these later and have userland check them to find what features the hardware supports. These are currently safe defaults, but will later be populated with better values from the hardware. Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16533	2018-08-14 11:00:54 +00:00
Michael Tuexen	98f04e431e	Use a macro to set the assoc state. I missed this in r337706.	2018-08-14 08:33:47 +00:00
Michael Tuexen	0f1346f7f4	Remove a set but not used warning showing up in usrsctp.	2018-08-14 08:32:33 +00:00
Andrey V. Elsukov	62484790e0	Restore ability to send ICMP and ICMPv6 redirects. It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via sysctl. Otherwise will be used default forwarding function. PR: 221137 Submitted by: mckay@ MFC after: 2 weeks	2018-08-14 07:54:14 +00:00
Matt Macy	81eb4dcf9e	Add library and kernel support for AMD Family 17h counters NB: lacks default sample rate for most counters	2018-08-14 05:18:43 +00:00
Ian Lepore	5af4ab6524	Export the eeprom device size via readonly sysctl. Also export the write page size and address size, although they are likely to be inherently less-interesting values outside of the driver.	2018-08-13 23:53:11 +00:00
Brooks Davis	8f4dfca127	Copy out from kernel to data, not the other way around. MFC after: 3 days Sponsored by: DARPA, AFRL	2018-08-13 21:53:18 +00:00
Marius Strobl	73ed47f04f	Remove the duplicated CSUM_IP6_TCP introduced in r311849 from the TX checksum capabilities of IGB-class MACs. While at it, fix the line wrapping. PR: 230571	2018-08-13 20:29:39 +00:00
Warner Losh	acc173a6aa	Port the mps panic-safe shutdown_final handling to mpr r330951 by smh fixed the mps driver to avoid deadlocks when panicking. The same code is needed for mpr, so port it here, along with the fix which allows the CCBs scheduled to complete avoiding at least a scary message and likely other unintended consequences. Sponsored by: Netflix Differential Review: https://reviews.freebsd.org/D16663	2018-08-13 19:59:42 +00:00
Warner Losh	d4b95382ee	Call xpt_sim_poll in shutdown_final handler. When we're shutting down, we send a number of start/stop commands to the known targets. We have to wait for them to complete. During a panic, the interrupts are off, and using pause to wait for them to fire and complete won't work: we have to poll after pause returns so the completion routines of the CCBs run so we decrement work outstanding counts. Sponsored by: Netflix Differential Review: https://reviews.freebsd.org/D16663	2018-08-13 19:59:37 +00:00
Warner Losh	0cc28e3cd5	Create xpt_sim_poll and refactor a bit using it. xpt_sim_poll takes the sim to poll as an argument. It will do the proper locking protocol, call the SIM polling routine, and then call camisr_runqueue to process completions on any CCBs the SIM's poll routine completed. It will be used during late shutdown when a SIM is waiting for CCBs it sent during shutdown to finish and the scheduler isn't running because we've panic'd. This sequence was used twice in cam_xpt, so refactor those to use this new function. Sponsored by: Netflix Differential Review: https://reviews.freebsd.org/D16663	2018-08-13 19:59:32 +00:00
Navdeep Parhar	408954013a	Whitespace nit in t4_tom.h	2018-08-13 19:21:28 +00:00
Vladimir Kondratyev	48f2b00648	evdev: Remove evdev.ko linkage dependency on kbd driver Move evdev_ev_kbd_event() helper from evdev to kbd.c as otherwise evdev unconditionally requires all keyboard and console stuff to be compiled into the kernel. This dependency happens as evdev_ev_kbd_event() helper references kbdsw global variable defined in kbd.c through use of kbdd_ioctl() macro. While here make all keyboard drivers respect evdev_rcpt_mask while setting typematic rate and LEDs with evdev interface. Requested by: Milan Obuch <bsd@dino.sk> Reviewed by: hselasky, gonzo Differential Revision: https://reviews.freebsd.org/D16614	2018-08-13 19:05:53 +00:00
Vladimir Kondratyev	911aed94fa	evdev: remove soft context from evdev methods parameter list. Now softc should be retrieved from struct evdev * pointer with evdev_get_softc() helper. wmt(4) is an example of a driver that supports both KPIs. Reviewed by: hselasky, gonzo Differential Revision: https://reviews.freebsd.org/D16614	2018-08-13 19:00:42 +00:00
Oleksandr Tymoshenko	b16d03ad6e	[ig4] Fix initialization sequence for newer ig4 chips Newer chips may require assert/deassert after power down for proper startup. Check respective flag in DEVIDLE_CTRL and perform operation if necessary. PR: 221777 Submitted by: marc.priggemeyer@gmail.com Obtained from: DragonFly BSD Tested on: Thinkpad T470	2018-08-13 18:53:14 +00:00
Mark Johnston	97edfc1b45	Implement kernel support for early loading of Intel microcode updates. Updates in the format described in section 9.11 of the Intel SDM can now be applied as one of the first steps in booting the kernel. Updates that are loaded this way are automatically re-applied upon exit from ACPI sleep states, in contrast with the existing cpucontrol(8)-based method. For the time being only Intel updates are supported. Microcode update files are passed to the kernel via loader(8). The file type must be "cpu_microcode" in order for the file to be recognized as a candidate microcode update. Updates for multiple CPU types may be concatenated together into a single file, in which case the kernel will select and apply a matching update. Memory used to store the update file will be freed back to the system once the update is applied, so this approach will not consume more memory than required. Reviewed by: kib MFC after: 6 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16370	2018-08-13 17:13:09 +00:00
Konstantin Belousov	c1344d2bbe	Prevent some parallel swap-ins, rate-limit swapper swap-ins. If faultin() was called outside swapper (from PHOLD()), do not allow swapper to initiate additional swap-ins. Swapper-initiated swap-ins are serialized because they are synchronous and executed in the context of the thread0. With the added limitation, we only allow parallel swap-ins from PHOLD(), which is up to PHOLD() users to manage, usually they do not need to. Rate-limit swapper's swap-ins to one in the MAXSLP / 2 seconds interval, counting faultin() swapins. Suggested by: alc Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16610	2018-08-13 16:48:46 +00:00
Jung-uk Kim	51f42bad71	Merge ACPICA 20180810.	2018-08-13 16:26:26 +00:00
Ruslan Bukin	c1d0e057d8	Add RISC-V instructions encoding. This is the output of $ cat opcodes opcodes-rvc-pseudo opcodes-rvc opcodes-custom | ./parse-opcodes -c It is confirmed by author that the output of parse-opcodes is in the public domain. This will be required for DDB disassembler. Discussed with: Andrew Waterman <waterman@eecs.berkeley.edu> Obtained from: https://github.com/riscv/riscv-opcodes Sponsored by: DARPA, AFRL	2018-08-13 16:07:18 +00:00
Andrew Gallatin	5ccac9f972	lagg: allow lacp to manage the link state Lacp needs to manage the link state itself. Unlike other lagg protocols, the ability of lacp to pass traffic depends not only on the lagg members having link, but also on the lacp protocol converging to a distributing state with the link partner. If we prematurely mark the link as up, then we will send a gratuitous arp (via arp_handle_ifllchange()) before the lacp interface is capable of passing traffic. When this happens, the gratuitous arp is lost, and our link partner may cache a stale mac address (eg, when the base mac address for the lagg bundle changes, due to a BIOS change re-ordering NIC unit numbers) Reviewed by: jtl, hselasky Sponsored by: Netflix	2018-08-13 14:13:25 +00:00
Michael Tuexen	839d21d62e	Use the stcb instead of the asoc in state macros. This is not a functional change. Just a preparation for upcoming dtrace state change provider support.	2018-08-13 13:58:45 +00:00
Michael Tuexen	61a2188021	Consistently use the macros to modify the assoc state. No functional change.	2018-08-13 11:56:21 +00:00
Michal Meloun	23242e7a9c	Add USB ID for rebranded RTL8153 found on NVIDIA Jetson TX1 board. MFC after: 3 days	2018-08-13 07:28:25 +00:00
Emmanuel Vadot	2421576ca3	Import DTS files from Linux 4.18	2018-08-13 06:40:20 +00:00
Matt Macy	20a3cbe1f8	fix static ZFS linking Static linking of ZFS is a newish option and LINT doesn't include it	2018-08-12 21:04:53 +00:00
Justin Hibbits	54318d2a6a	ipmi/opal: Enable polled mode and proper callback Fix a NULL dereference that would occur any time an ioctl() was done, due to a missing ipmi_enqueue_request callback. Just use the default for now, until we decide to properly enable IPMI interrupts. Reported by: kbowling	2018-08-12 20:33:55 +00:00
Michael Tuexen	812649d86f	Add explicit cast to silence a warning for the userland stack. Thanks to Felix Weinrank for providing the patch.	2018-08-12 14:05:15 +00:00
Navdeep Parhar	4a89444d7e	Remove unused stuff from iw_cxgbe.h	2018-08-12 03:36:09 +00:00
Matt Macy	fb8f55f586	MFV/ZoL: Add dbuf hash and dbuf cache kstats TODO: KSTAT_TYPE_NAMED support commit `5e021f56d3` Author: Giuseppe Di Natale <dinatale2@users.noreply.github.com> Date: Mon Jan 29 10:24:52 2018 -0800 Add dbuf hash and dbuf cache kstats Introduce kstats about the dbuf hash and dbuf cache to make it easier to inspect state. This should help with debugging and understanding of these portions of the codebase. Correct format of dbuf kstat file. Introduce a dbc column to dbufs kstat to indicate if a dbuf is in the dbuf cache. Introduce field filtering in the dbufstat python script. Introduce a no header option to the dbufstat python script. Introduce a test case to test basic mru->mfu list movement in the ARC. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6906	2018-08-12 03:15:30 +00:00
Matt Macy	13ae5c6ba8	MFV/ZoL: Fix stack dbuf_hold_impl() commit `fc5bb51f08` Author: Brian Behlendorf <behlendorf1@llnl.gov> Date: Thu Aug 26 10:52:00 2010 -0700 Fix stack dbuf_hold_impl() This commit preserves the recursive function dbuf_hold_impl() but moves the local variables and function arguments to the heap to minimize the stack frame size. Enough space is initially allocated on the stack for 20 levels of recursion. This technique was based on commit 34229a2f2ac07363f64ddd63e014964fff2f0671 which reduced stack usage of traverse_visitbp(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2018-08-12 02:24:18 +00:00
Matt Macy	6e3d1345d9	fix build DN_MAX_BONUSLEN -> DN_OLD_MAX_BONUSLEN	2018-08-12 02:12:44 +00:00
Matt Macy	0f5add2566	Restore legacy dnode_phys layout on tier 2 arches Evidently gcc4 doesn't support anonymous union members	2018-08-12 02:09:06 +00:00
Matt Macy	104ed324dd	MFV/ZoL: Fix stack noinline commit `60948de1ef` Author: Brian Behlendorf <behlendorf1@llnl.gov> Date: Thu Aug 26 10:58:36 2010 -0700 Fix stack noinline Certain functions must never be automatically inlined by gcc because they are stack heavy or called recursively. This patch flags all such functions I've found as 'noinline' to prevent gcc from making the optimization. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2018-08-12 01:29:30 +00:00
Matt Macy	71d48dbda3	MFV/ZoL: Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z commit `81edd3e834` Author: Peng <peng.hse@xtaotech.com> Date: Wed Jun 8 15:22:07 2016 +0800 Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z The following scenario can result in garbage in the dn_spill field. The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR is clear to ensure the dn_spill field is cleared. Current txg = A. * A new spill buffer is created. Its dbuf is initialized with db_blkptr = NULL and it's dirtied. Current txg = B. * The spill buffer is modified. It's marked as dirty in this txg. * Additional changes make the spill buffer unnecessary because the xattr fits into the bonus buffer, so it's removed. The dbuf is undirtied in this txg, but it's still referenced and cannot be destroyed. Current txg = C. * Starts syncing of txg A * dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr is NULL, dbuf_check_blkptr() is called. * The dbuf starts being written and it reaches the ready state (not done yet). * A new change makes the spill buffer necessary again. sa_build_layouts() ends up calling dbuf_find() to locate the dbuf. It finds the old dbuf because it has not been destroyed yet (it will be destroyed when the previous write is done and there are no more references). The old dbuf has db_blkptr != NULL. * txg A write is complete and the dbuf released. However it's still referenced, so it's not destroyed. Current txg = D. * Starts syncing of txg B * dbuf_sync_leaf() is called for the bonus buffer. Its contents are directly copied into the dnode, overwriting the blkptr area because, in txg B, the bonus buffer was big enough to hold the entire xattr. * At this point, the db_blkptr of the spill buffer used in txg C gets corrupted. Signed-off-by: Peng <peng.hse@xtaotech.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3937	2018-08-12 01:17:32 +00:00
Matt Macy	6f06a36d47	MFV/ZoL: add dbuf stats NB: disabled pending the addition of KSTAT_TYPE_RAW support to the SPL commit `e0b0ca983d` Author: Brian Behlendorf <behlendorf1@llnl.gov> Date: Wed Oct 2 17:11:19 2013 -0700 Add visibility in to cached dbufs Currently there is no mechanism to inspect which dbufs are being cached by the system. There are some coarse counters in arcstats but they only give a rough idea of what's being cached. This patch aims to improve the current situation by adding a new dbufs kstat. When read this new kstat will walk all cached dbufs linked in to the dbuf_hash. For each dbuf it will dump detailed information about the buffer. It will also dump additional information about the referenced arc buffer and its related dnode. This provides a more complete view in to exactly what is being cached. With this generic infrastructure in place utilities can be written to post-process the data to understand exactly how the caching is working. For example, the data could be processed to show a list of all cached dnodes and how much space they're consuming. Or a similar list could be generated based on dnode type. Many other ways to interpret the data exist based on what kinds of questions you're trying to answer. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov>	2018-08-12 01:10:18 +00:00
Matt Macy	cc0fbbb92e	MFV/ZoL: Implement large_dnode pool feature commit `50c957f702` Author: Ned Bass <bass6@llnl.gov> Date: Wed Mar 16 18:25:34 2016 -0700 Implement large_dnode pool feature Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill blocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. 
Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. 
The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interface names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by a large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. 
This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. 
While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. 
This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542	2018-08-12 00:45:53 +00:00
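
The slot arithmetic described in the large_dnode commit message above can be sketched as follows. This is a minimal illustration, not code from the commit: the Python function and constant names are hypothetical, and the exact bit layout of lr_foid is an assumption based only on the statement that the slot count occupies the uppermost 8 bits and the object id is capped at 48 bits.

```python
# Illustrative sketch only; names are hypothetical, not the ZFS API.
SLOT_SIZE = 512            # one dnode_phys_t slot, per the commit message
DNODE_BLOCK_SIZE = 16384   # current dnode block size, per the commit message


def extra_slots_for_size(dnode_size: int) -> int:
    """On-disk convention (dn_extra_slots): 0 for a legacy 512-byte dnode."""
    if dnode_size % SLOT_SIZE != 0:
        raise ValueError("dnode size must be a multiple of 512")
    if not SLOT_SIZE <= dnode_size <= DNODE_BLOCK_SIZE:
        raise ValueError("dnode size must be between 512 and 16384 bytes")
    return dnode_size // SLOT_SIZE - 1


def num_slots(extra_slots: int) -> int:
    """In-memory convention (dn_num_slots): one more than dn_extra_slots."""
    return extra_slots + 1


def pack_foid(object_id: int, slots: int) -> int:
    """Assumed layout: slot count in the top 8 bits of a 64-bit lr_foid."""
    if not 0 <= object_id < (1 << 48):
        raise ValueError("object id is capped at 48 bits")
    if not 0 <= slots < (1 << 8):
        raise ValueError("slot count must fit in 8 bits")
    return (slots << 56) | object_id


def unpack_foid(foid: int) -> tuple[int, int]:
    """Return (object_id, slots) from a value built by pack_foid()."""
    return foid & ((1 << 48) - 1), (foid >> 56) & 0xFF
```

For example, a 1k dnode (the size currently chosen by dnodesize=auto) has `extra_slots_for_size(1024) == 1` on disk and `num_slots(1) == 2` in memory, while a legacy 512-byte dnode keeps an on-disk value of 0, which is what preserves compatibility with older software.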


123607 Commits