freebsd-dev

Author	SHA1	Message	Date
Allan Jude	57e9b361ba	Fix typo in r348068	2019-05-21 21:39:03 +00:00
Allan Jude	4e7d8292ec	ZFS: Make deadman tunables no longer read-only This allows the user to enable, disable, and adjust the I/O deadman at runtime. This can be especially useful when a pool is backed by remote storage (such as iscsi, ggated, etc). PR: 221906 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: ElectroBSD MFC after: 1 week Sponsored by: Klara Systems Event: Waterloo Hackathon 2019	2019-05-21 21:26:18 +00:00
Alexander Motin	7763842174	Add mutex_destroy() missed in r334844. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-04-26 19:02:21 +00:00
Alexander Motin	32d8034f77	Fix minor mismerges. No functional change. MFC after: 1 week	2019-04-26 18:25:59 +00:00
Alexander Motin	48ecceba1e	Change the way FreeBSD GID inheritance is hacked. I believe previous ifdef caused NULL dereference in later zfs_log_create() on attempt to create file inside directory belonging to ephemeral group created on illumos, trying to write to log information about GID domain of the newly created file, inheriting the ephemeral GID. This patch reuses original illumos SGID code with exception that due to lack of ID mapping code on FreeBSD ephemeral GID will turn into GID_NOBODY by another ifdef inside zfs_fuid_map_id(). MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-04-19 15:44:45 +00:00
Conrad Meyer	a8a16c7128	Replace read_random(9) with more appropriate arc4rand(9) KPIs Reviewed by: ae, delphij Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19760	2019-04-04 01:02:50 +00:00
Pawel Jakub Dawidek	af4f9e5f00	If the autoexpand pool property is turned on and vdev is healthy try to expand the pool automatically when we detect underlying GEOM provider size change. Obtained from: Fudo Security Tested in: AWS	2019-03-30 07:29:20 +00:00
Andriy Gapon	76a63ee510	Revert r345410, VOP_FSYNC change in ZFS vdev_file I overlooked the fact that that VOP_FSYNC() call is not a FreeBSD VFS call, but a macro that provides an illumos-compatible wrapper for the FreeBSD operation. PR: 236475 Reported by: lwhsu Pointyhat to: avg	2019-03-22 17:44:47 +00:00
Andriy Gapon	c1581f4f4d	ZFS vdev_file: use correct value for waitfor parameter of VOP_FSYNC PR: 236475 Reported by: asomers MFC after: 2 weeks	2019-03-22 09:11:45 +00:00
Alexander Motin	6bb46107d8	MFV r336930: 9284 arc_reclaim_thread has 2 jobs `arc_reclaim_thread()` calls `arc_adjust()` after calling `arc_kmem_reap_now()`; `arc_adjust()` signals `arc_get_data_buf()` to indicate that we may no longer be `arc_is_overflowing()`. The problem is, `arc_kmem_reap_now()` can take several seconds to complete, has no impact on `arc_is_overflowing()`, but due to how the code is structured, can impact how long the ARC will remain in the `arc_is_overflowing()` state. The fix is to use separate threads to: 1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which improves `arc_is_overflowing()` 2. keep enough free memory in the system, by calling `arc_kmem_reap_now()` plus `arc_shrink()`, which improves `arc_available_memory()`. illumos/illumos-gate@de753e34f9 Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: Tim Kordas <tim.kordas@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Brad Lewis <brad.lewis@delphix.com>	2019-03-15 18:59:04 +00:00
Simon J. Gerraty	f5fdf82d82	Add _PC_ACL_* to vop_stdpathconf This avoids EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512	2019-03-11 20:40:56 +00:00
Alexander Motin	aa8676f25d	Revert minor part of r344934. I tried to save some CPU time on hopeless aggregation attempts, but it seems the condition I added is overly strict, blocking also aggregation of optional I/Os in cases which previously were possible. Revert just to be safe. MFC after: 1 month	2019-03-11 17:39:09 +00:00
Alexander Motin	5ca679e3c4	MFV/ZoL: Disable LBA weighting on files and SSDs The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Author: Richard Yao <ryao@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3712 zfsonlinux/zfs@fb40095f5f To reduce code divergence this merge replaces equivalent but different FreeBSD code detecting non-rotating medium vdevs. MFC after: 1 month	2019-03-08 21:13:45 +00:00
Alexander Motin	673544c3dd	Add separate aggregation limit for non-rotating media. Before sequential scrub patches ZFS never aggregated I/Os above 128KB. Sequential scrub bumped that to 1MB, which motivation I understand for spinning disks, since it should reduce number of head seeks. But for SSDs it makes much less sense to me, especially on FreeBSD, where due to MAXPHYS limitation device will likely still see bunch of 128KB I/Os instead of one large. Having more strict aggregation limit allows to avoid allocation of large memory buffer and memcpy to/from it, that is a serious problem when bandwidth reaches few GB/s. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-03-08 19:38:52 +00:00
Alexander Motin	3a3ba532e7	MFV/ZoL: Fix zfs_vdev_aggregation_limit bounds checking Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Author: Brian Behlendorf <behlendorf1@llnl.gov> zfsonlinux/zfs@a58df6f536 MFV/ZoL: Cap maximum aggregate IO size Commit `8542ef8` allowed optional IOs to be aggregated beyond the specified aggregation limit. Since the aggregation limit was also used to enforce the maximum block size, setting `zfs_vdev_aggregation_limit=16777216` could result in an attempt to allocate an ABD larger than 16M. Author: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6259 Closes #6270 zfsonlinux/zfs@2d678f779a	2019-03-08 18:49:27 +00:00
Alexander Motin	ede8782611	Improve entropy for ZFS taskqueue selection. I just found that at least on Skylake CPUs cpu_ticks() never returns odd values, only even, and possibly has even bigger step (176/2?), that makes its lower bits very bad entropy source, leaving half of taskqueues unused. Switch to sbinuptime(), closer to upstreams, mitigates the problem by the rate conversion working as kind of hash function. In case that is somehow not enough (timer rate is too low or too divisible) mix in curcpu. MFC after: 1 week	2019-03-07 22:56:39 +00:00
Alexander Motin	551b7d3a29	Add respective tunables to few ZFS sysctls. MFC after: 1 week	2019-03-07 01:24:08 +00:00
Pawel Jakub Dawidek	b8da50d526	Improve readability of the code by making it explicit where the 'c' variable starts. It is also more consistent with similar code in this file.	2019-03-01 05:54:13 +00:00
Mark Johnston	8e7127fd91	Fix fasttrap_sig{trap,segv}(). - Don't leak the ksiginfo structure. - Hold the proc lock when sending a signal in fasttrap_sigsegv(). MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-02-26 18:20:41 +00:00
Mark Johnston	5563c675b3	Revert r344587. The fasttrap_isa.h header is needed by libdtrace, not just the kernel.	2019-02-26 17:33:56 +00:00
Mark Johnston	df59ed0787	Remove illumos-specific code from the x86 fasttrap_isa.c. The file has not been touched upstream in over a decade, and the nature of the code means that a lot of FreeBSD-specific bits are required. Remove the dead code to improve readability. No functional change intended. Discussed with: cem MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-02-26 16:34:43 +00:00
Mark Johnston	6829dae12b	Remove stub fasttrap implementations. No platforms except i386, amd64 and powerpc implement fasttrap; the fasttrap files for other arches do not contain any code and bloat the output from cscope, so just remove them. MFC after: 1 week	2019-02-26 16:31:47 +00:00
Mark Johnston	f23e684bbf	Commit a missing piece of r344452. MFC with: r344452	2019-02-21 22:56:54 +00:00
Mark Johnston	4f1b715c84	Fix a tracepoint lookup race in fasttrap_pid_probe(). fasttrap hooks the userspace breakpoint handler; the hook looks up the breakpoint address in a hash table of tracepoints. It is possible for the tracepoint to be removed by a different thread in between the breakpoint trap and the hash table lookup, in which case SIGTRAP gets delivered to the target process. Fix the problem by adding a per-process generation counter that gets incremented when a tracepoint belonging to that process is removed. Then, when a lookup fails, the trapping instruction is restarted if the thread's counter doesn't match that of the process. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19273	2019-02-21 22:54:17 +00:00
Pawel Jakub Dawidek	2691ae3230	Simplify the code. No functional changes. Reviewed by: rpokala	2019-02-20 00:25:45 +00:00
Pawel Jakub Dawidek	91853b8546	Simplify the code.	2019-02-19 23:53:33 +00:00
Pawel Jakub Dawidek	01e21ead90	Correct typo in the comment.	2019-02-19 23:44:00 +00:00
Pawel Jakub Dawidek	99ab63b69d	Change assertion to log the incorrect io_type we've got.	2019-02-19 23:43:15 +00:00
Pawel Jakub Dawidek	36d43b5dfe	Garbage-collect no longer used variable.	2019-02-19 23:41:23 +00:00
Pawel Jakub Dawidek	11c8759337	The way ZFS searches for its vdevs is the following: first it looks for a vdev that has the same name as the one stored in metadata and that has all VDEV labels in place. If it cannot find a GEOM provider with the given name and all VDEV labels it will scan all GEOM providers for the best match (the most VDEV labels available), but here the name is ignored. In case the ZFS pool is created, eg. using GPT partition label: # zpool create tank /dev/gpt/tank everything works, and on every import ZFS will pick /dev/gpt/tank and not /dev/da0p4. The problem occurs when da0p4 is extended and ZFS is unable to find all VDEV labels in /dev/gpt/tank anymore (the VDEV labels stored at the end of the partition are now somewhere else). In this case it will scan all GEOM providers and will pick the first one with the best match, ie. da0p4. Fix this problem by checking the VDEV/provider name even if we get the same match. If the name is the same as the one we have in pool's metadata, prefer this GEOM provider. Reported by: oshogbo, Michal Mroz <m.mroz@fudosecurity.com> Tested by: Michal Mroz <m.mroz@fudosecurity.com> Obtained from: Fudo Security	2019-02-19 23:35:55 +00:00
Pawel Jakub Dawidek	d793cf7019	In the vdev_geom_open_by_path() function we assume that vdev path starts with "/dev/". Make sure this is the case.	2019-02-19 23:22:39 +00:00
Alexander Motin	ed0a3e8637	s/Maximal/Maximum/ in sysctl description. Submitted by: smh MFC after: 1 week	2019-02-04 20:09:22 +00:00
Alexander Motin	ef08154150	Add missed tunables/sysctls for some new vdev variables. While there, make few existing sysctls writeable, since there is no reason not to. MFC after: 1 week	2019-02-04 16:13:41 +00:00
Alexander Motin	54cde30f92	Remove BIO_ORDERED flag from BIO_FLUSH sent by ZFS. In all cases where ZFS sends BIO_FLUSH, it first waits for all related writes to complete, so its BIO_FLUSH does not care about strict ordering. Removal of one makes life much easier at least for NVMe driver, which hardware has no concept of request ordering, relying completely on software. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-01-30 17:39:44 +00:00
Mariusz Zaborski	db009dddfd	zfs: allow to change cache flush sysctl There is no reason for this variable to be tunable. This variable is used as a barrier in few places. Discussed with: pjd MFC after: 2 weeks Sponsored by: Fudo Security	2019-01-26 13:53:00 +00:00
Sean Eric Fagan	82e20c0a72	Change ZFS quotas to return EINVAL when not present (matches man page). UFS will return EINVAL when quotas are not enabled on a filesystem; ZFS' equivalent involves not having quotas (there is no way to enable or disable quotas as such). My initial implementation had it return ENOENT, but quotactl(2) indicates EINVAL is more appropriate. MFC after: 2 weeks Approved by: mav Reviewed by: markj Reported by: Emrion <kmachine@free.fr> Sponsored by: iXsystems Inc PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234413	2019-01-11 02:53:46 +00:00
Andriy Gapon	4c325393f3	MFV r342532: 5882 Temporary pool names Note that this commit brings only formatting changes that were done during the final review of the illumos change, because FreeBSD got the main changes before illumos. illumos/illumos-gate@04e5635652 `04e5635652` https://www.illumos.org/issues/5882 This is an import of the temporary pool names functionality from ZoL: `e2282ef57e` `26b42f3f9d` `2f3ec90061` `00d2a8c92f` `83e9986f6e` `023bbe6f01` It is intended to assist the creation and management of virtual machines that have their rootfs on ZFS on hosts that also have their rootfs on ZFS. These situations cause SPA namespace collisions when the standard name rpool is used in both cases. The solution is either to give each guest pool a name unique to the host, which is not always desirable, or boot a VM environment containing an ISO image to install it, which is cumbersome. MFC after: 1 week Sponsored by: Panzura	2018-12-26 11:03:14 +00:00
Andriy Gapon	f050611e7f	MFV r342469: 9630 add lzc_rename and lzc_destroy to libzfs_core illumos/illumos-gate@049ba636fa `049ba636fa` https://www.illumos.org/issues/9630 Rename and destroy are very useful operations that deserve to be in libzfs_core. And they are not hard to implement too. MFC after: 2 weeks Relnotes: maybe	2018-12-26 10:37:41 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Toomas Soome	7aaf685ba7	zfs: we can boot from dataset with large_dnode enabled loader has been supporting large_dnode for some time, no need to block the feature for boot dataset. Reviewed by: avg MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D18391	2018-12-03 19:35:21 +00:00
Mark Johnston	6d2e2df764	Ensure that directory entry padding bytes are zeroed. Directory entries must be padded to maintain alignment; in many filesystems the padding was not initialized, resulting in stack memory being copied out to userspace. With the ino64 work there are also some explicit pad fields in struct dirent. Add a subroutine to clear these bytes and use it in the in-tree filesystems. The NFS client is omitted for now as it was fixed separately in r340787. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-11-23 22:24:59 +00:00
Alexander Motin	eecd0a1856	Revert r340096: 9952 Block size change during zfs receive drops spill block It was reported, and I easily reproduced it, that this change triggers panic when receiving replication stream with enabled embedded blocks, when short file compressing into one embedded block changes its block size. I am not sure that the problem is in this particular patch, not just triggered by it, but since investigation and fix will take some time, I've decided to revert this for now. PR: 198457, 233277	2018-11-21 18:18:57 +00:00
Justin Hibbits	cfebc0faa7	DTrace/powerpc: Fix FBT return probes The FBT function boundary prober was setting one return probe marker value, but the dtrace handler was expecting another. This causes a hang when tracing return probes.	2018-11-21 16:47:11 +00:00
Konstantin Belousov	1c4ca77890	Add d_off support for multiple filesystems. The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917	2018-11-14 14:18:35 +00:00
Alexander Motin	1fcdb58634	Do not ignore arc_adjust() return value. This covers scenario when ARC may not shrink as fast as it could: 1. arc_size < arc_c and arc_adjust() does not evict anything, returning zero to arc_reclaim_thread(); 2. arc_available_memory() reports memory pressure, which can not be satisfied by arc_kmem_reap_now(); 3. arc_shrink() reduces arc_c and calls arc_adjust(), return of which is ignored; 4. even if the last arc_adjust() could not satisfy arc_size < arc_c, arc_reclaim_thread() will still go to sleep, since the first one returned zero. Reviewed by: allanjude, markj, sef MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D17927	2018-11-10 01:58:37 +00:00
Alexander Motin	b4d66a1739	9952 Block size change during zfs receive drops spill block Replication code in receive_object() falsely assumes that if received object block size is different from local, then it must be a new object and calls dmu_object_reclaim() to wipe it out. In most cases it is not a problem, since all dnode, bonus buffer and data block(s) are immediately rewritten any way, but the problem is that spill block (if used) is not. This means loss of ACLs, extended attributes, etc. This issue can be triggered in very simple way: 1. create 4KB file with 10+ ACL entries; 2. take snapshot and send it to different dataset; 3. append another 4KB to the file; 4. take another snapshot and send incrementally; 5. witness ACL loss on receive side. PR: 198457 Discussed with: mahrens MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2018-11-03 03:10:06 +00:00
Brooks Davis	1493c2ee62	Make vop_symlink take a const target path. This will enable callers to take const paths as part of syscall declaration improvements. Where doing so is easy and non-disruptive carry the const through implementations. In UFS the value is passed to an interface that must take non-const values. In ZFS, const poisoning would touch code shared with upstream and it's not worth adding diffs. Bump __FreeBSD_version for external API consumers. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17805	2018-11-02 14:42:36 +00:00
Alexander Motin	2cb74ed856	Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE. Device removal code uses zio_vdev_child_io() with ZIO_TYPE_NULL parent, that never happened before. It confused FreeBSD-specific TRIM code, which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs. As result of that stage being skipped device removal ZIOs leaked references and memory that was supposed to be freed by VDEV_IO_DONE, making it stuck. It is a quick patch rather than a nice fix, but hopefully we'll be able to drop it altogether when alternative TRIM implementation finally lands. PR: 228750, 229007 Discussed with: allanjude, avg, smh Approved by: re (delphij) MFC after: 5 days Sponsored by: iXsystems, Inc.	2018-10-15 21:59:24 +00:00
Mateusz Guzik	bca84f54ce	zfs: fix a panic after failed mount r338927("zfs: depessimize zfs_root with rmlocks") failed to error check the mount before caching root vnode. Results in crashes in rrw_enter_read_impl tracing back to zfs_mount. Reported by: Mike Tancsa Tested by: allanjude Approved by: re (kib)	2018-10-14 16:14:01 +00:00
Alexander Motin	178777f516	Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. This is a part of ZoL port of device removal patch: commit `a1d477c24c` Author: Matthew Ahrens <mahrens@delphix.com> Ported-by: Tim Chase <tim@chase2k.com> Approved by: re (kib) MFC after: 1 week	2018-10-12 16:55:28 +00:00
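
The generation-counter fix described in commit 4f1b715c84 (fasttrap_pid_probe() lookup race) can be sketched roughly as below. This is a minimal user-space illustration, not the kernel code: the structure names, field names, and function names are all hypothetical, and locking is omitted.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the per-process and per-thread state. */
struct sketch_proc {
	uint64_t tp_gen;	/* bumped whenever a tracepoint is removed */
};

struct sketch_thread {
	uint64_t tp_gen;	/* last process generation this thread observed */
};

enum sketch_action {
	SKETCH_DELIVER_SIGTRAP,	/* no tracepoint was ever here: real trap */
	SKETCH_RESTART_INSN	/* tracepoint was removed under us: retry */
};

/* Called when a tracepoint belonging to the process is removed. */
static void
sketch_tracepoint_removed(struct sketch_proc *p)
{
	p->tp_gen++;
}

/*
 * Called when the breakpoint handler fails to find the trapping address
 * in the tracepoint hash table.  A generation mismatch means a removal
 * happened since this thread last looked, so the original instruction is
 * back in place and the trapping instruction can simply be restarted.
 */
static enum sketch_action
sketch_lookup_failed(struct sketch_thread *td, const struct sketch_proc *p)
{
	if (td->tp_gen != p->tp_gen) {
		td->tp_gen = p->tp_gen;
		return (SKETCH_RESTART_INSN);
	}
	return (SKETCH_DELIVER_SIGTRAP);
}
```

Resynchronizing the thread's counter on a mismatch means a second failed lookup at the same address, with no further removals, falls through to the signal path instead of looping.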


1732 Commits
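
The provider-selection rule from commit 11c8759337 (prefer the GEOM provider whose name matches the pool metadata when the label match is otherwise equal) might look like the sketch below. It is an illustration under assumptions: the `sketch_provider` type, the `labels` count, and the function name are hypothetical and do not come from vdev_geom.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a scanned GEOM provider. */
struct sketch_provider {
	const char *name;	/* provider name, e.g. "gpt/tank" or "da0p4" */
	int labels;		/* number of valid VDEV labels found on it */
};

/*
 * Pick the provider with the most VDEV labels.  On a tie, prefer the one
 * whose name equals the name stored in the pool's metadata, instead of
 * keeping whichever provider happened to be scanned first.
 */
static const struct sketch_provider *
sketch_pick_provider(const struct sketch_provider *provs, size_t n,
    const char *wanted)
{
	const struct sketch_provider *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct sketch_provider *p = &provs[i];

		if (best == NULL || p->labels > best->labels ||
		    (p->labels == best->labels &&
		    strcmp(p->name, wanted) == 0 &&
		    strcmp(best->name, wanted) != 0))
			best = p;
	}
	return (best);	/* NULL when no providers were scanned */
}
```

With da0p4 and gpt/tank both exposing the same number of labels after the partition is extended, this rule selects gpt/tank when that is the name recorded in the pool, which is the behavior the commit message asks for.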