freebsd-dev

Author	SHA1	Message	Date
Martin Matuska	e7af90ab00	Analogous to r232059, add a parameter for the ZFS file system: allow.mount.zfs: allow mounting the zfs filesystem inside a jail. This way the permissions for mounting all current VFCF_JAIL filesystems inside a jail are controlled via allow.mount.* jail parameters. Update sysctl descriptions. Update jail(8) and zfs(8) manpages. TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel developers MFC after: 10 days	2012-02-26 16:30:39 +00:00
Konstantin Belousov	526d0bd547	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
Martin Matuska	4ee8a13704	Revert r230913 and r230914. The initialization was correct, the problem needs deeper analysis.	2012-02-03 13:40:51 +00:00
Martin Matuska	8b152ded1c	Add copyright information on last commits to comply with CDDL. Discussed with: pluknet@ MFC after: 3 days	2012-02-02 16:33:58 +00:00
Martin Matuska	3ce26884ba	Fix out of bounds write causing random panics, uncovered by the change in r230256 Reviewed by: pluknet@ MFC after: 3 days	2012-02-02 16:18:40 +00:00
Kip Macy	ad69d4e266	always exclude data bufs regardless of debug settings	2012-01-29 00:19:19 +00:00
Kip Macy	cc0021eb34	add tunable for developers working on areas outside of ZFS to further reduce core size by excluding ARC metadata buffers from core dumps	2012-01-28 17:41:42 +00:00
Kip Macy	263811f724	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
Martin Matuska	538251bbf6	Merge illumos revisions 13572, 13573, 13574: Rev. 13572: disk sync write perf regression when slog is used post oi_148 [1] Rev. 13573: crash during reguid causes stale config [2] allow and unallow missing from zpool history since removal of pyzfs [5] Rev. 13574: leaking a vdev when removing an l2cache device [3] memory leak when adding a file-based l2arc device [4] leak in ZFS from metaslab_group_create and zfs_ereport_checksum [6] References: https://www.illumos.org/issues/1909 [1] https://www.illumos.org/issues/1949 [2] https://www.illumos.org/issues/1951 [3] https://www.illumos.org/issues/1952 [4] https://www.illumos.org/issues/1953 [5] https://www.illumos.org/issues/1954 [6] Obtained from: illumos (issues #1909, #1949, #1951, #1952, #1953, #1954) MFC after: 2 weeks	2012-01-24 23:09:54 +00:00
Pawel Jakub Dawidek	241b3b8122	Use provided name when allocating ksid domain. It isn't really used on FreeBSD, but should fix a panic when pool is imported from another OS that is using this. MFC after: 1 week	2012-01-22 10:58:17 +00:00
Pawel Jakub Dawidek	1698a6aec9	Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s MFC after: 1 week	2012-01-21 21:12:53 +00:00
Pawel Jakub Dawidek	b636ebaa6a	By default turn off prefetch when listing snapshots. In my tests it makes listing snapshots 19% faster with cold cache and 47% faster with warm cache. MFC after: 1 week	2012-01-20 22:04:59 +00:00
Sergey Kandaurov	37c2842272	Fix the "lock &zrl->zr_mtx already initialized" assertion by initializing the allocated memory before calling mtx_init(9) on mtx pointing to it. Otherwise, random contents of uninitialized memory might occasionally trigger the assertion. Reported by: Pavel Polyakov <bsd kobyla org> Reviewed by: pjd MFC after: 1 week	2012-01-17 06:23:25 +00:00
Pawel Jakub Dawidek	62859c9061	- Allow to change vfs.zfs.arc_meta_limit at runtime. - Change vfs.zfs.arc_meta_used from CTLFLAG_RDTUN to CTLFLAG_RD, as it is not a tunable. MFC after: 3 days	2012-01-05 22:16:41 +00:00
Dimitry Andric	a5988eb997	In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, check the number of links against LINK_MAX (which is INT16_MAX), not against UINT32_MAX. Otherwise, the constant would implicitly be converted to -1. Reviewed by: pjd MFC after: 1 week	2012-01-03 20:53:07 +00:00
Andriy Gapon	528bf6e40e	opensolaris compat: fix vcmn_err so that panic(9) produces a proper message ... instead of just a verbatim format string. Reviewed by: pjd MFC after: 1 week	2011-12-19 14:55:14 +00:00
Pawel Jakub Dawidek	bb265163b2	From time to time people report space map corruption resulting in panic (ss == NULL) on pool import. I had such a panic recently. With current version of ZFS it is still possible to import the pool in readonly mode and backup all the data, but in case it is impossible for some reason add tunable vfs.zfs.space_map_last_hope, which when set to '1' will tell ZFS to remove colliding range and retry. This seems to have worked for me, but I consider it highly risky to use. MFC after: 1 week	2011-12-18 12:27:45 +00:00
Pawel Jakub Dawidek	efe17e5a28	Implement replaying of ACL updates. ACL changes should go to ZIL only if the 'sync' property is set to 'always', so replaying them is not common. MFC after: 1 month	2011-12-18 12:19:03 +00:00
Attilio Rao	77befd1d23	Revert the approach for skipping lockstat_probe_func call when doing lock_success/lock_failure, introduced in r228424, by directly skipping in dtrace_probe. This mainly helps in avoiding namespace pollution and thus lockstat.h dependency by systm.h. As an added bonus, this also helps in MFC case. Reviewed by: avg MFC after: 3 months (or never) X-MFC: r228424	2011-12-12 23:29:32 +00:00
Pawel Jakub Dawidek	75c0e29ff3	Move ru_inblock increment into arc_read_nolock() so we don't account for cached reads. Discussed with: gibbs No objections from: avg Tested by: Marcus Reid <marcus@blazingdot.com> MFC after: 1 week	2011-12-10 13:02:52 +00:00
Pawel Jakub Dawidek	381962ee59	The vfs.zfs.txg.timeout sysctl can be safely modified at run time. MFC after: 1 week	2011-12-09 18:22:57 +00:00
Martin Matuska	62e6ce9a4b	Fix typo in copyright notice. MFC after: 1 month	2011-11-28 21:42:31 +00:00
Martin Matuska	2f7f0f4112	Merge new ZFS features from illumos: 1644 add ZFS "clones" property https://www.illumos.org/issues/1644 1645 add ZFS "written" and "written@..." properties https://www.illumos.org/issues/1645 1646 "zfs send" should estimate size of stream https://www.illumos.org/issues/1646 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots https://www.illumos.org/issues/1647 1693 persistent 'comment' field for a zpool https://www.illumos.org/issues/1693 1708 adjust size of zpool history data https://www.illumos.org/issues/1708 1748 desire support for reguid in zfs https://www.illumos.org/issues/1748 Obtained from: illumos (changesets 13514, 13524, 13525) MFC after: 1 month	2011-11-28 21:40:00 +00:00
Konstantin Belousov	f82360acf2	Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successful call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or the like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)	2011-11-19 07:50:49 +00:00
Ryan Stone	493b584dbd	Correct the types of the arguments to return probes of the syscall provider. Previously we were erroneously supplying the argument types of the corresponding entry probe. Reviewed by: rpaulo MFC after: 1 week	2011-11-11 03:49:42 +00:00
Ryan Stone	cddcb8b4dc	On i386, fbt probes are implemented by writing an invalid opcode over certain instructions in a function prologue or epilogue. DTrace has a hook into the invalid opcode fault handler that checks whether the fault was due to a probe and if so, runs the DTrace magic. Upon returning from an invalid opcode fault caused by a probe, DTrace must emulate the instruction that was replaced with the invalid opcode and then return control to the instruction following the invalid opcode. There were a pair of related bugs in the emulation for the leave instruction. The leave instruction is used to pop off a stack frame prior to returning from a function. The emulation for this instruction must move the trap frame for the invalid opcode fault down the stack to the bottom of the stack frame that is being removed, and then execute an iret. At two points in this process, the emulation code was storing values above the current value of the stack pointer. This opened up a window in which if we were to take an interrupt, the trap frame for the interrupt would overwrite the values stored on the stack, causing the system to panic later. The first bug was that at one point the emulation code saves the new value for $esp above the current stack pointer value. The fix is to save this value instead inside of the original trap frame. At this point we do not need the original trap frame so this is safe. The second bug is that when the emulate code loads $esp from the stack, it points part-way through the new trap frame instead of at its beginning. The emulation code adjusts the stack pointer to the correct value immediately afterwards, but this still leaves a one instruction window in which an interrupt would corrupt this trap frame. Fix this by adjusting the stack frame value before loading it into $esp. This fixes panics in invop_leave on i386 when using fbt return probes. Reviewed by: rpaulo, attilio MFC after: 1 week	2011-11-10 22:03:35 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Ryan Stone	add89852d6	Replace fasttrap_copyout() with uwrite(). FreeBSD copyout() is not able to write to the .text section of a process. Obtained from: rpaulo MFC after: 3 days	2011-11-07 01:55:58 +00:00
Pawel Jakub Dawidek	df663c3dd3	Correct typo in comment. Reported by: Fabian Keil <fk@fabiankeil.de> MFC after: 3 days	2011-11-05 16:44:25 +00:00
Pawel Jakub Dawidek	98dd1c40c4	In zvol_open() if the spa_namespace_lock is already held, it means that ZFS is trying to open and taste ZVOL as its VDEV. This is not supported, so return an error instead of panicking on spa_namespace_lock recursion. Reported by: Robert Millan <rmh@debian.org> PR: kern/162008 MFC after: 3 days	2011-11-05 16:29:03 +00:00
Martin Matuska	e1d4b72a2e	Fix typo in copyright notice introduced in r226724 (missing character in e-mail address) Reported by: pjd MFC after: 3 days	2011-10-25 13:52:38 +00:00
Martin Matuska	571e19b341	Update copyright information in several ZFS files, as the clause 3.3 of the CDDL licence explicitly requires every Contributor to add a copyright notice. This also reflects the copyright notices for the changes recently added by Illumos. MFC after: 3 days	2011-10-25 08:35:30 +00:00
Pawel Jakub Dawidek	9782a86c85	- Use better naming now that we allow to rename any mounted file system (not only legacy). - Update copyright to include myself. MFC after: 2 weeks	2011-10-24 21:31:53 +00:00
Pawel Jakub Dawidek	649bbd1cd0	Don't forget to rename mounted snapshots of the file system being renamed. MFC after: 2 weeks	2011-10-24 20:41:31 +00:00
Pawel Jakub Dawidek	27fbc05657	Include <sys/zfs_vfsops.h> only when compiling kernel module. MFC after: 2 weeks	2011-10-24 05:26:40 +00:00
Pawel Jakub Dawidek	497b7ef946	Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allows to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. MFC after: 2 weeks	2011-10-24 00:38:09 +00:00
Pawel Jakub Dawidek	72b880fa83	Update per-thread I/O statistics collection in ZFS. This allows to see processes I/O activity in 'top -m io' output. PR kern/156218 Reported by: Marcus Reid <marcus@blazingdot.com> Patch by: avg MFC after: 3 days	2011-10-21 21:49:34 +00:00
Pawel Jakub Dawidek	b39ba076ec	zfs vdev_file_io_start: validate vdev before using vdev_tsd vdev_tsd can be NULL for certain vdev states. At least in userland testing with ztest. Submitted by: avg MFC after: 3 days	2011-10-21 14:00:48 +00:00
Pawel Jakub Dawidek	9838b8b0ee	- Correctly read gang header from raidz. - Decompress assembled gang block data if compressed. - Verify checksum of a gang header. - Verify checksum of assembled gang block data. - Verify checksum of uber block. Submitted by: avg MFC after: 3 days	2011-10-20 15:42:38 +00:00
Pawel Jakub Dawidek	81fdf04870	Always pass data size for checksum verification function, as using physical block size declared in bp may not always be what we want. For example in case of gang block header physical block size declared in bp is much larger than SPA_GANGBLOCKSIZE (512 bytes) and checksum calculation failed. This bug could lead to accessing unallocated memory and resets/failures during boot. MFC after: 3 days	2011-10-19 23:44:38 +00:00
Pawel Jakub Dawidek	9498501254	Initialize 'rc' properly before using it. This error could lead to infinite loop when data reconstruction was needed. MFC after: 3 days	2011-10-19 23:33:48 +00:00
Pawel Jakub Dawidek	13d46594d1	Remove redundant size calculation. MFC after: 3 days	2011-10-19 23:31:50 +00:00
Martin Matuska	ceac02f8e6	Import fix for Illumos bug #1475 to reduce diff against upstream. Panic caused by this bug was already partially fixed by pjd@ in p4 CH 185940 and 185942. Reference: 1475 zfs spill block hold can access invalid spill blkptr https://www.illumos.org/issues/1475 Reviewed by: delphij Obtained from: Illumos (issue 1475, changeset 13469:b8e89e5c4167) MFC after: 1 week	2011-10-18 13:58:22 +00:00
Xin LI	4aadb12e0b	Fix a bug in sa_find_sizes() which could lead to panic: When calculating space needed for SA_BONUS buffers, hdrsize is always rounded up to next 8-aligned boundary. However, in two places the round up was done against sum of 'total' plus hdrsize. On the other hand, hdrsize increments by 4 each time, which means in certain conditions, we would end up returning with will_spill == 0 and (total + hdrsize) larger than full_space, leading to a failed assertion because it's invalid for dmu_set_bonus. Sponsored by: iXsystems, Inc. Reviewed by: mm MFC after: 3 days	2011-10-17 22:23:27 +00:00
Marcel Moolenaar	ad4e66a63c	Define dtrace_cmpset_long in terms of atomic_cmpset_long and not by virtue of inline assembly. Now this file compiles on all supported architectures.	2011-10-16 22:18:08 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Andriy Gapon	c3e00850b4	zfs boot subroutines: correctly specify type of an integer literal Found by adding more warning flags to zfs boot blocks build. Approved by: re (kib) MFC after: 1 week	2011-09-13 14:07:05 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Martin Matuska	82378711f9	Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week	2011-08-25 08:17:39 +00:00
Pawel Jakub Dawidek	4969b96e57	We need to unlock and destroy vnode attached to znode which we are freeing. Reviewed by: kib Approved by: re (bz) MFC after: 1 week	2011-08-24 22:07:38 +00:00
Martin Matuska	6e1f1d4690	zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next() zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors() by passing full instead of relative dataset name and prefetching all visible datasets to be processed later instead of just the pool name Reviewed by: pjd Approved by: re (kib) MFC after: 1 week	2011-08-13 21:35:22 +00:00
Martin Matuska	cc82ff1c96	Fix race between dmu_objset_prefetch() invoked from zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not prefetching temporary clones, as these count as always inconsistent. In addition, do not prefetch hidden datasets at all as we are not going to process these later. Filed as Illumos Bug #1346 PR: kern/157728 Tested by: Borja Marcos <borjam@sarenet.es>, mm Reviewed by: pjd Approved by: re (kib) MFC after: 1 week	2011-08-13 10:58:53 +00:00
Pawel Jakub Dawidek	7b1085ba55	Eliminate the zfsdev_state_lock entirely and replace it with the spa_namespace_lock. This fixes LOR between the spa_namespace_lock and spa_config lock. LOR can cause deadlock on vdevs removal/insertion. Reported by: gibbs, delphij Tested by: delphij Approved by: re (kib) MFC after: 1 week	2011-08-12 07:04:16 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Martin Matuska	d32cac295c	Fix panic in zfs_read() if IO_SYNC flag supplied by checking for zfsvfs->z_log before calling zil_commit(). [1] Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag. Submitted by: Alexander Zagrebin <alex@zagrebin.ru> [1] Reviewed by: pjd@ Approved by: re (kib) MFC after: 3 days	2011-08-02 11:28:33 +00:00
Martin Matuska	ad4887a72a	Fix integer overflow in txg_delay() by initializing the variable "timeout" as clock_t. Filed as Illumos Bug #1313 Reviewed by: avg Approved by: re (kib) MFC after: 3 days	2011-08-01 14:50:31 +00:00
Martin Matuska	4e1407c428	Fix serious bug in ZIL that can lead to pool corruption in the case of a held dataset during remount. Detailed description is available at: https://www.illumos.org/issues/883 illumos-gate revision: 13380:161b964a0e10 Reviewed by: pjd Approved by: re (kib) Obtained from: Illumos (Bug #883) MFC after: 3 days	2011-07-30 19:00:31 +00:00
Xin LI	101b7b5daa	Bring the code more in-line with OpenSolaris source to ease future port. Reviewed by: pjd, mm Approved by: re (kib)	2011-07-21 20:02:22 +00:00
Xin LI	b447d101fa	A different implementation of r224231 proposed by pjd@, which does not require change in the znode structure. Specifically, it queries rdev from the znode in the same sa_bulk_lookup already done in zfs_getattr(). Submitted by: pjd (with some revisions) Reviewed by: pjd, mm Approved by: re (kib)	2011-07-21 20:01:51 +00:00
Xin LI	b1ad061e42	Add a new field to in-core znode, z_rdev, to represent device nodes. PR: kern/159010 Reviewed by: mm@ Approved by: re (kib) MFC after: 2 weeks	2011-07-20 16:53:32 +00:00
Martin Matuska	1bc399c4b1	ZFS tries to allocate blocks evenly across all devices. This means when devices are imbalanced ZFS will spend lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full devices and move onto devices which have more availability. New loader tunable: vfs.zfs.mg_alloc_failures (min = 8) Illumos-gate changeset: 13379:4df42cc92254 Obtained from: Illumos (Bug #1051) MFC after: 2 weeks	2011-07-18 08:29:49 +00:00
Martin Matuska	3ded43e7b7	Resurrect the ZFS "aclmode" property Change default of "aclmode" to "discard". Illumos-gate changeset: 13370:8c04143bd318 Obtained from: Illumos (Feature #742) MFC after: 2 weeks	2011-07-18 07:16:44 +00:00
Attilio Rao	40a034576b	MFC	2011-06-28 14:40:17 +00:00
Attilio Rao	ada5b73915	Remove pc_cpumask usage from dtrace MD support	2011-06-28 13:14:39 +00:00
Martin Matuska	fbfed0cda6	Add a new "REFCOMPRESSRATIO" property. For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin). This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots). Illumos-gate revision: 13387 Obtained from: Illumos (Feature #1092) MFC after: 2 weeks	2011-06-28 07:52:01 +00:00
Martin Matuska	85a418012f	Disable vdev cache (readahead) by default. The vdev cache is very underutilized (hit ratio 30%-70%) and may consume excessive memory on systems with many vdevs. Illumos-gate revision: 13346 Obtained from: Illumos (Bug #175) MFC after: 1 week	2011-06-28 06:32:35 +00:00
Ben Laurie	5f301949ef	Fix clang warnings. Approved by: philip (mentor)	2011-06-18 13:56:33 +00:00
Justin T. Gibbs	1c3bf59584	Remove C constructs that are incompatible with C++ from various OpenSolaris and ZFS header files. These changes are sufficient to allow a C++ program to use the libzfs library. Note: The majority of these files already included 'extern "C"' declarations, so the intention of providing C++ compatibility already existed even if it wasn't provided. cddl/compat/opensolaris/include/assert.h: Wrap our compatibility assert implementation in 'extern "C"'. Since this is a compatibility header I matched the Solaris style of doing this explicitly rather than rely on FreeBSD's __BEGIN/END_DECLS macro. sys/cddl/compat/opensolaris/sys/kstat.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h: Rename parameters in function declarations that conflict with C++ keywords. This was the solution preferred by members of the Illumos community. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h: In C, nested structures are visible in the global namespace, but in C++, they take on the namespace of the structure in which they are contained. Flatten nested structure definitions within struct zfs_cmd so these structures are visible in the global namespace when compiled in both languages. Sponsored by: Spectra Logic Corporation	2011-06-10 20:10:30 +00:00
Martin Matuska	baa256da8c	Silence notice on pool creation, import and access. Suggested by: Jeremy Chadwick (freebsd-stable@) Discussed with: pjd MFC after: 1 week	2011-06-07 20:46:31 +00:00
Attilio Rao	81c02539f1	MFC	2011-06-06 21:38:39 +00:00
Martin Matuska	298a6c3de6	Remove empty #ifndef MFC after: 3 days	2011-06-06 14:46:43 +00:00
Attilio Rao	3bce356ea4	MFC	2011-06-04 22:05:20 +00:00
Andriy Gapon	2386e135da	opensolaris compat / zfs: avoid early overflow in ddi_get_lbolt* Reported by: David P. Discher <dpd@bitgravity.com> Tested by: will Reviewed by: art Discussed with: dwhite MFC after: 2 weeks	2011-06-04 07:02:06 +00:00
Attilio Rao	5b6ea0b538	MFC	2011-05-31 14:18:10 +00:00
Pawel Jakub Dawidek	12b9f8e47d	Imagine situation where a security problem is found in setuid binary. User upgrades his system to fix the problem, but if he has any ZFS snapshots for the file system which contains problematic binary, any user can mount the snapshot and execute vulnerable binary. Prevent this from happening by always mounting snapshots with setuid turned off. MFC after: 2 weeks	2011-05-31 07:02:49 +00:00
Attilio Rao	9cb46334ee	MFC	2011-05-27 16:09:10 +00:00
Pawel Jakub Dawidek	43cadeaa27	Silence warnings about unsupported value types. MFC after: 2 weeks	2011-05-27 08:34:31 +00:00
Attilio Rao	7fcdc9a26f	MFC	2011-05-26 17:38:00 +00:00
Pawel Jakub Dawidek	b5a060dd8b	Don't pass pointer to name buffer which is on the stack to another thread, because the stack might be paged out once the other thread tries to use the data. Instead, just allocate memory. MFC after: 2 weeks	2011-05-24 20:10:12 +00:00
Pawel Jakub Dawidek	541c60d988	Don't access task structure once we call task function. The task structure might be no longer available. This also eliminates the need for two tasks in the zio structure. Submitted by: anonymous MFC after: 2 weeks	2011-05-24 20:07:15 +00:00
Attilio Rao	b97e49c0e1	MFC	2011-05-22 21:46:55 +00:00
Rick Macklem	965e561750	Fix the zfs file system so that it uses the lock flags argument added to VFS_FHTOVP() by r222167. Reviewed by: pjd	2011-05-22 21:04:32 +00:00
Attilio Rao	8c4431d022	MFC	2011-05-22 20:41:10 +00:00
Rick Macklem	694a586a43	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib	2011-05-22 01:07:54 +00:00
Attilio Rao	5f6b159db7	MFC	2011-05-18 16:01:29 +00:00
Martin Matuska	a5c44f92bf	Restore old (v15) behaviour for a recursive snapshot destroy. (zfs destroy -r pool/dataset@snapshot) To destroy all descendent snapshots with the same name the top level snapshot was not required to exist. So if the top level snapshot does not exist, check permissions of the parent dataset instead. Filed as Illumos Bug #1043 Reviewed by: delphij Approved by: pjd MFC after: together with v28	2011-05-18 07:37:02 +00:00
Attilio Rao	7e7a34e520	MFC	2011-05-16 16:34:03 +00:00
Andriy Gapon	20208c3bf0	Revert accidentally committed local change in r221990 Pointyhat to: avg	2011-05-16 15:36:11 +00:00
Andriy Gapon	dd7498ae03	better integrate cyclic module with clocksource/eventtimer subsystem Now in the case when one-shot timers are used cyclic events should fire closer to their scheduled times. As the cyclic is currently used only to drive DTrace profile provider, this is the area where the change makes a difference. Reviewed by: mav (earlier version, a while ago) X-MFC after: clocksource/eventtimer subsystem	2011-05-16 15:29:59 +00:00
Attilio Rao	b68eda3b54	MFC	2011-05-10 15:54:37 +00:00
Andriy Gapon	d9b8935fb9	dtrace: remove unused code Which is also useless, IMO. MFC after: 5 days	2011-05-10 15:05:27 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Marius Strobl	edd870e447	Convert the last use of xcopyout() to ddi_copyout() and remove the now unused xcopyin() as well as xcopyout(). MFC together with r219089. Approved by: mm	2011-05-03 20:13:27 +00:00
Martin Matuska	29bf94b8d8	Fix deduplicated zfs receive (dmu_recv_stream builds incomplete guid_to_ds_map) Illumos-gate changeset: 13329:c48b8bf84ab7 MFC together with v28 Approved by: pjd Obtained from: Illumos (Bug #755)	2011-04-30 14:52:49 +00:00
Marcel Moolenaar	8d098dc0c4	Fix copy-paste bug.	2011-04-27 04:03:04 +00:00
Martin Matuska	8b2aa22d8f	Partially fix ZFS compat code for sparc64. Some endianness bugs still need to be resolved. Submitted by: marius (parts of the fix) MFC after: 1 month	2011-04-08 11:08:26 +00:00
Artem Belevich	7a3f3cabb1	Stripped '32' suffix from linux systrace module name on i386. Approved by: avg	2011-04-08 06:27:43 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. Worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
Pawel Jakub Dawidek	65612637e8	Checking file access on size change is bogus. The checks are done earlier by VFS where we know if this is truncate(2) or ftruncate(2). If this is the latter we should depend on the mode the file was opened and not on the current permission. PR: standards/154873 Reported by: Mark Martinec <Mark.Martinec@ijs.si> Discussed with: Eric Schrock <eric.schrock@delphix.com> Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM> MFC after: 1 month	2011-03-24 20:28:09 +00:00
Pawel Jakub Dawidek	d7d23301ae	Fix potential panic in dbuf_sync_list() related to spill blocks handling. Obtained from: IllumOS MFC after: 1 month	2011-03-14 11:07:12 +00:00
Andriy Gapon	308bce2a0e	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls Add systrace_linux32 and systrace_freebsd32 modules which provide support for tracing compat system calls in addition to native system call tracing provided by systrace module. Provided that all the systrace modules are loaded now you can select what syscalls to trace in the following manner: syscall::xxx:yyy - work on all system calls that match the specification syscall:freebsd:xxx:yyy - only native system calls syscall:linux32:xxx:yyy - linux32 compat system calls syscall:freebsd32:xxx:yyy - freebsd32 compat system calls on amd64 PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 09:09:25 +00:00
Pawel Jakub Dawidek	cae905e5d0	Correct readdir over ZFS handling. Reported by: Pierre Beyssac <pb@fasterix.frmug.org> MFC after: 1 month	2011-03-08 18:39:41 +00:00
Pawel Jakub Dawidek	a96e8e86f0	Fix libzpool build. MFC after: 1 month	2011-03-06 01:22:14 +00:00
Pawel Jakub Dawidek	2348f1110e	Make renaming of a ZVOL, ZVOL's parent directory and ZVOL snapshot work. Reported by: avg MFC after: 1 month	2011-03-05 22:31:03 +00:00
Pawel Jakub Dawidek	5bf0660559	Simplify zvol_remove_minors() a bit. MFC after: 1 month	2011-03-05 22:24:31 +00:00
Pawel Jakub Dawidek	2fbdb9c0a0	Use proper lock in assertion. MFC after: 1 month	2011-02-28 05:45:31 +00:00
Pawel Jakub Dawidek	10b9d77bf1	Finally... Import the latest open-source ZFS version - (SPA) 28. Few new things available from now on: - Data deduplication. - Triple parity RAIDZ (RAIDZ3). - zfs diff. - zpool split. - Snapshot holds. - zpool import -F. Allows to rewind corrupted pool to earlier transaction group. - Possibility to import pool in read-only mode. MFC after: 1 month	2011-02-27 19:41:40 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Marcel Moolenaar	6e23016fd7	Use the preload_fetch_addr() and preload_fetch_size() convenience functions to obtain the address and size of the preloaded pool configuration file/repository. Sponsored by: Juniper Networks.	2011-02-13 19:46:55 +00:00
Konstantin Belousov	ca67168159	For UIO_NOCOPY case of reading request on zfs vnode, which has vm object attached, activate the page after the successful read, and free the page if read was unsuccessful. Freshly allocated page is not on any queue yet, and not activating (or deactivating) the page leaves it on no queue, excluding the page from pagedaemon scans and making the memory disappear until the vnode is reclaimed. Reviewed by: avg MFC after: 1 week	2011-02-11 10:46:15 +00:00
Edward Tomasz Napierala	dc7a965673	Make it impossible to clear the MNT_NFS4ACLS flag on ZFS filesystem by using "mount -uw". Reviewed by: pjd MFC after: 2 weeks	2011-02-06 23:34:09 +00:00
Andrey V. Elsukov	459d0e830d	vdev's sectorsize should not be greater than 8 Kbytes and also it should be power of 2. This prevents non-aligned access while probing vdev's labels. PR: kern/147852 Reviewed by: pjd MFC after: 1 week	2011-02-04 15:22:56 +00:00
Martin Matuska	5c92680fa9	Recommit r218169, enclosing with #ifdef _KERNEL This change is sufficient for the ZFS kernel module. Discussed with: pjd MFC after: 1 week	2011-02-01 23:12:13 +00:00
Alexander Kabaev	a9c28a203d	Revert r218169 until it can be tested and fixed properly.	2011-02-01 21:15:35 +00:00
Martin Matuska	4530e5f790	For ZFS, change the type of clock_t to int64_t. The clock_t type in OpenSolaris is long (int64_t on amd64). On FreeBSD clock_t is int32_t. The clock_t type is used in several places in the ZFS code to store system uptime in milliseconds ("seconds * hz"). With hz=1000 we have a 32-bit integer overflow in 24 days, 20 hours, 31 minutes and 23.648 seconds. This has a user reported negative impact on l2arc_feed_thread() and may cause unexpected results from other functions using clock_t. Reported by: Artem Belevich <fbsdlist@src.cx> on freebsd-fs@ MFC after: 1 week	2011-02-01 14:28:50 +00:00
Jayachandran C.	baa8c35cb4	CDDL fixes for MIPS n32. Provide 64 bit atomic ops, and use 32 bit pointer.	2011-01-28 06:12:59 +00:00
Matthew D Fleming	cbc134ad03	Introduce signed and unsigned version of CTLTYPE_QUAD, renaming existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().	2011-01-19 23:00:25 +00:00
Edward Tomasz Napierala	7a93bf9a69	Add MNT_NFS4ACLS to ZFS mount flags. It's not conditional, since there is no way to disable NFSv4 ACLs in ZFS. This should make it easier for the NFS server to figure out whether the exported filesystem supports ACLs or not. Reviewed by: pjd MFC after: 2 weeks	2011-01-19 17:11:52 +00:00
Matthew D Fleming	e704482d43	Re-commit the zfs sysctl(9) type-safety changes. Thanks to dim and pjd for the pointer to zfs_context.h for building userland.	2011-01-13 18:20:19 +00:00
Matthew D Fleming	374a993a88	Revert cddl changes for sysctl(9) until I understand why this isn't building on universe.	2011-01-12 23:06:38 +00:00
Matthew D Fleming	4a2ce5903f	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the zfs piece.	2011-01-12 19:53:30 +00:00
Martin Matuska	df06a59a77	MFp4 r186485, r186859: Fix a race by defining two tasks in the zio structure as we can still be returning from issue task when interrupt task is used. Tested by: pjd Approved by: pjd, delphij (mentor) MFC after: 3 days	2011-01-03 12:57:07 +00:00
Andriy Gapon	dfe3a1b374	cyclic xcall: use smp_no_rendevous_barrier as setup function parameter In this case we call target function only on a single CPU and do not need any synchronization at the setup stage. It's a bit non-obvious but setup function of NULL means that smp_rendezvous_cpus waits for all CPUs to arrive at the rendezvous point, but without doing any actual setup. While using smp_no_rendevous_barrier means that each CPU proceeds on its own schedule without any synchronization whatsoever. MFC after: 3 weeks	2010-12-17 18:22:50 +00:00
Pawel Jakub Dawidek	8735863465	Remove redundant semicolon and empty line.	2010-12-11 13:35:25 +00:00
Ivan Voras	d7ccd95be8	Undo r216230: the interaction between saved ashift in metadata and detected ashift does not support this. With this change, pools created while stripesize=512 could not be imported when stripesize becomes larger (on the same drive). Noticed by: pjd	2010-12-07 15:24:08 +00:00
Andriy Gapon	58f61ce4eb	opensolaris cyclic: fix deadlock and make a little bit closer to upstream The deadlock was caused in the following way: - thread T1 on CPU C1 holds a spin mutex, IPIs CPU C2 and waits for the IPI to be handled - C2 executes timer interrupt filter, thus has interrupts disabled, and gets blocked on the spin mutex held by T1 The problem seems to have been introduced by simplifications made to OpenSolaris code during porting. The problem is fixed by reorganizing the code to more closely resemble the upstream version. Interrupt filter (cyclic_fire) now doesn't acquire any locks, all per-CPU data accesses are performed on a target CPU with preemption and interrupts disabled thus precluding concurrent access to the data. cyp_mtx spin mutex is used to disable preemption and interrupts; it's not used for classical mutual exclusion, because xcall already serializes calls to a CPU. It's an emulation of OpenSolaris cyb_set_level(CY_HIGH_LEVEL) call, the spin mutexes could probably be reduced to just a spinlock_enter()/_exit() pair. Diff with upstream version is now reduced by ~500 lines, however it still remains quite large - many things that are not needed (at the moment) or are irrelevant on FreeBSD were simply ripped out during porting. Examples of such things: - support for CPU onlining/offlining - support for suspend/resume - support for running callouts at soft interrupt levels - support for callout rebinding from CPU to CPU - support for CPU partitions Tested by: Artem Belevich <fbsdlist@src.cx> MFC after: 3 weeks X-MFC with: r216252	2010-12-07 12:25:26 +00:00
Andriy Gapon	a10b0e67d9	opensolaris cyclic xcall: no need for special handling of curcpu smp_rendezvous_cpus already properly handles current CPU case and non-SMP case. MFC after: 3 weeks	2010-12-07 12:04:06 +00:00
Andriy Gapon	fe8c7b3d77	dtrace_xcall: no need for special handling of curcpu smp_rendezvous_cpus already does the right thing in a very similar fashion, so the code was kind of duplicating that. MFC after: 3 weeks	2010-12-07 09:19:47 +00:00
Andriy Gapon	7becfa95b9	dtrace_gethrtime_init: pin to master while examining other CPUs Also use pc_cpumask to be future-friendly. Reviewed by: jhb MFC after: 2 weeks	2010-12-07 09:03:17 +00:00
Ivan Voras	8b08562112	Use GEOM stripesize field when calculating ashift. This will enable correct alignment on drives with large sector sizes (e.g. 4 KiB) but the implementation might need to be revisited if devices with large stripesizes appear (e.g. if RAID controllers or flash drives start using the field), probably by introducing a physsectorsize field in GEOM providers. Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@	2010-12-06 12:18:02 +00:00
Edward Tomasz Napierala	de2a57325d	Don't panic when we read an empty ACL from ZFS. Apparently this may happen with filesystems created under MacOS X ZFS port. This is kind of filesystem corruption (we don't allow for setting empty ACLs), so make acl_get_file(3) and related syscalls fail with EINVAL in that case. In theory, we could return empty ACL to userland, but I'm afraid this would break some code. MFC after: 3 days	2010-11-30 21:04:05 +00:00
Andriy Gapon	c59690f249	zfs+sendfile: populate all requested pages, not just those already cached kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate page cache. When sendfile stumbles upon a page that is not populated yet, it sends out all the mbufs that it collected so far. This resulted in very poor performance with ZFS when file data is not in the page cache, because ZFS vop_read for UIO_NOCOPY case populated only those pages that are already in cache, but not valid. Which means that most of the time it populated only the first requested page in the described above scenario. Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: Alexander Zagrebin <alexz@visp.ru>, Artemiev Igor <ai@kliksys.ru> MFC after: 12 days	2010-11-16 15:53:44 +00:00
Andriy Gapon	f9e2e99d5d	fix misspelling in a comment Reported by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days	2010-11-16 12:30:47 +00:00
Martin Matuska	8db47aa15e	Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot and VFS_RELE on a non-existing hold on snapshot parent's z_vfs. This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4 (bug IDs: 6792139, 6794830) - not applicable to FreeBSD. This fixes the process hang if umounting a manually mounted snapshot. Reported by: Alexander Zagrebin <alexz@visp.ru> Approved by: delphij (mentor) MFC after: 1 week	2010-11-13 21:09:18 +00:00
Xin LI	b97a9057c2	Validate whether the zfs_cmd_t submitted from userland is not smaller than what we have. Without the check the kernel could access memory that does not belong to the request struct. Note that we do not test if the struct equals in size at this time, which may facilitate forward compatibility with newer binaries. Reviewed by: pjd at MeetBSD CA '2010 MFC after: 1 week	2010-11-05 22:18:09 +00:00
Martin Matuska	e25376bdd0	Bugfix merge from OpenSolaris: OpenSolaris onnv-revision: 10209:91f47f0e7728 6830541 zfs_get_data_trips on a verify 6696242 multiple zfs_fillpage() zfs: accessing past end of object panics 6785914 zfs fails to drop dn_struct_rwlock in recovery code path Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914) MFC after: 2 weeks	2010-10-26 15:48:03 +00:00
Andriy Gapon	23a1bcf8c6	zfs: add vop_getpages method implementation This should make vnode_pager_getpages path a bit shorter and clearer. Also this should eliminate problems with partially valid pages. Having this method opens room for future optimizations. To do: try to satisfy other pages besides the required one taking into account tradeoffs between number of page faults, read throughput and read latency. Also, eventually vop_putpages should be added too. Reviewed by: kib, mm, pjd MFC after: 3 weeks	2010-10-16 20:43:05 +00:00
Rui Paulo	910a5e18ba	Pass a format string to panic() and to taskqueue_start_threads(). Found with: clang	2010-10-13 17:13:43 +00:00
Rui Paulo	6e634bb80f	In zfs_post_common(), use %d instead of %hhu. Found with: clang	2010-10-13 17:12:23 +00:00
Andriy Gapon	f6bb41924c	zfs + sendfile: do not produce partially valid pages for vnode's tail Since r212650 and before this change sendfile(2) could produce a partially valid page for a trailing portion of a ZFS vnode. vm_fault() always wants to see a fully valid page even if it's the last page that partially extends beyond vnode's end. Otherwise it calls vop_getpages() to bring in the page. In the case of ZFS this means that the data is read from the page into the same page and this breaks checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in vm_fault() will get blocked forever waiting for it to be cleared. Many thanks to Kai and Jeremy for reproducing the issue and providing important debugging information and help. Reported by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Tested by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Reviewed by: kib MFC after: 3 days To-Do: apply the same treatment to tmpfs + sendfile	2010-10-12 17:04:21 +00:00
Pawel Jakub Dawidek	19ebc67beb	Provide internal ioflags() function that converts ioflag provided by FreeBSD's VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write operations. Reviewed by: mm MFC after: 1 week	2010-10-10 20:49:33 +00:00
Martin Matuska	a362d75576	Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag. This corrects writing to append-only files on ZFS. PR: kern/149495 [1], kern/151082 [2] Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2] Approved by: delphij (mentor) MFC after: 1 week	2010-10-08 23:01:38 +00:00
Andriy Gapon	6c6aca1203	opensolaris_kmem kmem_size(): report lesser of vm_kmem_size and available physical memory This is needed to correctly autotune ZFS ARC size when vm_kmem_size is set to value larger than available physical memory. MFC after: 2 weeks	2010-10-07 18:16:14 +00:00
Martin Matuska	aa007a9f0e	Properly handle IO with B_FAILFAST Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted OpenSolaris revision and Bug IDs: 9725:0bf7402e8022 6843014 ZFS B_FAILFAST handling is broken Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843014) MFC after: 3 weeks	2010-09-27 09:42:31 +00:00
Martin Matuska	96a1a6a568	Enable offlining of log devices. OpenSolaris revision and Bug IDs: 9701:cc5b64682e64 6803605 should be able to offline log devices 6726045 vdev_deflate_ratio is not set when offlining a log device 6599442 zpool import has faults in the display Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442) MFC after: 3 weeks	2010-09-27 09:05:51 +00:00
Andriy Gapon	68653c3bd6	zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths and it seems to be a not very good idea to do cpu pinning there. Suggested by: kib MFC after: 2 weeks	2010-09-21 05:58:45 +00:00
Andriy Gapon	ff5e15a487	zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place MFC after: 2 weeks	2010-09-21 05:54:36 +00:00
Andriy Gapon	9d5eb9aa5d	zfs arc_reclaim_needed: fix typo in mismerge in r212780 PR: kern/146410, kern/138790 MFC after: 3 weeks X-MFC with: r212780	2010-09-17 07:34:50 +00:00
Andriy Gapon	921d3fd122	zfs+sendfile: advance uio_offset upon reading as well Picked from analogous code in tmpfs. MFC after: 1 week	2010-09-17 07:20:20 +00:00
Andriy Gapon	44532bc5cd	zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_max Those checks are not present in upstream code and they are enforced in actual calculations of delta by which ARC size can be grown or should be reduced. MFC after: 3 weeks	2010-09-17 07:17:38 +00:00
Andriy Gapon	7c1353491f	zfs arc_reclaim_needed: more reasonable threshold for available pages vm_paging_target() is not a trigger of any kind for pagedaemon, but rather a "soft" target for it when it's already triggered. Thus, trying to keep 2048 pages above that level at the expense of ARC was simply driving ARC size into the ground even with normal memory loads. Instead, use a threshold at which a pagedaemon scan is triggered, so that ARC reclaiming helps with pagedaemon's task, but the latter still recycles active and inactive pages. PR: kern/146410, kern/138790 MFC after: 3 weeks	2010-09-17 07:14:07 +00:00
Martin Matuska	d1ee63f836	Fix kernel panic when moving a file to .zfs/shares Fix possible loss of correct error return code in ZFS mount OpenSolaris revisions and Bug IDs: 11824:53128e5db7cf 6863610 ZFS mount can lose correct error return 12079:13822b941977 6939941 problem with moving files in zfs (142901-12) Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6863610, 6939941) MFC after: 3 days	2010-09-15 19:55:26 +00:00
Andriy Gapon	8a3883cfb7	zfs vn_has_cached_data: take into account v_object->cache != NULL This mirrors code in tmpfs. This change shouldn't affect much read path, it may cause unnecessary vm_page_lookup calls in the case where v_object has no active or inactive pages but has some cache pages. I believe this situation to be non-essential. In write path this change should allow us to properly detect the above case and free a cache page when we write to a range that corresponds to it. If this situation is undetected then we could have a discrepancy between data in page cache and in ARC or on disk. This change allows us to re-enable vn_has_cached_data() check in zfs_write. NOTE: strictly speaking resident_page_count and cache fields of v_object should be examined under VM_OBJECT_LOCK, but for this particular usage we may get away with it. Discussed with: alc, kib Approved by: pjd Tested with: tools/regression/fsx MFC after: 3 weeks	2010-09-15 11:05:41 +00:00
Andriy Gapon	0b1ca38a69	zfs mappedread, update_pages: use int for offset and length within a page uint64_t, int64_t were redundant there Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:48:16 +00:00
Andriy Gapon	c002c3e8c2	zfs mappedread: use uiomove_fromphys where possible Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:44:20 +00:00
Andriy Gapon	fbbdb19dcd	zfs: catch up with vm_page_sleep_if_busy changes Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:39:21 +00:00
Andriy Gapon	21bd3e2576	tmpfs, zfs + sendfile: mark page bits as valid after populating it with data Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY). PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks	2010-09-15 10:31:27 +00:00
Martin Matuska	9a13d2e1b3	Remove duplicated VFS_HOLD due to a mismerge. PR: kern/150544 Approved by: delphij (mentor) MFC after: 1 day	2010-09-14 12:12:18 +00:00
Martin Matuska	4eeef2e44a	Add missing vop_vector zfsctl_ops_shares Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir PR: kern/150544 Approved by: delphij (mentor) Obtained from: perforce (pjd) MFC after: 1 day	2010-09-14 10:27:32 +00:00
Pawel Jakub Dawidek	3c907063e9	Remove the page queues lock around vm_page_undirty() - it is no longer needed. Reviewed by: alc	2010-09-13 19:47:09 +00:00
Rui Paulo	47047e3418	Revamp locking a bit. This fixes three problems: * processes now can't go away while we are inserting probes (fixes a panic) * if a trap happens, we won't be holding the process lock (fixes a hang) * fix a LOR between the process lock and the fasttrap bucket list lock Thanks to kib for pointing some problems. Sponsored by: The FreeBSD Foundation	2010-09-12 14:12:16 +00:00
Rui Paulo	eae81e9501	Avoid a LOR (sleepable after non-sleepable) in fasttrap_tracepoint_enable(). Sponsored by: The FreeBSD Foundation	2010-09-11 12:58:31 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Pawel Jakub Dawidek	6a85b5e08a	Forgot to commit this file. Add ZPOOL_CONFIG_IS_LOG. Reported by: keramida MFC after: 2 weeks	2010-09-10 04:44:13 +00:00
Pawel Jakub Dawidek	86b19d1861	On FreeBSD we can log from pool that have multiple top-level vdevs or log vdevs, so don't deny adding new vdevs if bootfs property is set. MFC after: 2 weeks	2010-09-09 21:20:18 +00:00
Rui Paulo	d3555b6fc2	Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes. Sponsored by: The FreeBSD Foundation	2010-09-09 09:58:05 +00:00
Justin T. Gibbs	f03f7a0ca3	Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to its next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation MFC after: 1 month	2010-09-02 19:40:28 +00:00
Rui Paulo	ea950d20f6	Make the /dev/dtrace/helper node have the mode 0660. This allows programs that refuse to run as root (pgsql) to install probes when their user is part of the wheel group. Sponsored by: The FreeBSD Foundation	2010-09-01 12:08:32 +00:00
Jaakko Heinonen	de478dd4b4	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
Pawel Jakub Dawidek	b8a4becc2d	Return NULL pointer instead of B_FALSE as it is done in the vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:29:06 +00:00
Pawel Jakub Dawidek	3e9e888541	Move ZUT_OBJS in the same place that is used in vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:28:12 +00:00
Martin Matuska	8d87b396f8	Import changes from OpenSolaris that provide - better ACL caching and speedup of ACL permission checks - faster handling of stat() - lowered mutex contention in the read/writer lock (rrwlock) - several related bugfixes Detailed information (OpenSolaris onnv changesets and Bug IDs): 9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer 9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership 9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters 10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission 10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir 10250:b179ceb34b62 6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault) 10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL 10295:f7a18a1e9610 6870564 panic in zfs_getsecattr Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 weeks	2010-08-28 09:24:11 +00:00
Martin Matuska	abe5837f7c	Update ZFS metaslab code from OpenSolaris. This provides a noticeable write speedup, especially on pools with less than 30% of free space. Detailed information (OpenSolaris onnv changesets and Bug IDs): 11146:7e58f40bcb1c 6826241 Sync write IOPS drops dramatically during TXG sync 6869229 zfs should switch to shiny new metaslabs more frequently 11728:59fdb3b856f6 6918420 zdb -m has issues printing metaslab statistics 12047:7c1fcc8419ca 6917066 zfs block picking can be improved Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066) MFC after: 2 weeks	2010-08-28 08:59:55 +00:00
Rui Paulo	4d02a00a57	Remove debugging. Sponsored by: The FreeBSD Foundation	2010-08-28 08:39:37 +00:00
Rui Paulo	9f4ee6172d	Replace a memory barrier with a mutex barrier. Sponsored by: The FreeBSD Foundation	2010-08-28 08:13:38 +00:00
Pawel Jakub Dawidek	4e52cdd0f7	Use ZFS_CTLDIR_NAME instead of hardcoding ".zfs".	2010-08-27 21:31:15 +00:00
Pawel Jakub Dawidek	8733ff6e11	Update comment now that I finally committed r211854. MFC after: 1 month	2010-08-26 23:44:32 +00:00
Andriy Gapon	694a0a8717	zfs arc_reclaim_thread: no need to call arc_reclaim_needed when resetting needfree needfree is checked at the very start of arc_reclaim_needed. This change makes code easier to follow and maintain in face of potential changes in arc_reclaim_needed. Also, put the whole sub-block under _KERNEL because needfree can be set only in kernel code. To do: rename needfree to something else to avoid confusion with OpenSolaris global variable of the same name which is used in the same code, but has different meaning (page deficit). Note: I have an impression that locking around accesses to this variable as well as mutual notifications between arc_reclaim_thread and arc_lowmem are not proper. MFC after: 1 week	2010-08-24 17:48:22 +00:00
Rui Paulo	5caab9d4cd	Replace a pksignal() call with tdksignal(). Pointed out by: kib	2010-08-24 12:12:03 +00:00
Rui Paulo	625564de63	MD fasttrap implementation. Sponsored by: The FreeBSD Foundation	2010-08-24 12:05:58 +00:00
Rui Paulo	8605d1ae99	Port the fasttrap provider to FreeBSD. This provider is responsible for injecting debugging probes in the userland programs and is the basis for the pid provider and the usdt provider. Sponsored by: The FreeBSD Foundation	2010-08-24 11:11:58 +00:00
Rui Paulo	4e41f3537a	Port this to FreeBSD. We miss some suword functions, so we use copyout. Sponsored by: The FreeBSD Foundation	2010-08-22 11:41:06 +00:00
Rui Paulo	de788cde7b	Destroy the helper device when unloading. Sponsored by: The FreeBSD Foundation	2010-08-22 11:05:37 +00:00
Rui Paulo	6c44520886	Add more compatibility structure members needed by the upcoming fasttrap DTrace device. Sponsored by: The FreeBSD Foundation	2010-08-22 11:04:43 +00:00
Rui Paulo	c6f5742f90	Kernel DTrace support for: o uregs (sson@) o ustack (sson@) o /dev/dtrace/helper device (needed for USDT probes) The work done by me was: Sponsored by: The FreeBSD Foundation	2010-08-22 10:53:32 +00:00
Rui Paulo	58f668bba5	Add a compatibility function dtrace_instr_size_isa() that on FreeBSD does the same as dtrace_dis_isize(). Sponsored by: The FreeBSD Foundation	2010-08-22 10:40:15 +00:00
Rui Paulo	5e3caca7f6	Add the FreeBSD definition for the fasttrap ioctls. Sponsored by: The FreeBSD Foundation	2010-08-22 10:13:56 +00:00
Rui Paulo	cd306d6fa1	Add a sysname char * to struct opensolaris_utsname. Sponsored by: The FreeBSD Foundation	2010-08-21 14:09:24 +00:00
Rui Paulo	e60cf26f21	Port the DTrace helper ioctls to FreeBSD and add a helper member to dof_helper_t (needed by drti.o). Sponsored by: The FreeBSD Foundation	2010-08-21 11:58:08 +00:00
Rui Paulo	e0be1c75f0	Add sysname to struct opensolaris_utsname. This is needed by one DTrace test. Sponsored by: The FreeBSD Foundation	2010-08-21 11:41:32 +00:00
Warner Losh	5c0e643e73	First cut at mips n64 ABI support	2010-08-19 03:31:26 +00:00
Pawel Jakub Dawidek	8dc7024be4	In FreeBSD we use 'jailed' property. MFC after: 2 weeks	2010-08-07 10:23:54 +00:00
Martin Matuska	f4e7a6c3f1	Import two changesets from OpenSolaris to make future updates easier. The changes do not affect FreeBSD code because zfs_znode_move(), cleanlocks() and cleanshares() are not used. OpenSolaris onnv changeset: 9788:f660bc44f2e8, 9909:aa280f585a3e Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843700, 6790232) MFC after: 7 weeks	2010-07-25 15:17:24 +00:00
Martin Matuska	34f56898a1	Consider snapshots as descendants via zfs allow -d OpenSolaris onnv changeset: 9847:2f3ba86e857a Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6809340) MFC after: 1 week	2010-07-24 22:28:29 +00:00
Andriy Gapon	a85d8d8acc	zfs arc_memory_throttle: available memory is free + cache OpenSolaris freemem has the same meaning as our v_free_count + v_cache_count. Obtained from: Artem Belevich <fbsdlist@src.cx>, Peter Jeremy <peterjeremy@acm.org> Discussed with: pjd MFC after: 2 weeks	2010-07-23 17:44:01 +00:00
Martin Matuska	2bacd082bd	Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY - fixes panics when Solaris/OpenSolaris pools that contain files uploaded with the SMB protocol are accessed Enable setting/unsetting the sharesmb property (dummy action) - allows users who import pools from Solaris/OpenSolaris to unset the sharesmb property and get rid of annoying messages PR: kern/145778, kern/148709 Approved by: pjd, delphij (mentor) MFC after: 7 weeks	2010-07-22 23:30:24 +00:00
Martin Matuska	f926b455e7	To improve latency, lower default vfs.zfs.vdev.max_pending from 35 to 10 OpenSolaris onnv changeset (partial): 10801:e0bf032e8673 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6891731) MFC after: 1 week	2010-07-20 05:22:14 +00:00
Nathan Whitehorn	2785677d3d	Add OpenSolaris atomics for powerpc64 and connect ZFS to the build on this platform. Reviewed by: pjd	2010-07-17 13:34:01 +00:00
Nathan Whitehorn	04bcbbf81e	Increase stack size for ZFS sync thread. This is required to make ZFS function on 64-bit PowerPC. Reviewed by: pjd Obtained from: OpenSolaris changeset 14653:7cf402a7f374	2010-07-17 13:31:27 +00:00
John Baldwin	61e1c19319	Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*(). Discussed with: attilio	2010-07-16 19:52:03 +00:00

757 Commits