freebsd-skq

Author	SHA1	Message	Date
Andriy Gapon	dfe3a1b374	cyclic xcall: use smp_no_rendevous_barrier as setup function parameter In this case we call target function only on a single CPU and do not need any synchronization at the setup stage. It's a bit non-obvious but setup function of NULL means that smp_rendezvous_cpus waits for all CPUs to arrive at the rendezvous point, but without doing any actual setup. While using smp_no_rendevous_barrier means that each CPU proceeds on its own schedule without any synchronization whatsoever. MFC after: 3 weeks	2010-12-17 18:22:50 +00:00
Pawel Jakub Dawidek	8735863465	Remove redundant semicolon and empty like.	2010-12-11 13:35:25 +00:00
Ivan Voras	d7ccd95be8	Undo r216230: the interaction between saved ashift in metadata and detected ashift does not support this. With this change, pools created while stripesize=512 could not be imported when stripesize becomes larger (on the same drive). Noticed by: pjd	2010-12-07 15:24:08 +00:00
Andriy Gapon	58f61ce4eb	opensolaris cyclic: fix deadlock and make a little bit closer to upstream The dealock was caused in the following way: - thread T1 on CPU C1 holds a spin mutex, IPIs CPU C2 and waits for the IPI to be handled - C2 executes timer interrupt filter, thus has interrupts disabled, and gets blocked on the spin mutex held by T1 The problem seems to have been introduced by simplifications made to OpenSolaris code during porting. The problem is fixed by reorganizing the code to more closely resemble the upstream version. Interrupt filter (cyclic_fire) now doesn't acquire any locks, all per-CPU data accesses are performed on a target CPU with preemption and interrupts disabled thus precluding concurrent access to the data. cyp_mtx spin mutex is used to disable preemtion and interrupts; it's not used for classical mutual exclusion, because xcall already serializes calls to a CPU. It's an emulation of OpenSolaris cyb_set_level(CY_HIGH_LEVEL) call, the spin mutexes could probably be reduced to just a spinlock_enter()/_exit() pair. Diff with upstream version is now reduced by ~500 lines, however it still remains quite large - many things that are not needed (at the moment) or are irrelevant on FreeBSD were simply ripped out during porting. Examples of such things: - support for CPU onlining/offlining - support for suspend/resume - support for running callouts at soft interrupt levels - support for callout rebinding from CPU to CPU - support for CPU partitions Tested by: Artem Belevich <fbsdlist@src.cx> MFC after: 3 weeks X-MFC with: r216252	2010-12-07 12:25:26 +00:00
Andriy Gapon	a10b0e67d9	opensolaris cyclic xcall: no need for special handling of curcpu smp_rendezvous_cpus already properly handles current CPU case and non-SMP case. MFC after: 3 weeks	2010-12-07 12:04:06 +00:00
Andriy Gapon	fe8c7b3d77	dtrace_xcall: no need for special handling of curcpu smp_rendezvous_cpus alreadt does the right thing in a very similar fashion, so the code was kind of duplicating that. MFC after: 3 weeks	2010-12-07 09:19:47 +00:00
Andriy Gapon	7becfa95b9	dtrace_gethrtime_init: pin to master while examining other CPUs Also use pc_cpumask to be future-friendly. Reviewed by: jhb MFC after: 2 weeks	2010-12-07 09:03:17 +00:00
Ivan Voras	8b08562112	Use GEOM stripesize field when calculating ashift. This will enable correct alignment on drives with large sector sizes (e.g. 4 KiB) but the implementation might need to be revisited if devices with large stripesizes appear (e.g. if RAID controllers or flash drives start using the field), probably by introducing a physsectorsize field in GEOM providers. Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@	2010-12-06 12:18:02 +00:00
Edward Tomasz Napierala	de2a57325d	Don't panic when we read an empty ACL from ZFS. Apparently this may happen with filesystems created under MacOS X ZFS port. This is kind of filesystem corruption (we don't allow for setting empty ACLs), so make acl_get_file(3) and related syscalls fail with EINVAL in that case. In theory, we could return empty ACL to userland, but I'm afraid this would break some code. MFC after: 3 days	2010-11-30 21:04:05 +00:00
Andriy Gapon	c59690f249	zfs+sendfile: populate all requested pages, not just those already cached kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate page cache. When sendfile stumbles upon a page that is not populated yet, it sends out all the mbufs that it collected so far. This resulted in very poor performance with ZFS when file data is not in the page cache, because ZFS vop_read for UIO_NOCOPY case populated only those pages that are already in cache, but not valid. Which means that most of the time it populated only the first requested page in the described above scenario. Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: Alexander Zagrebin <alexz@visp.ru>, Artemiev Igor <ai@kliksys.ru> MFC after: 12 days	2010-11-16 15:53:44 +00:00
Andriy Gapon	f9e2e99d5d	fix misspelling in a comment Reported by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days	2010-11-16 12:30:47 +00:00
Martin Matuska	8db47aa15e	Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot and VFS_RELE on a non-existing hold on snapshot parent's z_vfs. This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4 (bug IDs: 6792139, 6794830) - not applicable to FreeBSD. This fixes the process hang if umounting a manually mounted snapshot. Reported by: Alexander Zagrebin <alexz@visp.ru> Approved by: delphij (mentor) MFC after: 1 week	2010-11-13 21:09:18 +00:00
Xin LI	b97a9057c2	Validate whether the zfs_cmd_t submitted from userland is not smaller than what we have. Without the check the kernel could accessing memory that does not belong to the request struct. Note that we do not test if the struct equals in size at this time, which may faciliate forward compatibility with newer binaries. Reviewed by: pjd at MeetBSD CA '2010 MFC after: 1 week	2010-11-05 22:18:09 +00:00
Martin Matuska	e25376bdd0	Bugfix merge from OpenSolaris: OpenSolaris onnv-revision: 10209:91f47f0e7728 6830541 zfs_get_data_trips on a verify 6696242 multiple zfs_fillpage() zfs: accessing past end of object panics 6785914 zfs fails to drop dn_struct_rwlock in recovery code path Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914) MFC after: 2 weeks	2010-10-26 15:48:03 +00:00
Andriy Gapon	23a1bcf8c6	zfs: add vop_getpages method implementation This should make vnode_pager_getpages path a bit shorter and clearer. Also this should eliminate problems with partially valid pages. Having this method opens room for future optimizations. To do: try to satisfy other pages besides the required one taking into account tradeofs between number of page faults, read throughput and read latency. Also, eventually vop_putpages should be added too. Reviewed by: kib, mm, pjd MFC after: 3 weeks	2010-10-16 20:43:05 +00:00
Rui Paulo	910a5e18ba	Pass a format string to panic() and to taskqueue_start_threads(). Found with: clang	2010-10-13 17:13:43 +00:00
Rui Paulo	6e634bb80f	In zfs_post_common(), use %d instead of %hhu. Found with: clang	2010-10-13 17:12:23 +00:00
Andriy Gapon	f6bb41924c	zfs + sendfile: do not produce partially valid pages for vnode's tail Since r212650 and before this change sendfile(2) could produce a partially valid page for a trailing portion of a ZFS vnode. vm_fault() always wants to see a fully valid page even if it's the last page that partially extends beyond vnode's end. Otherwise it calls vop_getpages() to bring in the page. In the case of ZFS this means that the data is read from the page into the same page and this breaks checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in vm_fault() will get blocked forever waiting for it to be cleared. Many thanks to Kai and Jeremy for reproducing the issue and providing important debugging information and help. Reported by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Tested by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Reviewed by: kib MFC after: 3 days To-Do: apply the same treatment to tmpfs + sendfile	2010-10-12 17:04:21 +00:00
Pawel Jakub Dawidek	19ebc67beb	Provide internal ioflags() function that converts ioflag provided by FreeBSD's VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write operations. Reviewed by: mm MFC after: 1 week	2010-10-10 20:49:33 +00:00
Martin Matuska	a362d75576	Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag. This corrects writing to append-only files on ZFS. PR: kern/149495 [1], kern/151082 [2] Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2] Approved by: delphij (mentor) MFC after: 1 week	2010-10-08 23:01:38 +00:00
Andriy Gapon	6c6aca1203	opensolaris_kmem kmem_size(): report lesser of vm_kmem_size and available physical memory This is needed to correctly autotune ZFS ARC size when vm_kmem_size is set to value larger than available physical memory. MFC after: 2 weeks	2010-10-07 18:16:14 +00:00
Martin Matuska	aa007a9f0e	Properly handle IO with B_FAILFAST Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted OpenSolaris revision and Bug IDs: 9725:0bf7402e8022 6843014 ZFS B_FAILFAST handling is broken Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843014) MFC after: 3 weeks	2010-09-27 09:42:31 +00:00
Martin Matuska	96a1a6a568	Enable offlining of log devices. OpenSolaris revision and Bug IDs: 9701:cc5b64682e64 6803605 should be able to offline log devices 6726045 vdev_deflate_ratio is not set when offlining a log device 6599442 zpool import has faults in the display Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442) MFC after: 3 weeks	2010-09-27 09:05:51 +00:00
Andriy Gapon	68653c3bd6	zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths and it seems to be a not very good idea to do cpu pinning there. Suggested by: kib MFC after: 2 weeks	2010-09-21 05:58:45 +00:00
Andriy Gapon	ff5e15a487	zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place MFC after: 2 weeks	2010-09-21 05:54:36 +00:00
Andriy Gapon	9d5eb9aa5d	zfs arc_reclaim_needed: fix typo in mismerge in r212780 PR: kern/146410, kern/138790 MFC after: 3 weeks X-MFC with: r212780	2010-09-17 07:34:50 +00:00
Andriy Gapon	921d3fd122	zfs+sendfile: advance uio_offset upon reading as well Picked from analogous code in tmpfs. MFC after: 1 week	2010-09-17 07:20:20 +00:00
Andriy Gapon	44532bc5cd	zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_max Those checks are not present in upstream code and they are enforced in actual calculations of delta by which ARC size can be grown or should be reduced. MFC after: 3 weeks	2010-09-17 07:17:38 +00:00
Andriy Gapon	7c1353491f	zfs arc_reclaim_needed: more reasonable threshold for available pages vm_paging_target() is not a trigger of any kind for pageademon, but rather a "soft" target for it when it's already triggered. Thus, trying to keep 2048 pages above that level at the expense of ARC was simply driving ARC size into the ground even with normal memory loads. Instead, use a threshold at which a pagedaemon scan is triggered, so that ARC reclaiming helps with pagedaemon's task, but the latter still recycles active and inactive pages. PR: kern/146410, kern/138790 MFC after: 3 weeks	2010-09-17 07:14:07 +00:00
Martin Matuska	d1ee63f836	Fix kernel panic when moving a file to .zfs/shares Fix possible loss of correct error return code in ZFS mount OpenSolaris revisions and Bug IDs: 11824:53128e5db7cf 6863610 ZFS mount can lose correct error return 12079:13822b941977 6939941 problem with moving files in zfs (142901-12) Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6863610, 6939941) MFC after: 3 days	2010-09-15 19:55:26 +00:00
Andriy Gapon	8a3883cfb7	zfs vn_has_cached_data: take into account v_object->cache != NULL This mirrors code in tmpfs. This changge shouldn't affect much read path, it may cause unnecessary vm_page_lookup calls in the case where v_object has no active or inactive pages but has some cache pages. I believe this situation to be non-essential. In write path this change should allow us to properly detect the above case and free a cache page when we write to a range that corresponds to it. If this situation is undetected then we could have a discrepancy between data in page cache and in ARC or on disk. This change allows us to re-enable vn_has_cached_data() check in zfs_write. NOTE: strictly speaking resident_page_count and cache fields of v_object should be exmined under VM_OBJECT_LOCK, but for this particular usage we may get away with it. Discussed with: alc, kib Approved by: pjd Tested with: tools/regression/fsx MFC after: 3 weeks	2010-09-15 11:05:41 +00:00
Andriy Gapon	0b1ca38a69	zfs mappedread, update_pages: use int for offset and length within a page uint64_t, int64_t were redundant there Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:48:16 +00:00
Andriy Gapon	c002c3e8c2	zfs mappedread: use uiomove_fromphys where possible Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:44:20 +00:00
Andriy Gapon	fbbdb19dcd	zfs: catch up with vm_page_sleep_if_busy changes Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks	2010-09-15 10:39:21 +00:00
Andriy Gapon	21bd3e2576	tmpfs, zfs + sendfile: mark page bits as valid after populating it with data Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY). PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks	2010-09-15 10:31:27 +00:00
Martin Matuska	9a13d2e1b3	Remove duplicated VFS_HOLD due to a mismerge. PR: kern/150544 Approved by: delphij (mentor) MFC after: 1 day	2010-09-14 12:12:18 +00:00
Martin Matuska	4eeef2e44a	Add missing vop_vector zfsctl_ops_shares Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir PR: kern/150544 Approved by: delphij (mentor) Obtained from: perforce (pjd) MFC after: 1 day	2010-09-14 10:27:32 +00:00
Pawel Jakub Dawidek	3c907063e9	Remove the page queues lock around vm_page_undirty() - it is no longer needed. Reviewed by: alc	2010-09-13 19:47:09 +00:00
Rui Paulo	47047e3418	Revamp locking a bit. This fixes three problems: * processes now can't go away while we are inserting probes (fixes a panic) * if a trap happens, we won't be holding the process lock (fixes a hang) * fix a LOR between the process lock and the fasttrap bucket list lock Thanks to kib for pointing some problems. Sponsored by: The FreeBSD Foundation	2010-09-12 14:12:16 +00:00
Rui Paulo	eae81e9501	Avoid a LOR (sleepable after non-sleepable) in fasttrap_tracepoint_enable(). Sponsored by: The FreeBSD Foundation	2010-09-11 12:58:31 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno that sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Pawel Jakub Dawidek	6a85b5e08a	Forgot to commit this file. Add ZPOOL_CONFIG_IS_LOG. Reported by: keramida MFC after: 2 weeks	2010-09-10 04:44:13 +00:00
Pawel Jakub Dawidek	86b19d1861	On FreeBSD we can log from pool that have multiple top-level vdevs or log vdevs, so don't deny adding new vdevs if bootfs property is set. MFC after: 2 weeks	2010-09-09 21:20:18 +00:00
Rui Paulo	d3555b6fc2	Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes. Sponsored by: The FreeBSD Foundation	2010-09-09 09:58:05 +00:00
Justin T. Gibbs	f03f7a0ca3	Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to it's next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation MFC after: 1 month	2010-09-02 19:40:28 +00:00
Rui Paulo	ea950d20f6	Make the /dev/dtrace/helper node have the mode 0660. This allows programs that refuse to run as root (pgsql) to install probes when their user is part of the wheel group. Sponsored by: The FreeBSD Foundation	2010-09-01 12:08:32 +00:00
Jaakko Heinonen	de478dd4b4	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
Pawel Jakub Dawidek	b8a4becc2d	Return NULL pointer instead of B_FALSE as it is done in the vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:29:06 +00:00
Pawel Jakub Dawidek	3e9e888541	Move ZUT_OBJS in the same place that is used in vendor code. Obtained from: //depot/user/pjd/zfs/...	2010-08-28 19:28:12 +00:00
Martin Matuska	8d87b396f8	Import changes from OpenSolaris that provide - better ACL caching and speedup of ACL permission checks - faster handling of stat() - lowered mutex contention in the read/writer lock (rrwlock) - several related bugfixes Detailed information (OpenSolaris onnv changesets and Bug IDs): 9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer 9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership 9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters 10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission 10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir 10250:b179ceb34b62 `6867395` zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault) 10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL 10295:f7a18a1e9610 6870564 panic in zfs_getsecattr Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 weeks	2010-08-28 09:24:11 +00:00

1 2 3 4 5 ...

485 Commits