571 Commits

Author SHA1 Message Date
pjd
1f39d42bbd Update per-thread I/O statistics collection in ZFS.
This allows to see processes I/O activity in 'top -m io' output.

PR		kern/156218
Reported by:	Marcus Reid <marcus@blazingdot.com>
Patch by:	avg
MFC after:	3 days
2011-10-21 21:49:34 +00:00
pjd
209c4b6ab9 zfs vdev_file_io_start: validate vdev before using vdev_tsd
vdev_tsd can be NULL for certain vdev states.
At least in userland testing with ztest.

Submitted by:	avg
MFC after:	3 days
2011-10-21 14:00:48 +00:00
pjd
57635fa52e - Correctly read gang header from raidz.
- Decompress assembled gang block data if compressed.
- Verify checksum of a gang header.
- Verify checksum of assembled gang block data.
- Verify checksum of uber block.

Submitted by:	avg
MFC after:	3 days
2011-10-20 15:42:38 +00:00
pjd
2b80d6e6dd Always pass data size for checksum verification function, as using
physical block size declared in bp may not always be what we want.
For example in case of gang block header physical block size declared
in bp is much larger than SPA_GANGBLOCKSIZE (512 bytes) and checksum
calculation failed. This bug could lead to accessing unallocated
memory and resets/failures during boot.

MFC after:	3 days
2011-10-19 23:44:38 +00:00
pjd
5dc66f9eb5 Initialize 'rc' properly before using it. This error could lead to infinite
loop when data reconstruction was needed.

MFC after:	3 days
2011-10-19 23:33:48 +00:00
pjd
7523365c00 Remove redundant size calculation.
MFC after:	3 days
2011-10-19 23:31:50 +00:00
mm
dd8c45ce3c Import fix for Illumos bug #1475 to reduce diff against upstream.
Panic caused by this bug was already partially fixed by pjd@
in p4 CH 185940 and 185942.

Reference:
1475 zfs spill block hold can access invalid spill blkptr
https://www.illumos.org/issues/1475

Reviewed by:	delphij
Obtained from:	Illumos (issue 1475, changeset 13469:b8e89e5c4167)
MFC after:	1 week
2011-10-18 13:58:22 +00:00
delphij
34ca38d01d Fix a bug in sa_find_sizes() which could lead to panic:
When calculating space needed for SA_BONUS buffers,
hdrsize is always rounded up to next 8-aligned boundary.
However, in two places the round up was done against
sum of 'total' plus hdrsize.  On the other hand,
hdrsize increments by 4 each time, which means in
certain conditions, we would end up returning with
will_spill == 0 and (total + hdrsize) larger than
full_space, leading to a failed assertion because
it's invalid for dmu_set_bonus.

Sponsored by:	iXsystems, Inc.
Reviewed by:	mm
MFC after:	3 days
2011-10-17 22:23:27 +00:00
marcel
b526ecc3e0 Define dtrace_cmpset_long in terms of atomic_cmpset_long
and not by virtue of inline assembly. Now this file
compiles on all supported architectures.
2011-10-16 22:18:08 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
avg
94836c37a8 zfs boot subroutines: correctly specify type of an integer literal
Found by adding more warning flags to zfs boot blocks build.

Approved by:	re (kib)
MFC after:	1 week
2011-09-13 14:07:05 +00:00
kib
a9d505a22a Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
mm
e104c96f01 Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using
new vn_pages_remove() to fix mmapped files changed by
zfs rollback or zfs receive -F.

PR:		kern/160035, kern/156933
Reviewed by:	kib, pjd
Approved by:	re (kib)
MFC after:	1 week
2011-08-25 08:17:39 +00:00
pjd
173bc8ca15 We need to unlock and destroy vnode attached to znode which we are freeing.
Reviewed by:	kib
Approved by:	re (bz)
MFC after:	1 week
2011-08-24 22:07:38 +00:00
mm
d76f038194 zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next()
zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors()
by passing full instead of relative dataset name and prefetching all
visible datasets to be processed later instead of just the pool name

Reviewed by:	pjd
Approved by:	re (kib)
MFC after:	1 week
> Reviewed by:   If someone else reviewed your modification.
> Approved by:   If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after:     N [day[s]|week[s]|month[s]].  Request a reminder email.
> Security:      Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.

M    opensolaris/uts/common/fs/zfs/zfs_ioctl.c
M    opensolaris/uts/common/fs/zfs/zvol.c
2011-08-13 21:35:22 +00:00
mm
2b43fcac26 Fix race between dmu_objset_prefetch() invoked from
zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly
invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not
prefetching temporary clones, as these count as always inconsistent.
In addition, do not prefetch hidden datasets at all as we are not
going to process these later.

Filed as Illumos Bug #1346

PR:		kern/157728
Tested by:	Borja Marcos <borjam@sarenet.es>, mm
Reviewed by:	pjd
Approved by:	re (kib)
MFC after:	1 week
2011-08-13 10:58:53 +00:00
pjd
1d8972b3f3 Eliminate the zfsdev_state_lock entirely and replace it with the
spa_namespace_lock. This fixes LOR between the spa_namespace_lock and
spa_config lock. LOR can cause deadlock on vdevs removal/insertion.

Reported by:	gibbs, delphij
Tested by:	delphij
Approved by:	re (kib)
MFC after:	1 week
2011-08-12 07:04:16 +00:00
rwatson
4af919b491 Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
mm
9a3c5b3af4 Fix panic in zfs_read() if IO_SYNC flag supplied by checking for
zfsvfs->z_log before calling zil_commit(). [1]
Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag.

Submitted by:	Alexander Zagrebin <alex@zagrebin.ru> [1]
Reviewed by:	pjd@
Approved by:	re (kib)
MFC after:	3 days
2011-08-02 11:28:33 +00:00
mm
877604c9bd Fix integer overflow in txg_delay() by initializing
the variable "timeout" as clock_t.

Filed as Illumos Bug #1313

Reviewed by:	avg
Approved by:	re (kib)
MFC after:	3 days
2011-08-01 14:50:31 +00:00
mm
f2d210f70e Fix serious bug in ZIL that can lead to pool corruption
in the case of a held dataset during remount.

Detailed description is available at:
https://www.illumos.org/issues/883

illumos-gate revision:	13380:161b964a0e10

Reviewed by:	pjd
Approved by:	re (kib)
Obtained from:	Illumos (Bug #883)
MFC after:	3 days
2011-07-30 19:00:31 +00:00
delphij
a2e6406c41 Bring the code more in-line with OpenSolaris source to
ease future port.

Reviewed by:	pjd, mm
Approved by:	re (kib)
2011-07-21 20:02:22 +00:00
delphij
e413ef16f1 A different implementation of r224231 proposed by pjd@,
which does not require change in the znode structure.
Specifically, it queries rdev from the znode in the
same sa_bulk_lookup already done in zfs_getattr().

Submitted by:	pjd (with some revisions)
Reviewed by:	pjd, mm
Approved by:	re (kib)
2011-07-21 20:01:51 +00:00
delphij
1a0f5fbe27 Add a new field to in-core znode, z_rdev, to represent device nodes.
PR:		kern/159010
Reviewed by:	mm@
Approved by:	re (kib)
MFC after:	2 weeks
2011-07-20 16:53:32 +00:00
mm
94656ca305 ZFS tries to allocate blocks evenly across all devices. This means when
devices are imbalanced zfs will lots of CPU searching for space on devices
which tend to be pretty full. It should instead fail quickly on the full
devices and move onto devices which have more availability.

New loader tunable: vfs.zfs.mg_alloc_failures (min = 8)

Illumos-gate changeset:	13379:4df42cc92254

Obtained from:	Illumos (Bug #1051)
MFC after:	2 weeks
2011-07-18 08:29:49 +00:00
mm
c5160d4717 Resurrect the ZFS "aclmode" property
Change default of "aclmode" to "discard".

Illumos-gate changeset:	13370:8c04143bd318

Obtained from:	Illumos (Feature #742)
MFC after:	2 weeks
2011-07-18 07:16:44 +00:00
attilio
364d0522f7 With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by:	marcel, marius, alc
Tested by:	pluknet
MD testing by:	marcel, marius, gonzo, andreast
2011-07-04 12:04:52 +00:00
mm
333e34b938 Add a new "REFCOMPRESSRATIO" property.
For snapshots, this is the same as COMPRESSRATIO, but for
filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie,
includes blocks in children, but not blocks shared with the origin).

This is needed to figure out how much space a filesystem would use if it
were not compressed (ignoring snapshots).

Illumos-gate revision:	13387

Obtained from:	Illumos (Feature #1092)
MFC after:	2 weeks
2011-06-28 07:52:01 +00:00
mm
953bf4fcbd Disable vdev cache (readahead) by default.
The vdev cache is very underutilized (hit ratio 30%-70%) and may consume
excessive memory on systems with many vdevs.

Illumos-gate revision:	13346

Obtained from:	Illumos (Bug #175)
MFC after:	1 week
2011-06-28 06:32:35 +00:00
benl
2071e3510a Fix clang warnings.
Approved by:	philip (mentor)
2011-06-18 13:56:33 +00:00
gibbs
58b2f49fd9 Remove C constructs that are incompatible with C++ from various
OpenSolaris and ZFS header files.  These changes are sufficient
to allow a C++ program to use the libzfs library.

Note: The majority of these files already included 'extern "C"'
      declarations, so the intention of providing C++ compatibility
      already existed even if it wasn't provided.

cddl/compat/opensolaris/include/assert.h:
	Wrap our compatibility assert implementation in
	'extern "C"'.  Since this is a compatibility header
	I matched the Solaris style of doing this explicitly
	rather than rely on FreeBSD's __BEGIN/END_DECLS macro.

sys/cddl/compat/opensolaris/sys/kstat.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:
	Rename parameters in function declarations that conflict
	with C++ keywords.  This was the solution preferred by
	members of the Illumos community.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h:
	In C, nested structures are visible in the global namespace,
	but in C++, they take on the namespace of the structure in
	which they are contained.  Flatten nested structure
	definitions within struct zfs_cmd so these structures are
	visible in the global namespace when compiled in both
	languages.

Sponsored by:	 Spectra Logic Corporation
2011-06-10 20:10:30 +00:00
mm
c129cb6c4f Silence notice on pool creation, import and access.
Suggested by:	Jeremy Chadwick (freebsd-stable@)
Discussed with:	pjd
MFC after:	1 week
2011-06-07 20:46:31 +00:00
attilio
fcefe479fe MFC 2011-06-06 21:38:39 +00:00
mm
d3eac6f07d Remove empty #ifndef
MFC after:	3 days
2011-06-06 14:46:43 +00:00
attilio
8e66ca1ff1 MFC 2011-06-04 22:05:20 +00:00
avg
8f7fb42f83 opensolaris compat / zfs: avoid early overflow in ddi_get_lbolt*
Reported by:	David P. Discher <dpd@bitgravity.com>
Tested by:	will
Reviewed by:	art
Discussed with:	dwhite
MFC after:	2 weeks
2011-06-04 07:02:06 +00:00
attilio
b1bf71d3c5 MFC 2011-05-31 14:18:10 +00:00
pjd
b6ae7ca260 Imagine situation where a security problem is found in setuid binary.
User upgrades his system to fix the problem, but if he has any ZFS snapshots
for the file system which contains problematic binary, any user can mount the
snapshot and execute vulnerable binary.

Prevent this from happening by always mounting snapshots with setuid turned off.

MFC after:	2 weeks
2011-05-31 07:02:49 +00:00
attilio
eefddaeed6 MFC 2011-05-27 16:09:10 +00:00
pjd
7a7a27ed7e Silence warnings about unsupoorted value types.
MFC after:	2 weeks
2011-05-27 08:34:31 +00:00
attilio
867c6223e7 MFC 2011-05-26 17:38:00 +00:00
pjd
32108f817a Don't pass pointer to name buffer which is on the stack to another thread,
because the stack might be paged out once the other thread tries to use the
data. Instead, just allocate memory.

MFC after:	2 weeks
2011-05-24 20:10:12 +00:00
pjd
32c533f982 Don't access task structure once we call task function.
The task structure might be no longer available.
This also allows to eliminates the need for two tasks in the zio structure.

Submitted by:	anonymous
MFC after:	2 weeks
2011-05-24 20:07:15 +00:00
attilio
b580be6dfd MFC 2011-05-22 21:46:55 +00:00
rmacklem
09babc9515 Fix the zfs file system so that it uses the lock
flags argument added to VFS_FHTOVP() by r222167.

Reviewed by:	pjd
2011-05-22 21:04:32 +00:00
attilio
627bd73cdb MFC 2011-05-22 20:41:10 +00:00
rmacklem
fbb8a5e8ec Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by:	kib
2011-05-22 01:07:54 +00:00
attilio
6a2b7fdc52 MFC 2011-05-18 16:01:29 +00:00
mm
36f936fb3c Restore old (v15) behaviour for a recursive snapshot destroy.
(zfs destroy -r pool/dataset@snapshot)

To destroy all descendent snapshots with the same name the top level
snapshot was not required to exist. So if the top level snapshot does
not exist, check permissions of the parent dataset instead.

Filed as Illumos Bug #1043

Reviewed by:	delphij
Approved by:	pjd
MFC after:	together with v28
2011-05-18 07:37:02 +00:00
attilio
d57a3c7c06 MFC 2011-05-16 16:34:03 +00:00