15 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
3a482ccc5e MFC r203504,r204067,r204073,r204101,r204804,r205079,r205080,r205132,r205133,
r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
    r206794,r206795,r206796,r206797:

r203504:

Open provider for writing when we find the right one. Opening too many
providers for writing provokes huge traffic related to taste events sent
by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.

Reported by:	Kurt Touet <ktouet@gmail.com>, mr
Tested by:	mr, Baginski Darren <kickbsd@ya.ru>

r204067:

Update comment. We also look for GPT partitions.

r204073:

Add tunable and sysctl to skip hostid check on pool import.

r204101:

Don't set f_bsize to recordsize. It might confuse some software (like squid).

Submitted by:	Alexander Zagrebin <alexz@visp.ru>

r204804:

Remove racy assertion.

Reported by:	Attila Nagy <bra@fsn.hu>
Obtained from:	OpenSolaris, Bug ID 6827260

r205079:

Remove bogus assertion.

Reported by:	Johan Ström <johan@stromnet.se>
Obtained from:	OpenSolaris, Bug ID 6920880

r205080:

Force commit to correct Bug ID:

Obtained from:	OpenSolaris, Bug ID 6920880

r205132:

Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads

r205133:

fix compilation under ZIO_USE_UMA

r205134:

make UMA the default allocator for ZFS buffers - this avoids
a great deal of contention in kmem_alloc

r205231:

- reduce contention by breaking up ARC state locks into 16 for data
  and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
  contended counter (no lock protection needed for atomics)
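
The lock split in the first item can be pictured as lock striping. The
following is only a minimal sketch: the class name, the hashing by key and
the modulo selection are assumptions made for illustration, not the way the
ARC code actually picks a lock. Only the count of 16 comes from the message
above.

```python
import threading

NUM_LOCKS = 16  # count taken from the commit message


class StripedLocks:
    """Hypothetical helper: spread keys over a fixed array of locks."""

    def __init__(self, n=NUM_LOCKS):
        self.locks = [threading.Lock() for _ in range(n)]

    def index_for(self, key):
        # Assumption: pick the stripe by hashing the key.
        return hash(key) % len(self.locks)

    def lock_for(self, key):
        return self.locks[self.index_for(key)]


if __name__ == "__main__":
    stripes = StripedLocks()
    with stripes.lock_for(42):
        # Only threads whose keys map to the same stripe contend here.
        pass
```

Threads working on keys that map to different stripes no longer serialize on
one lock, which is the effect the commit is after.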

r205253:

use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad

pointed out by Marius Nuennerich and jhb@

r205264:

- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array

r205346:

The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
   or that pool is being created and vdev doesn't contain proper guid yet.
   To be able to handle pool creation we open vdev by path anyway.

Because of 3 it is possible that we open the wrong vdev on import, which can
lead to confusion.

The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.
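
The new order can be sketched as follows. This is a simplified model, not
the kernel code: the function name, the dictionary of devices and the
SPA_LOAD_IMPORT constant (standing in for any state other than
SPA_LOAD_NONE) are assumptions made for illustration.

```python
SPA_LOAD_NONE = "SPA_LOAD_NONE"      # name from the commit message
SPA_LOAD_IMPORT = "SPA_LOAD_IMPORT"  # stand-in for any non-NONE state


def choose_vdev(load_state, path, guid, devices):
    """Return the path of the device to open, or None.

    devices maps a device path to the guid found in its label
    (None if the device carries no label yet).
    """
    if load_state == SPA_LOAD_NONE:
        # Pool creation: the guid is not written yet, so open by path only.
        return path if path in devices else None
    # Import: first try the remembered path, but only if the guid matches.
    if devices.get(path) == guid:
        return path
    # Then look for the guid on any device and ignore the path.
    for candidate, found_guid in devices.items():
        if found_guid == guid:
            return candidate
    # No fallback to path-only open on import.
    return None


if __name__ == "__main__":
    devices = {"/dev/ada0": 111, "/dev/ada1": 222}
    print(choose_vdev(SPA_LOAD_IMPORT, "/dev/ada0", 222, devices))
```

In the example the remembered path /dev/ada0 now holds a different guid, so
the import picks /dev/ada1 instead of opening the wrong device.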

r206051:

IOCPARM_MAX defines maximum size of a structure that can be passed
directly to ioctl(2). Because of how ioctl command is built using _IO*()
macros we have only 13 bits to encode structure size. So the structure
can be up to 8kB-1.

Currently we define IOCPARM_MAX as PAGE_SIZE.

This is IMHO wrong for three main reasons:

1. It is confusing on archs with page size larger than 8kB (not really
   sure if we support such archs (sparc64?)), as even if PAGE_SIZE is
   bigger than 8kB, we won't be able to encode anything larger in ioctl
   command.

2. It is a waste. Why can the structure be only 4kB on most archs if we
   have 13 bits dedicated for that, not 12?

3. It shouldn't depend on architecture and page size. My ioctl command
   can work on one arch, but can't on the other?

Increase IOCPARM_MAX to 8kB and make it independent of PAGE_SIZE and the
architecture it is compiled for. This allows all the bits to be used for size
on all the archs. Note that this doesn't mean we will copy more on every ioctl(2)
call. No. We still copyin(9)/copyout(9) only exact number of bytes encoded in
ioctl command.

Practical use for this change is ZFS. zfs_cmd_t structure used for ZFS
ioctls is larger than 4kB.
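
The 13-bit limit can be illustrated with a small model of the command word.
This is a sketch under assumptions: the bit positions (direction in the top
bits, size at bit 16, group at bit 8) follow the _IOC() layout as I
understand it, and the function names and the 5000-byte size are made up
for the example.

```python
IOCPARM_SHIFT = 13                        # bits available for the size
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1   # 0x1fff, i.e. 8kB-1
IOC_INOUT = 0xC0000000                    # assumed direction bits (in | out)


def ioc_encode(inout, group, num, length):
    """Pack direction, size, group and number into one command word."""
    if length > IOCPARM_MASK:
        # The C macros would silently mask the size; fail loudly here.
        raise ValueError("size does not fit in 13 bits")
    return inout | (length << 16) | (ord(group) << 8) | num


def ioc_parm_len(cmd):
    """Recover the encoded size from a command word."""
    return (cmd >> 16) & IOCPARM_MASK


if __name__ == "__main__":
    cmd = ioc_encode(IOC_INOUT, "Z", 0, 5000)  # a structure above 4kB
    print(hex(cmd), ioc_parm_len(cmd))
```

A size above 4kB round-trips through the command word, so the page size
was never the real limit of the encoding.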

Silence on:	arch@

r206667:

Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.

thread0 (vfs_fhtovp)	thread1 (vop_getattr)	thread2 (zfs_recv)
--------------------	---------------------	------------------
			vn_lock
rrw_enter_read
						rrw_enter_write (hangs)
			rrw_enter_read (hangs)
vn_lock (hangs)
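
The table above amounts to a cycle in the wait-for graph. The sketch below
only models who waits for whom; the function name is hypothetical, and the
edge from thread1 to thread2 assumes that a reader queues behind a pending
writer, which is my reading of the table rather than something the message
states.

```python
def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle or None."""
    seen = []
    node = start
    while node in waits_for and node not in seen:
        seen.append(node)
        node = waits_for[node]
    if node in seen:
        return seen[seen.index(node):]
    return None


# thread0 wants the vnode lock held by thread1,
# thread1 wants a read lock but queues behind writer thread2 (assumption),
# thread2 wants the write lock blocked by thread0's read hold.
WAITS_FOR = {
    "thread0": "thread1",
    "thread1": "thread2",
    "thread2": "thread0",
}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, "thread0"))
```

Removing any one edge, for example by taking the locks in a consistent
order, breaks the cycle.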

Reported by:	Attila Nagy <bra@fsn.hu>

r206792:

Set ARC_L2_WRITING on L2ARC header creation.

Obtained from:	OpenSolaris

r206793:

Remove racy assertion.

Obtained from:	OpenSolaris

r206794:

Extend locks scope to match OpenSolaris.

r206795:

Add missing list and lock destruction.

r206796:

Style fixes.

r206797:

Restore previous order.
2010-04-18 21:36:34 +00:00
Xin LI
b1ebb318cb MFC r201690:
Space cleanup for revision 202669 committed separately for easier review.
This commit is purely space changes.

Submitted by:	Matt Reimer
Sponsored by:	VPOP Technologies, Inc.
2010-01-20 01:14:54 +00:00
Xin LI
ac07939f0e MFC r201689:
Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status.  Booting from a degraded vdev should now be
more robust.

Submitted by:	Matt Reimer <mattjreimer at gmail.com>
Sponsored by:	VPOP Technologies, Inc.
2010-01-20 01:13:52 +00:00
John Baldwin
5e05dbe9bd MFC 200309:
- Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
  safely allocate a heap region above 1MB.  This enables {gpt,}zfsboot()
  to allocate much larger buffers than before.
- Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers.  This
  allows more reliable reading of compressed files in a raidz/raidz2 pool.
2009-12-18 21:01:56 +00:00
Robert Noland
262b2ce076 MFC 198420
Correct some issues with zfs boot.

   - Teach it to read gang blocks. (essentially untested)
     If you see "ZFS: gang block detected!", please let
     me know, so we can either remove the printf if it
     works, or fix it if it doesn't.

   - If multiple partitions exist on a disk, probe them all.
     We also need to reset dsk->start to 0 to read the right
     sector here.

   - With GPT, we can have 128 partitions.

   - If the bootfs property has ever been set on a pool
     it seems that it never goes away.  zpool won't allow
     you to add to the pool with the bootfs property set.
     However, if you clear the property back to default
     we end up getting 0 for the object number and read
     a bogus block pointer and fail to boot.

   - Fix some error printfs. The printf in the loader is
     only capable of the c, s and u formats.

   - Teach printf how to display %llu
2009-11-21 15:02:35 +00:00
Doug Rabson
e1899ef6c8 Add support for booting from raidz1 and raidz2 pools. 2009-05-16 10:48:20 +00:00
Doug Rabson
7b3569ff05 Use full 64bit arithmetic when converting file offsets to block numbers - fixes
booting on filesystems with inode numbers with values above 4194304.
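
The 4194304 threshold fits a 32-bit overflow. The sketch below assumes that
the offset is computed as the object number times a 512-byte on-disk dnode
size; that size and both function names are assumptions for illustration,
not taken from the commit.

```python
DNODE_SIZE = 512  # assumed on-disk dnode size in bytes


def offset_32(objnum):
    """Byte offset as a signed 32-bit C int would compute it."""
    value = (objnum * DNODE_SIZE) & 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def offset_64(objnum):
    """Byte offset with full 64-bit arithmetic."""
    return objnum * DNODE_SIZE


if __name__ == "__main__":
    for objnum in (4194303, 4194304):
        print(objnum, offset_32(objnum), offset_64(objnum))
```

At 4194304 the 32-bit result wraps negative while the 64-bit result is
still correct.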

Submitted by:	ps
2008-12-17 18:12:01 +00:00
Paul Saab
5ee5aed0a3 Fix a leak introduced in r185902. We should free the devspec if
we've successfully found a zfs pool.
2008-12-11 16:48:35 +00:00
Paul Saab
390edcc5b9 Avoid a double free in devopen by not freeing the device structure
in zfs_dev_open.  This stops a panic in the loader when trying to
read from a zfs device and no zfs devices exist.
2008-12-11 02:23:49 +00:00
Doug Rabson
937a012e5d Don't get confused if we encounter a device which is part of a raidz or raidz2
pool while probing for vdevs.

PR:		129539
Submitted by:	Paul Wootton (paul at fletchermoorland dot co dot uk)
2008-12-10 10:46:34 +00:00
Paul Saab
8f6a8ed553 Correct include path for i386 specific includes. This allows zfs
to boot on systems where the loader is built on amd64 systems.
2008-12-06 14:45:03 +00:00
Doug Rabson
ebd4055a33 Fix amd64 build and re-enable gptzfsboot. 2008-11-22 14:24:55 +00:00
Doug Rabson
0d16312b46 Some zfsboot fixes from Norikatsu Shigemura:
1. zfsboot2 (boot2) doesn't support %d in printf, so change %d to %u.
2. chase new zpool versioning as SPA_VERSION.
   Obtained from: sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h

Submitted by:	nork
2008-11-19 16:59:19 +00:00
Doug Rabson
51f0d2e192 Add a GPT-aware variant of zfsboot which should be used in a similar manner
to gptboot, i.e. installed in a freebsd-boot partition using /sbin/gpart or
/sbin/gpt.

Tweak the /boot/loader ZFS support so that it can find ZFS pools that are
contained in GPT partitions.
2008-11-19 16:39:01 +00:00
Pawel Jakub Dawidek
1ba4a712dd Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This brings a huge number of changes; I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows additional disks to be used for cache.
	Huge performance improvements, mostly for random reads of mostly
	static content.

- slog

	Allows additional disks to be used for the ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by them. Be very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Previously, if a write request failed, the system panicked. Now one
	can select one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Like the quota and reservation properties, but they don't count space
	consumed by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00