r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
r206794,r206795,r206796,r206797:
r203504:
Open the provider for writing only once we find the right one. Opening too
many providers for writing generates huge traffic related to taste events
sent by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.
Reported by: Kurt Touet <ktouet@gmail.com>, mr
Tested by: mr, Baginski Darren <kickbsd@ya.ru>
r204067:
Update comment. We also look for GPT partitions.
r204073:
Add tunable and sysctl to skip hostid check on pool import.
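For illustration, such a knob is typically wired up as a loader tunable plus
a read-write sysctl. A minimal sketch, assuming the knob is named
vfs.zfs.check_hostid (the variable name and description here are
illustrative, not a verbatim copy of the commit):

    /* 1 = verify that the hostid stored in the pool matches ours. */
    static int zfs_check_hostid = 1;
    TUNABLE_INT("vfs.zfs.check_hostid", &zfs_check_hostid);
    SYSCTL_INT(_vfs_zfs, OID_AUTO, check_hostid, CTLFLAG_RW,
        &zfs_check_hostid, 0, "Check hostid on pool import");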
r204101:
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz@visp.ru>
r204804:
Remove racy assertion.
Reported by: Attila Nagy <bra@fsn.hu>
Obtained from: OpenSolaris, Bug ID 6827260
r205079:
Remove bogus assertion.
Reported by: Johan Ström <johan@stromnet.se>
Obtained from: OpenSolaris, Bug ID 6920880
r205080:
Force commit to correct Bug ID:
Obtained from: OpenSolaris, Bug ID 6920880
r205132:
Don't bottleneck on acquiring the stream locks - this avoids a massive
drop-off in throughput with large numbers of simultaneous reads.
r205133:
Fix compilation under ZIO_USE_UMA.
r205134:
Make UMA the default allocator for ZFS buffers - this avoids
a great deal of contention in kmem_alloc.
r205231:
- reduce contention by breaking up ARC state locks into 16 for data
and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
contended counter (no lock protection needed for atomics)
r205253:
Use CACHE_LINE_SIZE instead of hardcoding 128 for the lock pad.
Pointed out by Marius Nuennerich and jhb@.
r205264:
- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
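The pattern behind r205231, r205253 and r205264 is lock striping with
cache-line padding. A minimal user-space sketch of the idea (hypothetical
names; a 64-byte line stands in for the kernel's CACHE_LINE_SIZE, and the
real arc.c code uses struct mtx rather than pthreads):

    #include <pthread.h>
    #include <stdint.h>

    #define NSTRIPES   16   /* 16 locks for data, 16 for metadata */
    #define CACHE_LINE 64   /* stand-in for CACHE_LINE_SIZE */

    /* Pad each lock to a full cache line so neighboring stripes never
     * share one; false sharing would re-introduce the contention. */
    struct padded_lock {
            pthread_mutex_t lock;
            uint8_t pad[CACHE_LINE - (sizeof(pthread_mutex_t) % CACHE_LINE)];
    } __attribute__((aligned(CACHE_LINE)));

    static struct padded_lock stripes[NSTRIPES];

    static void
    stripes_init(void)
    {
            for (int i = 0; i < NSTRIPES; i++)
                    pthread_mutex_init(&stripes[i].lock, NULL);
    }

    /* Hash the object's address to pick a stripe instead of taking
     * one global lock. */
    static pthread_mutex_t *
    stripe_for(const void *obj)
    {
            return (&stripes[((uintptr_t)obj / CACHE_LINE) % NSTRIPES].lock);
    }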
r205346:
The same code is used to import and to create a pool.
The order of operations is the following:
1. Try to open the vdev by the remembered path and guid.
2. If 1 failed, try to find a vdev whose guid matches, ignoring the path.
3. If 2 failed, this means either that the vdev we're looking for is gone,
or that the pool is being created and the vdev doesn't contain the proper
guid yet. To be able to handle pool creation we open the vdev by path anyway.
Because of 3 it is possible that we open the wrong vdev on import, which can
lead to confusion.
The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE, so we can immediately open the vdev by path alone; if
it is not equal to SPA_LOAD_NONE, we first open by path+guid, and only when
that fails do we open by guid alone. We no longer open the wrong vdev on
import.
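A condensed sketch of the resulting decision (the two open helpers are
hypothetical stand-ins for the attach logic in vdev_geom.c):

    static struct g_consumer *
    vdev_geom_try_open(vdev_t *vd)
    {
            struct g_consumer *cp;

            if (vd->vdev_spa->spa_load_state == SPA_LOAD_NONE) {
                    /* Pool creation: the label carries no guid yet,
                     * so the remembered path is all we can trust. */
                    return (vdev_open_by_path(vd));
            }
            /* Import: insist on path+guid matching first... */
            cp = vdev_open_by_path_and_guid(vd);
            if (cp == NULL) {
                    /* ...and only then search by guid alone, which
                     * handles devices that changed names. */
                    cp = vdev_open_by_guid(vd);
            }
            return (cp);
    }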
r206051:
IOCPARM_MAX defines the maximum size of a structure that can be passed
directly to ioctl(2). Because of how an ioctl command is built using the
_IO*() macros, we have only 13 bits to encode the structure size, so the
structure can be at most 8kB-1 in size.
Currently we define IOCPARM_MAX as PAGE_SIZE.
This is IMHO wrong for three main reasons:
1. It is confusing on archs with a page size larger than 8kB (not really
   sure if we support such archs (sparc64?)), as even if PAGE_SIZE is
   bigger than 8kB, we won't be able to encode anything larger in the
   ioctl command.
2. It is a waste. Why should the structure be limited to 4kB on most archs
   when we have 13 bits dedicated to the size, not 12?
3. It shouldn't depend on the architecture and page size. My ioctl command
   can work on one arch, but not on another?
Increase IOCPARM_MAX to 8kB and make it independent of PAGE_SIZE and of the
architecture it is compiled for. This allows all the bits to be used for
the size on all archs. Note that this doesn't mean we will copy more on
every ioctl(2) call; we still copyin(9)/copyout(9) only the exact number
of bytes encoded in the ioctl command.
The practical use for this change is ZFS: the zfs_cmd_t structure used for
ZFS ioctls is larger than 4kB.
Silence on: arch@
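For reference, the 13-bit limit comes from how the command word is packed;
a sketch paraphrased from FreeBSD's sys/ioccom.h after this change (not a
verbatim copy):

    #define IOCPARM_SHIFT   13                              /* bits for size */
    #define IOCPARM_MASK    ((1 << IOCPARM_SHIFT) - 1)      /* 0x1fff = 8191 */
    #define IOCPARM_LEN(x)  (((x) >> 16) & IOCPARM_MASK)
    #define IOCPARM_MAX     (1 << IOCPARM_SHIFT)    /* 8kB, PAGE_SIZE-free */

    /* The size lands in bits 16..28 of the command word, so anything
     * larger than IOCPARM_MASK (8kB-1) simply cannot be encoded. */
    #define _IOC(inout, group, num, len) \
            ((unsigned long)((inout) | (((len) & IOCPARM_MASK) << 16) | \
            ((group) << 8) | (num)))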
r206667:
Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.
thread0 (vfs_fhtovp)      thread1 (vop_getattr)     thread2 (zfs_recv)
--------------------      ---------------------     ------------------
vn_lock
                          rrw_enter_read
                                                    rrw_enter_write (hangs)
rrw_enter_read (hangs)
                          vn_lock (hangs)
Reported by: Attila Nagy <bra@fsn.hu>
r206792:
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
r206793:
Remove racy assertion.
Obtained from: OpenSolaris
r206794:
Extend the locks' scope to match OpenSolaris.
r206795:
Add missing list and lock destruction.
r206796:
Style fixes.
r206797:
Restore previous order.
Space cleanup for revision 202669, committed separately for easier review.
This commit contains whitespace changes only.
Submitted by: Matt Reimer
Sponsored by: VPOP Technologies, Inc.
Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status. Booting from a degraded vdev should now be
more robust.
Submitted by: Matt Reimer <mattjreimer at gmail.com>
Sponsored by: VPOP Technologies, Inc.
- Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
safely allocate a heap region above 1MB. This enables {gpt,}zfsboot()
to allocate much larger buffers than before.
- Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers. This
allows more reliable reading of compressed files in a raidz/raidz2 pool.
Correct some issues with zfs boot.
- Teach it to read gang blocks. (essentially untested)
If you see "ZFS: gang block detected!", please let
me know, so we can either remove the printf if it
works, or fix it if it doesn't.
- If multiple partitions exist on a disk, probe them all.
We also need to reset dsk->start to 0 to read the right
sector here.
- With GPT, we can have 128 partitions.
- If the bootfs property has ever been set on a pool,
it seems that it never goes away. zpool won't allow
you to add to the pool with the bootfs property set.
However, if you clear the property back to the default,
we end up getting 0 for the object number, read a
bogus block pointer, and fail to boot.
- Fix some error printfs. The printf in the loader is
only capable of the c, s and u formats.
- Teach printf how to display %llu
to gptboot, i.e. installed in a freebsd-boot partition using /sbin/gpart or
/sbin/gpt.
Tweak the /boot/loader ZFS support so that it can find ZFS pools that are
contained in GPT partitions.
This brings a huge amount of changes; I'll enumerate only the user-visible ones:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows the use of additional disks for cache.
Huge performance improvements, mostly for random reads of mostly
static content.
- slog
Allows the use of additional disks for the ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by them. Be very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support for booting off a ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Before, if a write request failed, the system panicked. Now one
can select one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Like the quota and reservation properties, but they don't count space
consumed by child file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris